US20210201935A1 - Systems and methods to determine whether to unmute microphone based on camera input - Google Patents
- Publication number
- US20210201935A1 (application US16/727,836)
- Authority
- US
- United States
- Prior art keywords
- user
- microphone
- input
- speaking
- gui
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/147—Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R29/00—Monitoring arrangements; Testing arrangements
- H04R29/004—Monitoring arrangements; Testing arrangements for microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
- H04R2430/01—Aspects of volume control, not necessarily automatic, in sound systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R27/00—Public address systems
Definitions
- the present application relates to technically inventive, non-routine solutions that are necessarily rooted in computer technology and that produce concrete technical improvements.
- a first device includes at least one processor and storage accessible to the at least one processor.
- the storage includes instructions executable by the at least one processor to receive input from a camera in communication with the at least one processor and to determine, based on the input from the camera, whether a user is currently speaking.
- the instructions are also executable to present a notification regarding whether to unmute at least one microphone accessible to the at least one processor responsive to a determination that the user is currently speaking.
- the first device may include both the camera and the at least one microphone.
- the instructions may be executable to execute a computer vision algorithm to determine whether the user is currently speaking.
- the first device may include a display accessible to the at least one processor, and in these examples the instructions may be executable to present the notification on the display. Furthermore, in some implementations the instructions may be executable to present the notification on the display as part of a graphical user interface (GUI) responsive to the determination that the user is currently speaking, where the GUI may include a selector that is selectable to unmute the at least one microphone.
- the first device may include at least one speaker accessible to the at least one processor, and the instructions may be executable to present the notification audibly using the at least one speaker.
- the instructions may be executable to, prior to presentation of the notification, determine whether the at least one microphone is currently muted.
- the instructions may be executable to present the notification responsive to both the determination that the user is currently speaking and a determination that the at least one microphone is currently muted.
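A minimal Python sketch of the gating condition above, assuming the camera-based speaking determination and the mute-state check are already available as booleans. The class and function names are hypothetical, not taken from the disclosure; the message text reuses the audible notification example given for FIG. 3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NotificationDecision:
    """Whether to present a notification, and what it should say."""
    notify: bool
    message: Optional[str] = None


def decide_notification(user_is_speaking: bool, microphone_is_muted: bool) -> NotificationDecision:
    # Both conditions must hold: the camera indicates speech AND the
    # microphone is muted. Speech with an unmuted microphone needs no prompt.
    if user_is_speaking and microphone_is_muted:
        return NotificationDecision(notify=True, message="Your microphone is muted!")
    return NotificationDecision(notify=False)
```

Requiring both conditions avoids prompting a user whose speech is already being transmitted.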
- the instructions may be executable to facilitate a video conference with a second device different from the first device using a first video conferencing application, and thus the instructions may be executable to determine whether the at least one microphone is currently muted via the first video conferencing application.
- the instructions may be further executable to, based on a determination that the at least one microphone is not currently muted via the first video conferencing application, determine whether the at least one microphone is currently muted via an operating system executing at the first device and/or hardware accessible to the first device.
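The layered check above can be sketched as follows, assuming each layer's mute state can be queried through a callable supplied by the platform. The names are hypothetical, and checking the operating system before the hardware is an assumption (the text says "and/or").

```python
from enum import Enum
from typing import Callable, Optional


class MuteLayer(Enum):
    APPLICATION = "video conferencing application"
    OPERATING_SYSTEM = "operating system"
    HARDWARE = "hardware switch or button"


def find_mute_layer(
    app_muted: Callable[[], bool],
    os_muted: Callable[[], bool],
    hardware_muted: Callable[[], bool],
) -> Optional[MuteLayer]:
    """Return the layer that has the microphone muted, or None if no layer does.

    The checks are passed as callables so that the OS and hardware are only
    queried when the application does not already report the microphone as muted.
    """
    if app_muted():
        return MuteLayer.APPLICATION
    # Fall back to the operating system, then hardware (order is an assumption).
    if os_muted():
        return MuteLayer.OPERATING_SYSTEM
    if hardware_muted():
        return MuteLayer.HARDWARE
    return None
```

Knowing which layer is muting the microphone lets the notification say where to unmute (for example, "flip the switch on your microphone").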
- the hardware may include a switch or button that is manipulable to mute and unmute the at least one microphone.
- the instructions may be executable to receive first user input to unmute the at least one microphone subsequent to presentation of the notification and, responsive to receipt of the first user input, unmute the at least one microphone and transmit data to a second device.
- the data may indicate second user input to the at least one microphone that may include audible input.
- in another aspect, a method includes receiving input from a camera and determining, based on the input from the camera, whether a user is currently speaking. The method also includes, responsive to determining that the user is currently speaking, issuing a command to present a notification regarding whether to unmute at least one microphone accessible to a first device.
- the method may be performed by a server in communication with the first device, and the command may be issued by transmitting the command to the first device. Also in some implementations, the method may be performed by an end-user device that establishes the first device, and the command may be issued by controlling an electronic display accessible to the end-user device to present the notification.
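The two ways of issuing the command described above might look like this in a sketch. The transport callables and the dictionary message format are hypothetical stand-ins, not a protocol defined in the disclosure.

```python
from typing import Callable, Dict

NOTIFICATION_TEXT = "Your microphone is muted!"


def issue_notification_command(
    running_on_server: bool,
    send_to_first_device: Callable[[Dict[str, str]], None],
    show_on_local_display: Callable[[str], None],
) -> None:
    """Issue the 'present notification' command in either deployment."""
    if running_on_server:
        # Server case: the command is issued by transmitting it to the first device.
        send_to_first_device({"command": "present_notification", "text": NOTIFICATION_TEXT})
    else:
        # End-user device case: the command is issued by driving the local display.
        show_on_local_display(NOTIFICATION_TEXT)
```

Keeping the speaking and mute logic independent of where it runs means the same decision code can be deployed on either side.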
- the notification itself may be presented on a display accessible to the first device as part of a graphical user interface (GUI), where the GUI may include a selector that is selectable to unmute the at least one microphone.
- the method may include determining whether the at least one microphone is currently muted prior to issuing the command, and then issuing the command responsive to both determining that the user is currently speaking and determining that the at least one microphone is currently muted.
- At least one computer readable storage medium that is not a transitory signal includes instructions executable by at least one processor to receive input from a camera in communication with the at least one processor and to determine, based on the input from the camera, that a user is speaking.
- the instructions are also executable to, based on the determination that the user is speaking, present a graphical user interface (GUI) on a display accessible to the at least one processor.
- the GUI includes an indication that at least one microphone accessible to the at least one processor is in a mute mode.
- the GUI may also include a selector that is selectable to take the at least one microphone out of the mute mode.
- FIG. 1 is a block diagram of an example system consistent with present principles.
- FIG. 2 is a block diagram of an example network of devices consistent with present principles.
- FIGS. 3, 10, and 11 show example illustrations consistent with present principles.
- FIGS. 4 and 5 show example graphical user interfaces (GUIs) that may be presented based on determining that a user is speaking consistent with present principles.
- FIGS. 6 and 9 show flow charts of example algorithms consistent with present principles.
- FIG. 7 shows an example artificial intelligence architecture that may be used consistent with present principles.
- FIG. 8 shows an example GUI for configuring one or more settings of a device to undertake present principles.
- FIGS. 12 and 13 show example notifications that may be presented based on determining that a user is speaking consistent with present principles.
- the present application discloses systems and methods to use computer vision and artificial intelligence (AI) during video conferencing to detect if a user in front of a device's camera appears to be speaking by detecting specific movements of the mouth.
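The disclosure relies on computer vision and AI to recognize mouth movement. The sketch below swaps in a much simpler, hand-tuned heuristic purely for illustration: it measures how much the mouth aspect ratio (lip opening divided by mouth width) varies across a window of frames. It assumes a face-landmark detector, not shown and hypothetical here, supplies per-frame lip opening and mouth width in pixels; the threshold and window length are placeholder values.

```python
from statistics import pstdev
from typing import Sequence, Tuple


def mouth_aspect_ratio(opening_px: float, width_px: float) -> float:
    """Vertical lip opening divided by mouth width."""
    if width_px <= 0:
        raise ValueError("mouth width must be positive")
    return opening_px / width_px


def appears_to_be_speaking(
    frames: Sequence[Tuple[float, float]],
    variation_threshold: float = 0.05,  # hypothetical; would need tuning per camera
    min_frames: int = 10,  # hypothetical window length
) -> bool:
    """Heuristic: talking makes the mouth open and close repeatedly, so the
    mouth aspect ratio varies from frame to frame. A mouth held still (open
    or closed) gives little variation and is not treated as speech."""
    if len(frames) < min_frames:
        return False  # not enough evidence yet
    ratios = [mouth_aspect_ratio(opening, width) for opening, width in frames]
    return pstdev(ratios) > variation_threshold
```

Normalizing by mouth width makes the ratio roughly independent of how far the user sits from the camera; a trained model, as the disclosure describes, would be expected to handle yawning, chewing, and similar non-speech movement far better than this heuristic.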
- the system and application mute settings may be accessed by the device to verify whether the microphone is set to "off mute". If any of these settings indicates that microphone mute is on, the user may then be notified that his or her microphone or device is set to audio input mute. The user may then decide whether to go off mute at that point, and/or the device may automatically take itself off mute.
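A sketch of the notify-or-auto-unmute choice described above, assuming mute settings can be read and written as a simple name-to-boolean mapping and that auto-unmute is a user preference. The function name, setting names, and return values are hypothetical.

```python
from typing import Dict, List, Tuple


def handle_detected_speech(
    mute_settings: Dict[str, bool], auto_unmute: bool
) -> Tuple[str, List[str]]:
    """Act on a 'user is speaking' determination.

    mute_settings maps a setting name (e.g. "application", "system") to True
    when that setting has microphone mute turned on. Returns the action taken
    and the settings involved.
    """
    muted = [name for name, is_muted in mute_settings.items() if is_muted]
    if not muted:
        return ("none", [])
    if auto_unmute:
        # Automatically go off mute at every setting that was muting the microphone.
        for name in muted:
            mute_settings[name] = False
        return ("unmuted", muted)
    # Otherwise just tell the user and let them decide.
    return ("notify", muted)
```

For example, with the application muted and auto-unmute disabled, the call returns `("notify", ["application"])` and leaves the settings unchanged.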
- a system may include server and client components, connected over a network such that data may be exchanged between the client and server components.
- the client components may include one or more computing devices including televisions (e.g., smart TVs, Internet-enabled TVs), computers such as desktops, laptops and tablet computers, so-called convertible devices (e.g., having a tablet configuration and laptop configuration), and other mobile devices including smart phones.
- These client devices may employ, as non-limiting examples, operating systems from Apple Inc. of Cupertino, Calif., Google Inc. of Mountain View, Calif., or Microsoft Corp. of Redmond, Wash. A Unix® or similar operating system such as Linux® may also be used.
- These operating systems can execute one or more browsers such as a browser made by Microsoft or Google or Mozilla or another browser program that can access web pages and applications hosted by Internet servers over a network such as the Internet, a local intranet, or a virtual private network.
- instructions refer to computer-implemented steps for processing information in the system. Instructions can be implemented in software, firmware or hardware, or combinations thereof and include any type of programmed step undertaken by components of the system; hence, illustrative components, blocks, modules, circuits, and steps are sometimes set forth in terms of their functionality.
- a processor may be any general purpose single- or multi-chip processor that can execute logic by means of various lines such as address lines, data lines, and control lines and registers and shift registers. Moreover, any logical blocks, modules, and circuits described herein can be implemented or performed with a general purpose processor, a digital signal processor (DSP), a field programmable gate array (FPGA) or other programmable logic device, an application specific integrated circuit (ASIC), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein.
- a processor can also be implemented by a controller or state machine or a combination of computing devices.
- the methods herein may be implemented as software instructions executed by a processor, suitably configured application specific integrated circuits (ASIC) or field programmable gate array (FPGA) modules, or any other convenient manner as would be appreciated by those skilled in the art.
- the software instructions may also be embodied in a non-transitory device that is being vended and/or provided that is not a transitory, propagating signal and/or a signal per se (such as a hard disk drive, CD ROM or Flash drive).
- the software code instructions may also be downloaded over the Internet. Accordingly, it is to be understood that although a software application for undertaking present principles may be vended with a device such as the system 100 described below, such an application may also be downloaded from a server to a device over a network such as the Internet.
- Software modules and/or applications described by way of flow charts and/or user interfaces herein can include various sub-routines, procedures, etc. Without limiting the disclosure, logic stated to be executed by a particular module can be redistributed to other software modules and/or combined together in a single module and/or made available in a shareable library.
- Logic when implemented in software can be written in an appropriate language such as but not limited to C# or C++, and can be stored on or transmitted through a computer-readable storage medium (that is not a transitory, propagating signal per se) such as a random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), compact disk read-only memory (CD-ROM) or other optical disk storage such as digital versatile disc (DVD), magnetic disk storage or other magnetic storage devices including removable thumb drives, etc.
- a processor can access information over its input lines from data storage, such as the computer readable storage medium, and/or the processor can access information wirelessly from an Internet server by activating a wireless transceiver to send and receive data.
- Data typically is converted from analog signals to digital by circuitry between the antenna and the registers of the processor when being received and from digital to analog when being transmitted.
- the processor then processes the data through its shift registers to output calculated data on output lines, for presentation of the calculated data on the device.
- a system having at least one of A, B, and C includes systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.
- circuitry includes all levels of available integration, e.g., from discrete logic circuits to the highest level of circuit integration such as VLSI, and includes programmable logic components programmed to perform the functions of an embodiment as well as general-purpose or special-purpose processors programmed with instructions to perform those functions.
- the system 100 may be a desktop computer system, such as one of the ThinkCentre® or ThinkPad® series of personal computers sold by Lenovo (US) Inc. of Morrisville, N.C., or a workstation computer, such as the ThinkStation®, which are sold by Lenovo (US) Inc. of Morrisville, N.C.; however, as apparent from the description herein, a client device, a server or other machine in accordance with present principles may include other features or only some of the features of the system 100 .
- the system 100 may be, e.g., a game console such as XBOX®, and/or the system 100 may include a mobile communication device such as a mobile telephone, notebook computer, and/or other portable computerized device.
- the system 100 may include a so-called chipset 110 .
- a chipset refers to a group of integrated circuits, or chips, that are designed to work together. Chipsets are usually marketed as a single product (e.g., consider chipsets marketed under the brands INTEL®, AMD®, etc.).
- the chipset 110 has a particular architecture, which may vary to some extent depending on brand or manufacturer.
- the architecture of the chipset 110 includes a core and memory control group 120 and an I/O controller hub 150 that exchange information (e.g., data, signals, commands, etc.) via, for example, a direct management interface or direct media interface (DMI) 142 or a link controller 144 .
- the DMI 142 is a chip-to-chip interface (sometimes referred to as being a link between a “northbridge” and a “southbridge”).
- the core and memory control group 120 include one or more processors 122 (e.g., single core or multi-core, etc.) and a memory controller hub 126 that exchange information via a front side bus (FSB) 124 .
- various components of the core and memory control group 120 may be integrated onto a single processor die, for example, to make a chip that supplants the “northbridge” style architecture.
- the memory controller hub 126 interfaces with memory 140 .
- the memory controller hub 126 may provide support for DDR SDRAM memory (e.g., DDR, DDR2, DDR3, etc.).
- the memory 140 is a type of random-access memory (RAM). It is often referred to as “system memory.”
- the memory controller hub 126 can further include a low-voltage differential signaling interface (LVDS) 132 .
- the LVDS 132 may be a so-called LVDS Display Interface (LDI) for support of a display device 192 (e.g., a CRT, a flat panel, a projector, a touch-enabled light emitting diode display or other video display, etc.).
- a block 138 includes some examples of technologies that may be supported via the LVDS interface 132 (e.g., serial digital video, HDMI/DVI, display port).
- the memory controller hub 126 also includes one or more PCI-express interfaces (PCI-E) 134 , for example, for support of discrete graphics 136 .
- the memory controller hub 126 may include a 16-lane (x16) PCI-E port for an external PCI-E-based graphics card (including, e.g., one or more GPUs).
- An example system may include AGP or PCI-E for support of graphics.
- the I/O hub controller 150 can include a variety of interfaces.
- the example of FIG. 1 includes a SATA interface 151 , one or more PCI-E interfaces 152 (optionally one or more legacy PCI interfaces), one or more USB interfaces 153 , a LAN interface 154 (more generally a network interface for communication over at least one network such as the Internet, a WAN, a LAN, etc.).
- the I/O hub controller 150 may include integrated gigabit Ethernet controller lines multiplexed with a PCI-E interface port. Other network features may operate independent of a PCI-E interface.
- the interfaces of the I/O hub controller 150 may provide for communication with various devices, networks, etc.
- the SATA interface 151 provides for reading, writing, or reading and writing information on one or more drives 180 such as HDDs, SSDs or a combination thereof, but in any case the drives 180 are understood to be, e.g., tangible computer readable storage media that are not transitory, propagating signals.
- the I/O hub controller 150 may also include an advanced host controller interface (AHCI) to support one or more drives 180 .
- the PCI-E interface 152 allows for wireless connections 182 to devices, networks, etc.
- the USB interface 153 provides for input devices 184 such as keyboards (KB), mice and various other devices (e.g., cameras, phones, storage, media players, etc.).
- the LPC interface 170 provides for use of one or more ASICs 171 , a trusted platform module (TPM) 172 , a super I/O 173 , a firmware hub 174 , BIOS support 175 as well as various types of memory 176 such as ROM 177 , Flash 178 , and non-volatile RAM (NVRAM) 179 .
- this module may be in the form of a chip that can be used to authenticate software and hardware devices.
- a TPM may be capable of performing platform authentication and may be used to verify that a system seeking access is the expected system.
- upon power on, the system 100 may be configured to execute boot code 190 for the BIOS 168 , as stored within the SPI Flash 166 , and thereafter process data under the control of one or more operating systems and application software (e.g., stored in system memory 140 ).
- An operating system may be stored in any of a variety of locations and accessed, for example, according to instructions of the BIOS 168 .
- the system 100 may include at least one microphone or a microphone array 193 that may provide input from the microphone/array 193 to the processor 122 based on audio that is detected, such as via a user providing audible input to the microphone/array 193 consistent with present principles.
- the system 100 may also include at least one camera 191 that may gather one or more images and provide the images to the processor 122 .
- the camera 191 may be a thermal imaging camera, an infrared (IR) camera, a digital camera such as a webcam, a three-dimensional (3D) camera, and/or a camera otherwise integrated into the system 100 and controllable by the processor 122 to gather pictures/images and/or video.
- the system 100 may include a gyroscope that senses and/or measures the orientation of the system 100 and provides input related thereto to the processor 122 , as well as an accelerometer that senses acceleration and/or movement of the system 100 and provides input related thereto to the processor 122 .
- the system 100 may include a GPS transceiver that is configured to communicate with at least one satellite to receive/identify geographic position information and provide the geographic position information to the processor 122 .
- another suitable position receiver other than a GPS receiver may be used in accordance with present principles to determine the location of the system 100 .
- an example client device or other machine/computer may include fewer or more features than shown on the system 100 of FIG. 1 .
- the system 100 is configured to undertake present principles.
- example devices are shown communicating over a network 200 such as the Internet in accordance with present principles, e.g., for video conferencing as described herein. It is to be understood that each of the devices described in reference to FIG. 2 may include at least some of the features, components, and/or elements of the system 100 described above. Indeed, any of the devices disclosed herein may include at least some of the features, components, and/or elements of the system 100 described above.
- FIG. 2 shows a notebook computer and/or convertible computer 202 , a desktop computer 204 , a wearable device 206 such as a smart watch, a smart television (TV) 208 , a smart phone 210 , a tablet computer 212 , and a server 214 such as an Internet server that may provide cloud storage accessible to the devices 202 - 212 .
- the devices 202 - 214 are configured to communicate with each other over the network 200 to undertake present principles.
- FIG. 3 shows an example illustration 300 consistent with present principles.
- the illustration 300 depicts a user 302 participating in a video conference that is facilitated through an end-user device 304 such as a laptop computer, desktop computer, tablet computer, smart phone, etc.
- the device 304 may facilitate the video conference by executing a video conferencing application locally at the device 304 , with other devices of other remote participants 306 , 308 also executing their own respective copies of the same video conferencing application or another video conferencing application that otherwise may interface with the application executing at the device 304 .
- the video conferencing application may be, for example, Skype, Apple's FaceTime, a Google Gchat video conference, etc.
- the device 304 may have a built-in microphone 310 for receiving audible input from the user 302 to then transmit that input to the other respective devices for the remote participants 306 , 308 .
- the device 304 may also communicate with additional hardware such as a wireless, stand-alone microphone 312 that the user might be using to provide audible input that the device 304 may then transmit to the other devices.
- the microphone 312 may include a hardware switch or depressible button 314 that may be manipulable between on and off positions to respectively mute and unmute the microphone 312 . When muted, the microphone 312 either does not transmit the audible input it detects to the device 304 while still remaining powered on, or does not receive the audible input altogether (e.g., it is turned off). When unmuted, the microphone 312 may receive and transmit audible input to the device 304 via Bluetooth or another communication protocol, and the device 304 may then relay the audible input to the respective devices of the other participants 306 , 308 in an Internet data stream as part of the video conference.
- a mute selector 313 presented on a touch-enabled display 315 of the device 304 may also be selected and deselected with touch input to respectively mute and unmute the microphone 310 via the video conferencing application itself.
- the software mute through the video conferencing application may involve the device 304 receiving audible input via one of the microphones 310 , 312 and possibly buffering/caching a threshold most-recent amount of the audible input in random-access memory (RAM) of the device 304 , but not actually transmitting any voice data corresponding to the audible input to the respective devices of the other participants 306 , 308 .
- the threshold most-recent amount may be, for example, a most-recent thirty seconds.
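Keeping only a threshold most-recent amount of audio is a classic bounded ring buffer. This sketch assumes audio arrives in fixed-length chunks (the 20 ms default chunk length is a hypothetical choice) and uses Python's `collections.deque` with `maxlen`, which discards the oldest item automatically once full. The class name is hypothetical.

```python
from collections import deque
from typing import Deque, List


class RecentAudioBuffer:
    """Holds only the most recent `max_seconds` of captured audio.

    Audio is assumed to arrive in fixed-length chunks of `chunk_seconds`
    (a hypothetical framing; real capture APIs vary).
    """

    def __init__(self, max_seconds: float = 30.0, chunk_seconds: float = 0.02) -> None:
        if max_seconds <= 0 or chunk_seconds <= 0:
            raise ValueError("durations must be positive")
        self.chunk_seconds = chunk_seconds
        # A bounded deque silently discards its oldest item when full.
        self._chunks: Deque[bytes] = deque(maxlen=max(1, round(max_seconds / chunk_seconds)))

    def append(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    @property
    def buffered_seconds(self) -> float:
        return len(self._chunks) * self.chunk_seconds

    def drain(self) -> List[bytes]:
        """Return the cached chunks oldest-first and empty the buffer."""
        chunks = list(self._chunks)
        self._chunks.clear()
        return chunks
```

As a sizing check, assuming 16 kHz, 16-bit mono PCM, thirty seconds of audio occupies 30 × 16,000 × 2 = 960,000 bytes, so holding it in RAM is inexpensive.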
- a camera 316 is shown as being disposed on the device 304 .
- the camera 316 may gather images of the scene within its field of view, which in this case includes the face of the user 302 .
- the camera 316 may then relay those images to the respective devices of the other participants 306 , 308 in an Internet data stream as part of the video conference.
- a chat box is shown so that the user 302 and other participants 306 , 308 may engage in text/instant message exchange as part of the video conference.
- the device and/or a server in communication with the device may determine that the user 302 is currently speaking by using input from the camera 316 . Based on that determination, a command may be issued by the server and/or the local processor on the device 304 (e.g., a central processing unit (CPU)) to present one or more notifications indicating that whatever microphone(s) is being used for the video conference (the microphones 310 and/or 312 ) is currently muted/in mute mode.
- an audible notification 320 may be presented via a speaker on the device that says, “Your microphone is muted!”
- a visual notification 322 may be presented on the display 315 as shown so that the notification 322 is presented over top of other visual portions of the video conference, though in some examples the notification 322 may be presented to take up the full display space rather than a portion thereof.
- the visual notification 322 is shown in more detail in FIG. 4 .
- the notification 322 may be presented as part of a graphical user interface (GUI) 400 presented on the touch-enabled display of the device 304 .
- GUI 400 may include text 402 indicating that the device has determined that the user is currently speaking but that the microphone being used for video conferencing is currently in a mute mode/muted.
- the text 402 may indicate the following: “Are you trying to speak to other conference participants? Your microphone is currently in mute mode.”
- the GUI 400 may include a selector 404 that may be selectable to command the device 304 to take the microphone off of mute mode and/or otherwise unmute the microphone at the application level, operating system level, etc.
- the GUI 400 may even include a selector 406 that may be selectable to provide input indicating that the user 302 is not trying to speak to conference participants, with the input then being used to train an artificial neural network using machine learning to make improved determinations of the user speaking to conference participants in the future.
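A sketch of GUI 400's two selectors as a plain data model, assuming the unmute action is supplied as a callable and that "training input" from selector 406 is simply stored as labeled examples for later retraining. The class, method, and field names are hypothetical, and the training step itself is not shown.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class MuteNotificationGui:
    """Hypothetical model of GUI 400: prompt text plus two selectors."""
    unmute_microphone: Callable[[], None]
    text: str = (
        "Are you trying to speak to other conference participants? "
        "Your microphone is currently in mute mode."
    )
    # (camera features, user_was_speaking_to_participants) pairs kept for retraining.
    feedback: List[Tuple[Any, bool]] = field(default_factory=list)

    def select_unmute(self) -> None:
        """Selector 404: take the microphone out of mute mode."""
        self.unmute_microphone()

    def select_not_speaking(self, camera_features: Any) -> None:
        """Selector 406: store a negative label for the frames that triggered the prompt."""
        self.feedback.append((camera_features, False))
```

Each press of selector 406 yields a labeled false positive, which is the kind of example a speaking-detection model would need to learn, for instance, that the user was talking to someone off camera.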
- the device 304 may begin streaming or otherwise transmitting, to the devices of the other conference participants, the user's audible input from that point forward as the user provides it to the microphone after selecting the selector 404 .
- the device 304 may have still been caching or storing the audible input during that time.
- the audible input may have been locally cached in RAM of the device 304 , and/or it may even have been streamed to and cached at a remotely-located server that is facilitating communication among the participants' devices for the video conference. Note that in some implementations, only a threshold amount of most-recent input (e.g., the last thirty seconds) may be cached in RAM and/or at the server.
- the device 304 or server may transmit the cached audible input to the other conference participants' devices.
- in this way, the device 304 and/or server may help ensure that audible input spoken while the mute mode was enabled is still provided to the other participants, albeit later than when it was spoken, rather than simply being lost. Otherwise, the user 302 might have to repeat what was already said, or simply move on to other speech to the detriment of the other conference participants.
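The caching behavior described above, which keeps only a threshold amount of the most recent input (e.g., the last thirty seconds) and then replays it after unmuting, can be sketched as a bounded first-in, first-out buffer. This is a minimal Python illustration under stated assumptions: audio is assumed to arrive as fixed-duration frames, and the 20 ms frame size, the `send` callback, and the class name `MutedAudioCache` are all hypothetical rather than part of the described implementation.

```python
from collections import deque
from typing import Callable, Deque


class MutedAudioCache:
    """Hypothetical cache holding only the most recent audio captured while muted.

    Assumes audio arrives as fixed-duration frames of `frame_ms` milliseconds;
    the `send` callback passed to flush() stands in for whatever transport
    delivers audio to the other conference participants.
    """

    def __init__(self, max_seconds: float = 30.0, frame_ms: int = 20) -> None:
        # Number of frames that fit in the retention window (30 s / 20 ms = 1500).
        self.max_frames = int(max_seconds * 1000) // frame_ms
        # A deque with maxlen discards its oldest frame automatically once full.
        self._frames: Deque[bytes] = deque(maxlen=self.max_frames)

    def cache(self, frame: bytes) -> None:
        """Store one frame spoken while the microphone is muted."""
        self._frames.append(frame)

    def flush(self, send: Callable[[bytes], None]) -> int:
        """Send cached frames oldest-first, empty the cache, return the count sent."""
        sent = 0
        while self._frames:
            send(self._frames.popleft())
            sent += 1
        return sent
```

Using `deque(maxlen=...)` keeps memory bounded without explicit trimming; if caching happens at the server instead of the device, the same structure could simply live there.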
- the GUI 500 of FIG. 5 may be presented on the touch-enabled display of the device 304 .
- alternatively, the GUI 500 might be presented on the touch-enabled display responsive to the device 304 automatically unmuting the microphone upon detecting that the user is currently speaking, as might occur in other example implementations, rather than responsive to selection of the selector 404.
- the GUI 500 may include a non-text icon 502 and text 504 indicating that the microphone has been unmuted (or otherwise taken out of mute mode).
- the GUI 500 may also include text 506 instructing the user to wait before speaking any additional input to the microphone so that previously cached audible input from the user can be transmitted to the other conference participants and heard by them via their own respective devices before the user provides additional audible input.
- the GUI 500 may even include a selector 508 to again mute the microphone or otherwise place it back in mute mode, e.g., after the user is done speaking what he or she had to say.
- Referring now to FIG. 6, it shows example logic that may be executed by a device such as the device 304 and/or the system 100 consistent with present principles. However, also note that in some examples some or all of the logic steps of FIG. 6 may be performed by a remotely-located server in communication with the device, such as the same server that might be used to replay audio/video communications between participants of a video conference consistent with present principles.
- the device may facilitate a video conference with other devices, e.g., using a video conferencing application.
- the device may launch the video conferencing application and/or initiate the video conference itself so that respective audible input and camera video from respective participant devices may be transmitted to the other participants in real time.
- the logic may proceed to block 602 .
- the device may receive input from a camera in communication with the device, such as its built-in webcam.
- the logic may then proceed to decision diamond 604 where the device may determine whether a user (such as the user 302 ) is currently speaking as indicated in the input from the camera.
- the device may execute a computer vision algorithm, for example.
- the computer vision algorithm may include, for example, a lip reading or movement algorithm, a gesture recognition algorithm, a facial recognition algorithm, etc. Additionally, note that in some examples the computer vision algorithm may make use of one or more artificial neural networks of an artificial intelligence model that may be used to determine whether the user is currently speaking based on the input from the camera. Example architecture for such a model will be described below in reference to FIG. 7 .
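One simple lip-movement heuristic of the general kind mentioned above (not necessarily the one contemplated here) is to track a mouth aspect ratio (MAR), the vertical lip gap divided by the mouth width, across recent frames and treat large fluctuations as movement. The sketch below assumes lip landmark coordinates are already supplied by some face-landmark detector; the function names and the 0.05 threshold are hypothetical, and a heuristic like this cannot by itself distinguish speech from other mouth motion.

```python
import math
from statistics import pstdev
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) landmark coordinate from an assumed detector


def mouth_aspect_ratio(top: Point, bottom: Point, left: Point, right: Point) -> float:
    """Vertical lip gap divided by mouth width; grows as the mouth opens."""
    width = math.dist(left, right)
    if width == 0:
        return 0.0
    return math.dist(top, bottom) / width


def lips_moving(mar_history: Sequence[float], threshold: float = 0.05) -> bool:
    """Treat the lips as moving when the MAR fluctuates across recent frames."""
    if len(mar_history) < 2:
        return False
    # Population standard deviation of the recent MAR values.
    return pstdev(mar_history) > threshold
```

A steady MAR (mouth held still, open or closed) yields a low standard deviation, while alternating open and closed frames yields a high one.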
- responsive to a negative determination at diamond 604, the logic may revert back to block 600 and proceed therefrom. However, if the device makes an affirmative determination at diamond 604, the logic may instead proceed to block 606 (or in some examples, directly to decision diamond 608). At block 606 the device may begin buffering or caching spoken input to the device's microphone as described above. From there the logic may proceed to decision diamond 608.
- the device may determine whether the microphone is currently muted via a mute mode controlled by the video conferencing application itself so that audio detected by the microphone is not provided to conference participants even if it is buffered/cached locally at the device (e.g., software mute rather than turning the microphone off).
- An affirmative determination at diamond 608 may cause the logic to proceed to block 612 , which will be described shortly.
- a negative determination at diamond 608 may instead cause the logic to proceed to decision diamond 610 .
- the device may determine whether the microphone is currently muted via an operating system executing at the device itself (e.g., the device 304 ) and/or currently muted via hardware accessible to the device (e.g., muted via the button or switch 314 , or the microphone being turned off/powered down altogether).
- the operating system may be, for example, the device's basic input/output system (BIOS) or a guest operating system such as Microsoft's Windows, Apple's Mac OS, Linux, etc.
- the determination at diamond 610 may include whether the microphone has been muted or a mute mode entered via a “global” microphone mute command from the user to the operating system itself (rather than to the video conferencing application specifically) so that the microphone is muted for all functions that might be executed by the operating system using the microphone independent of the video conference itself.
- a negative determination at diamond 610 may cause the logic to revert back to block 600 where the device may continue facilitating the video conference and transmit data indicating the audible input from the user to the microphone to other conference participants consistent with present principles owing to the microphone being determined to not be muted on any of the levels discussed above (e.g., application level, operating system level, or via hardware).
- an affirmative determination at diamond 610 may instead cause the logic to proceed to block 612 .
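The order of checks at diamonds 608 and 610 (application-level mute first, then operating-system or hardware mute) can be expressed as a short cascade. In this hypothetical Python sketch, the three boolean inputs stand in for whatever queries the conferencing application, operating system, and driver actually expose; no real API is implied. A non-`None` result corresponds to proceeding to block 612, and `None` to reverting to block 600.

```python
from enum import Enum
from typing import Optional


class MuteLevel(Enum):
    APPLICATION = "application"  # conferencing app's own mute (diamond 608)
    OPERATING_SYSTEM = "os"      # global OS mute (diamond 610)
    HARDWARE = "hardware"        # switch/button, or mic powered off (diamond 610)


def find_mute_level(app_muted: bool, os_muted: bool, hw_muted: bool) -> Optional[MuteLevel]:
    """Return the first level at which the microphone is muted, or None.

    The application is checked first; only if it is not muted are the
    operating system and hardware checked.
    """
    if app_muted:
        return MuteLevel.APPLICATION
    if os_muted:
        return MuteLevel.OPERATING_SYSTEM
    if hw_muted:
        return MuteLevel.HARDWARE
    return None
```

Reporting the level, rather than a bare boolean, would let the notification tell the user where to unmute (e.g., the switch 314 versus the application).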
- the device may present a notification at the user's device indicating that the microphone is currently muted.
- the device may present an audible notification such as the example notification 320 described above and/or a visual notification such as the example notification 322 /GUI 400 described above.
- if a remotely-located server is executing block 612, the server may transmit a command to the end-user device to present the notification at the end-user device, whereas if the end-user device itself were executing block 612 it may simply control its display and/or speaker(s) to present the visual and/or audible notification, respectively.
- the logic may then proceed to block 614 .
- the device may, subsequent to presentation of the notification(s) at block 612 , receive user input to unmute the microphone via the video conferencing application, the operating system, and/or the hardware.
- the user input to unmute the microphone may be received based on selection of selector 404 or based on manipulation of the button or switch 314 to place the microphone in an unmuted mode.
- the logic may then proceed to block 616 where, responsive to receipt of the user input at block 614, the device may unmute the microphone. Also at block 616, the device may transmit, to the devices of the other conference participants, buffered or cached microphone data indicating audible input that was provided prior to the unmuting at block 616 consistent with the description above. Additionally or alternatively but also at block 616, the device may transmit additional microphone data to the devices of the respective conference participants that indicates additional audible input provided by the user after the unmuting at block 616. After block 616 the device may receive user input to mute the microphone again, and/or if desired the logic may revert back to block 600 and proceed therefrom.
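Blocks 612 through 616 can be sketched as a single handler that notifies the user, waits for unmute input, and then sends buffered audio before live audio, so that participants hear the speech in the order it was spoken. This is a minimal Python sketch; all of the callbacks (`notify`, `wait_for_unmute`, `unmute`, `send`) are hypothetical placeholders for device- or server-specific operations.

```python
from typing import Callable, Iterable


def handle_muted_speech(
    notify: Callable[[str], None],
    wait_for_unmute: Callable[[], bool],
    unmute: Callable[[], None],
    buffered: Iterable[bytes],
    live: Iterable[bytes],
    send: Callable[[bytes], None],
) -> bool:
    """Sketch of blocks 612-616; returns True if the microphone was unmuted."""
    notify("Your microphone is muted!")  # block 612: audible and/or visual notice
    if not wait_for_unmute():            # block 614: no unmute input received
        return False
    unmute()                             # block 616: take the mic out of mute mode
    for chunk in buffered:               # audio spoken while muted, oldest first
        send(chunk)
    for chunk in live:                   # audio spoken after unmuting
        send(chunk)
    return True
```

Sending the buffered chunks first is what lets GUI 500 ask the user to wait briefly before saying more.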
- FIG. 7 shows a block diagram of example architecture for an artificial intelligence (AI) model 700 that may be used consistent with present principles to determine, based on camera input, whether a user is currently speaking.
- the AI model 700 may be used as part of the computer vision executed to make the determination of diamond 604 described above.
- the AI model 700 may be maintained in the end-user's device and/or a server in communication therewith.
- input video or images 702 from a camera may be input into an input layer of a lip localization neural network, which may be established by a convolutional neural network (CNN) having the input layer, an output layer, and multiple hidden layers between the input and output layers.
- the lip localization neural network may thus take the input video 702 as input and identify the location of lips of the mouth of a user as output from the output layer of the lip localization neural network.
- the output from the output layer of the lip localization neural network may then be provided as input to an input layer of a feature extraction neural network, which may also be established by a CNN with its own input layer, output layer, and multiple hidden layers between its input and output layers.
- the feature extraction neural network may thus take, as input, the output from the output layer of the lip localization neural network and identify features of the lips of the user at various times as output from the output layer for the feature extraction neural network.
- the output from the output layer of the feature extraction neural network may then be provided as input to an input layer of a classifier 708 that may be established at least in part by one or more long short-term memory (LSTM) recurrent neural networks (RNNs) that may have their own respective input layers, output layers, and multiple hidden layers therebetween.
- the classifier may then use the input to its input layer to determine whether the user's lips are currently moving (e.g., in motion(s) that appear like speech) and then output the classification as data output 710 (e.g., moving or not moving, or speaking or not speaking specifically).
- the data output 710 may then be used by the device undertaking the logic of FIG.
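The data flow of the AI model 700 (per-frame lip localization, then per-frame feature extraction, then a sequence-level classification producing data output 710) can be illustrated with stand-in stages. This Python sketch is deliberately not a neural network: cropping the lower third of a frame, averaging pixel intensity, and thresholding frame-to-frame change are hypothetical placeholders for the lip localization CNN, the feature extraction CNN, and the LSTM classifier, respectively. Only the way the stages are chained mirrors the description.

```python
from typing import List, Sequence

# A grayscale frame represented as rows of pixel intensities (illustrative only).
Frame = List[List[float]]


def localize_lips(frame: Frame) -> Frame:
    """Stand-in for the lip localization CNN: crop the lower third of the frame."""
    start = (2 * len(frame)) // 3
    return frame[start:]


def extract_features(lip_region: Frame) -> float:
    """Stand-in for the feature extraction CNN: mean intensity of the lip crop."""
    pixels = [p for row in lip_region for p in row]
    return sum(pixels) / len(pixels) if pixels else 0.0


def classify_speaking(features: Sequence[float], threshold: float = 5.0) -> bool:
    """Stand-in for the LSTM classifier: 'speaking' if features change over time."""
    changes = [abs(b - a) for a, b in zip(features, features[1:])]
    return bool(changes) and max(changes) > threshold


def is_user_speaking(frames: Sequence[Frame]) -> bool:
    """Chain the stages: localize and extract per frame, then classify the sequence."""
    features = [extract_features(localize_lips(f)) for f in frames]
    return classify_speaking(features)
```

In a real system each stand-in would be replaced by a trained model, and the classifier would receive a feature vector per frame rather than a single number.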
- FIG. 8 shows an example graphical user interface (GUI) 800 that may be presented on the display of an end-user's device to configure one or more settings of the device to operate consistent with present principles.
- GUI 800 may be presented on the display 315 of the device 304 to configure one or more settings of the device related to microphone unmuting as described herein.
- Each of the options that will be described below may be selected by selecting the check box shown adjacent to the respective option through touch input, cursor input, etc.
- the GUI 800 may include a first option 802 that may be selectable to enable the device to undertake present principles.
- the option 802 may be selected to enable a setting for the device to notify a user when the user is identified as currently speaking but with the microphone currently muted.
- the option 802 may be selected to configure the device to undertake the other functions described above in reference to FIGS. 3-5 , to execute the logic of FIG. 6 , and/or to use the AI model 700 as described herein.
- the option 802 may be selected to enable the device to perform automatic microphone unmuting responsive to determining that the user is currently speaking.
- the GUI 800 may also include an option 804 that may be selectable to enable the device to buffer or cache audible input at the device that might be received while the microphone is muted or in its mute mode as described herein. Still further, the GUI 800 may include a setting 806 with various associated options 808 , 810 that may be respectively selectable to present notifications audibly at the device (option 808 ) and/or visually at the device (option 810 ) as described herein.
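The settings on GUI 800 can be modeled as a small configuration object that the rest of the logic consults. In this Python sketch the field names and default values are hypothetical; they simply map one-to-one onto options 802, 804, 808, and 810.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UnmuteSettings:
    notify_when_muted_speech: bool = True  # option 802 (hypothetical default)
    cache_while_muted: bool = False        # option 804
    audible_notification: bool = True      # option 808 under setting 806
    visual_notification: bool = True       # option 810 under setting 806


def notification_channels(s: UnmuteSettings) -> List[str]:
    """Return the notification modalities enabled by the current settings."""
    if not s.notify_when_muted_speech:
        return []
    channels = []
    if s.audible_notification:
        channels.append("audible")
    if s.visual_notification:
        channels.append("visual")
    return channels
```

Keeping the settings in one object makes it easy for both the FIG. 6 logic and the buffering code to consult the same source of truth.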
- Referring now to FIG. 9, it shows example logic that may be executed by a device consistent with present principles either independently or in conjunction with the overall logic of FIG. 6 described above.
- input from a camera 900 may be used at oval 902 to determine whether a user's presence has been detected. Responsive to a negative determination at oval 902 , the logic may proceed to oval 904 where the device may determine whether any application is using voice input from a microphone. Responsive to a negative determination at oval 904 , the logic may proceed to block 906 where the logic may end.
- the logic may instead proceed to oval 908 .
- a CNN for face landmark detection may be used to determine whether a user's face has been detected. Responsive to a negative determination at oval 908 , the logic may revert back to block 906 as described above. However, responsive to an affirmative determination at oval 908 , the logic may instead proceed to oval 910 .
- at oval 910, various CNNs for mouth detection may be employed to then determine at oval 912 whether mouth movement has been detected. Responsive to a negative determination at oval 912, the logic may proceed to block 914 where the logic may end. However, responsive to an affirmative determination at oval 912, the logic may instead proceed to oval 916 where the logic may employ an artificial intelligence model to determine if mouth movement is indicative of the user actually speaking (e.g., as opposed to merely licking his or her lips, simply opening his or her mouth, etc.).
- a negative determination at oval 916 may cause the logic to proceed to block 914 as described above. However, an affirmative determination at oval 916 may instead cause the logic to proceed to oval 918 where software and/or a driver may be used to check for whether microphone input has been muted at the hardware or application level. A negative determination at oval 918 may cause the logic to proceed to block 920 where the logic may end. However, an affirmative determination at oval 918 may instead cause the logic to proceed to oval 922 .
- at oval 922, the user may be notified of the hardware and/or application level mute that is detected via a device action such as presentation of a GUI (e.g., the GUI 400 of FIG. 4) and/or such as presentation of an audible sound or beep (or even an automated voice as illustrated by the speech bubble 320 of FIG. 3).
- the logic may then proceed to oval 924 where a GUI option to unmute the microphone may be presented, such as presenting the selector 404 of FIG. 4 described above.
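The FIG. 9 flow can be traced as a chain of early exits that returns the reference numeral of the block at which the logic ends, or 924 when the user is notified and offered the unmute option. In this Python sketch each boolean argument is a hypothetical stand-in for the corresponding detector's result. One interpretive assumption is built in: an affirmative result at either oval 902 or oval 904 is treated as proceeding to oval 908.

```python
def fig9_outcome(presence: bool, app_using_mic: bool, face: bool,
                 mouth_moving: bool, speaking: bool, muted: bool) -> int:
    """Walk the FIG. 9 decision ovals and return where the logic stops."""
    if not presence and not app_using_mic:  # ovals 902 and 904 both negative
        return 906
    if not face:                            # oval 908: no face landmarks found
        return 906
    if not mouth_moving:                    # ovals 910/912: no mouth movement
        return 914
    if not speaking:                        # oval 916: movement is not speech
        return 914
    if not muted:                           # oval 918: no hardware/app mute
        return 920
    return 924                              # notified at 922, unmute option at 924
```

Structuring the flow as early exits keeps the cheaper checks (presence, face) ahead of the more expensive speech classification at oval 916.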
- Referring now to FIG. 10, it illustrates a laptop computer 1000 operating consistent with present principles.
- the laptop 1000 determines that a user is speaking but that a hardware and/or application level microphone mute has been detected.
- the user may be notified via presentation of a GUI and/or predetermined audible beep or sound at the laptop 1000 that the microphone mute has been detected.
- One such way to do so is via the visual notification box 1002 as presented on the laptop's display and/or via presentation of an audible sound notification 1004 via its speaker(s).
- the notification box 1002 may be presented or overlaid on top of the active call/conference user interface responsive to the microphone mute being detected.
- FIG. 11 illustrates another example of a laptop computer 1100 operating consistent with present principles.
- the laptop 1100 determines that a user is speaking but that a hardware and/or application level microphone mute has been detected.
- the user may be notified via a GUI presented toward the bottom of the laptop's display and/or via a predetermined audible beep or sound that the microphone mute has been detected.
- One such way to do so is by presenting the icons 1102 , 1104 on the laptop's display, which themselves may act as a microphone status notification.
- the icons 1102, 1104 may also establish respective selectors that are respectively selectable to unmute (icon 1102) or mute (icon 1104) the microphone. As also shown in FIG. 11, at time T2 the laptop 1100 may also present an audible sound notification 1106 via its speaker(s).
- buttons 1102 and 1104 are shown in FIG. 12 for further illustration. Also note that the visual notification box 1002 is shown in FIG. 13 for further illustration.
- a device operating consistent with present principles may automatically unmute a microphone as described herein responsive to determining that a user's mouth is currently moving, e.g., rather than presenting a notification (such as the GUI 400 ) that the microphone is currently muted without automatically taking the microphone off mute mode.
- the device may present a different audible or visual indication that indicates that the microphone has been automatically unmuted so that the user may be made aware. For example, the device may present a GUI with text indicating the following: “Note: Your microphone has been unmuted so that conference participants can hear you.”
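The choice between notifying the user and automatically unmuting can be sketched as below. The two message strings are taken from the examples given above for GUI 400 and for the automatic-unmute indication; the `auto_unmute` flag and the `unmute`/`show` callbacks are hypothetical.

```python
from typing import Callable

AUTO_UNMUTE_TEXT = ("Note: Your microphone has been unmuted so that conference "
                    "participants can hear you.")
MUTED_TEXT = ("Are you trying to speak to other conference participants? "
              "Your microphone is currently in mute mode.")


def respond_to_muted_speech(auto_unmute: bool, unmute: Callable[[], None],
                            show: Callable[[str], None]) -> bool:
    """Unmute and announce it, or just notify; returns True if unmuted."""
    if auto_unmute:
        unmute()
        show(AUTO_UNMUTE_TEXT)  # make the user aware the mic is now live
        return True
    show(MUTED_TEXT)            # leave the decision to the user (e.g., selector 404)
    return False
```

Announcing an automatic unmute matters because the user may have muted deliberately; the indication gives them a chance to mute again.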
- present principles may be applied in implementations other than video conferencing.
- present principles may be applied for voice-only calls, audio-video recording, voice recognition to command a digital assistant, audible input to transcribe a text message to be sent to another person, etc.
Description
- The present application relates to technically inventive, non-routine solutions that are necessarily rooted in computer technology and that produce concrete technical improvements.
- As recognized herein, sometimes a person might be participating in a video conference and begin speaking without recognizing that his or her microphone is currently muted, resulting in the inability of other video conference participants to hear that person despite seeing him or her. This in turn leads to data loss and missed information. There are currently no adequate solutions to the foregoing computer-related, technological problem.
- Accordingly, in one aspect a first device includes at least one processor and storage accessible to the at least one processor. The storage includes instructions executable by the at least one processor to receive input from a camera in communication with the at least one processor and to determine, based on the input from the camera, whether a user is currently speaking. The instructions are also executable to present a notification regarding whether to unmute at least one microphone accessible to the at least one processor responsive to a determination that the user is currently speaking.
- In some examples, the first device may include both the camera and the at least one microphone. Also in some examples, the instructions may be executable to execute a computer vision algorithm to determine whether the user is currently speaking.
- Still further, the first device may include a display accessible to the at least one processor, and in these examples the instructions may be executable to present the notification on the display. Furthermore, in some implementations the instructions may be executable to present the notification on the display as part of a graphical user interface (GUI) responsive to the determination that the user is currently speaking, where the GUI may include a selector that is selectable to unmute the at least one microphone.
- Additionally or alternatively, the first device may include at least one speaker accessible to the at least one processor, and the instructions may be executable to present the notification audibly using the at least one speaker.
- Still further, in some implementations the instructions may be executable to, prior to presentation of the notification, determine whether the at least one microphone is currently muted. Thus, in these implementations the instructions may be executable to present the notification responsive to both the determination that the user is currently speaking and a determination that the at least one microphone is currently muted. For example, the instructions may be executable to facilitate a video conference with a second device different from the first device using a first video conferencing application, and thus the instructions may be executable to determine whether the at least one microphone is currently muted via the first video conferencing application. If desired, the instructions may be further executable to, based on a determination that the at least one microphone is not currently muted via the first video conferencing application, determine whether the at least one microphone is currently muted via an operating system executing at the first device and/or hardware accessible to the first device. The hardware may include a switch or button that is manipulable to mute and unmute the at least one microphone.
- Also in some implementations, the instructions may be executable to receive first user input to unmute the at least one microphone subsequent to presentation of the notification and, responsive to receipt of the first user input, unmute the at least one microphone and transmit data to a second device. The data may indicate second user input to the at least one microphone that may include audible input.
- In another aspect, a method includes receiving input from a camera and determining, based on the input from the camera, whether a user is currently speaking. The method also includes, responsive to determining that the user is currently speaking, issuing a command to present a notification regarding whether to unmute at least one microphone accessible to a first device.
- In some implementations, the method may be performed by a server in communication with the first device, and the command may be issued by transmitting the command to the first device. Also in some implementations, the method may be performed by an end-user device that establishes the first device, and the command may be issued by controlling an electronic display accessible to the end-user device to present the notification.
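The two ways of issuing the command can be sketched as follows: a server serializes the command and transmits it to the first device, while an end-user device presents the notification directly. The JSON message shape (`command`, `text`) and the callbacks in this Python sketch are hypothetical; no particular protocol is specified here.

```python
import json
from typing import Callable, Optional


def issue_notification_command(
    running_on_server: bool,
    transmit: Optional[Callable[[str], None]] = None,
    present: Optional[Callable[[str], None]] = None,
    text: str = "Your microphone is muted!",
) -> None:
    """Issue the 'present a notification' command from a server or an end-user device."""
    if running_on_server:
        if transmit is None:
            raise ValueError("a server needs a transmit callback")
        # Hypothetical message format sent to the first device.
        transmit(json.dumps({"command": "present_notification", "text": text}))
    else:
        if present is None:
            raise ValueError("an end-user device needs a present callback")
        # The end-user device controls its own display (or speaker) directly.
        present(text)
```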
- The notification itself may be presented on a display accessible to the first device as part of a graphical user interface (GUI), where the GUI may include a selector that is selectable to unmute the at least one microphone.
- Still further, in some examples the method may include determining whether the at least one microphone is currently muted prior to issuing the command, and then issuing the command responsive to both determining that the user is currently speaking and determining that the at least one microphone is currently muted.
- In another aspect, at least one computer readable storage medium (CRSM) that is not a transitory signal includes instructions executable by at least one processor to receive input from a camera in communication with the at least one processor and to determine, based on the input from the camera, that a user is speaking. The instructions are also executable to, based on the determination that the user is speaking, present a graphical user interface (GUI) on a display accessible to the at least one processor. The GUI includes an indication that at least one microphone accessible to the at least one processor is in a mute mode. In some examples, the GUI may also include a selector that is selectable to take the at least one microphone out of the mute mode.
- The details of present principles, both as to their structure and operation, can best be understood in reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which:
- FIG. 1 is a block diagram of an example system consistent with present principles;
- FIG. 2 is a block diagram of an example network of devices consistent with present principles;
- FIGS. 3, 10, and 11 show example illustrations consistent with present principles;
- FIGS. 4 and 5 show example graphical user interfaces (GUIs) that may be presented based on determining that a user is speaking consistent with present principles;
- FIGS. 6 and 9 show flow charts of example algorithms consistent with present principles;
- FIG. 7 shows example artificial intelligence architecture that may be used consistent with present principles;
- FIG. 8 shows an example GUI for configuring one or more settings of a device to undertake present principles; and
- FIGS. 12 and 13 show example notifications that may be presented based on determining that a user is speaking consistent with present principles.
- Among other things, the present application discloses systems and methods to use computer vision and artificial intelligence (AI) during video conferencing to detect if a user in front of a device's camera appears to be speaking by detecting specific movements of the mouth. When the AI detects the user is speaking, the system and application mute settings may be accessed by the device to verify whether the microphone is set to “off mute”. If any of the settings are set to microphone mute being on, the user may then be notified that his or her microphone or device is set to audio input mute. The user may then determine if he or she wants to go off mute at that point, and/or the device may automatically set itself to go off mute.
- Prior to delving further into the details of the instant techniques, note with respect to any computer systems discussed herein that a system may include server and client components, connected over a network such that data may be exchanged between the client and server components. The client components may include one or more computing devices including televisions (e.g., smart TVs, Internet-enabled TVs), computers such as desktops, laptops and tablet computers, so-called convertible devices (e.g., having a tablet configuration and laptop configuration), and other mobile devices including smart phones. These client devices may employ, as non-limiting examples, operating systems from Apple Inc. of Cupertino, Calif., Google Inc. of Mountain View, Calif., or Microsoft Corp. of Redmond, Wash. A Unix® or similar such as Linux® operating system may be used. These operating systems can execute one or more browsers such as a browser made by Microsoft or Google or Mozilla or another browser program that can access web pages and applications hosted by Internet servers over a network such as the Internet, a local intranet, or a virtual private network.
- As used herein, instructions refer to computer-implemented steps for processing information in the system. Instructions can be implemented in software, firmware or hardware, or combinations thereof and include any type of programmed step undertaken by components of the system; hence, illustrative components, blocks, modules, circuits, and steps are sometimes set forth in terms of their functionality.
- A processor may be any general purpose single- or multi-chip processor that can execute logic by means of various lines such as address lines, data lines, and control lines and registers and shift registers. Moreover, any logical blocks, modules, and circuits described herein can be implemented or performed with a general purpose processor, a digital signal processor (DSP), a field programmable gate array (FPGA) or other programmable logic device such as an application specific integrated circuit (ASIC), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A processor can also be implemented by a controller or state machine or a combination of computing devices. Thus, the methods herein may be implemented as software instructions executed by a processor, suitably configured application specific integrated circuits (ASIC) or field programmable gate array (FPGA) modules, or any other convenient manner as would be appreciated by those skilled in the art. Where employed, the software instructions may also be embodied in a non-transitory device that is being vended and/or provided that is not a transitory, propagating signal and/or a signal per se (such as a hard disk drive, CD ROM or Flash drive). The software code instructions may also be downloaded over the Internet. Accordingly, it is to be understood that although a software application for undertaking present principles may be vended with a device such as the system 100 described below, such an application may also be downloaded from a server to a device over a network such as the Internet.
- Software modules and/or applications described by way of flow charts and/or user interfaces herein can include various sub-routines, procedures, etc. Without limiting the disclosure, logic stated to be executed by a particular module can be redistributed to other software modules and/or combined together in a single module and/or made available in a shareable library.
- Logic when implemented in software, can be written in an appropriate language such as but not limited to C# or C++, and can be stored on or transmitted through a computer-readable storage medium (that is not a transitory, propagating signal per se) such as a random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), compact disk read-only memory (CD-ROM) or other optical disk storage such as digital versatile disc (DVD), magnetic disk storage or other magnetic storage devices including removable thumb drives, etc.
- In an example, a processor can access information over its input lines from data storage, such as the computer readable storage medium, and/or the processor can access information wirelessly from an Internet server by activating a wireless transceiver to send and receive data. Data typically is converted from analog signals to digital by circuitry between the antenna and the registers of the processor when being received and from digital to analog when being transmitted. The processor then processes the data through its shift registers to output calculated data on output lines, for presentation of the calculated data on the device.
- Components included in one embodiment can be used in other embodiments in any appropriate combination. For example, any of the various components described herein and/or depicted in the Figures may be combined, interchanged or excluded from other embodiments.
- “A system having at least one of A, B, and C” (likewise “a system having at least one of A, B, or C” and “a system having at least one of A, B, C”) includes systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.
- The term “circuit” or “circuitry” may be used in the summary, description, and/or claims. As is well known in the art, the term “circuitry” includes all levels of available integration, e.g., from discrete logic circuits to the highest level of circuit integration such as VLSI, and includes programmable logic components programmed to perform the functions of an embodiment as well as general-purpose or special-purpose processors programmed with instructions to perform those functions.
- Now specifically in reference to FIG. 1, an example block diagram of an information handling system and/or computer system 100 is shown that is understood to have a housing for the components described below. Note that in some embodiments the system 100 may be a desktop computer system, such as one of the ThinkCentre® or ThinkPad® series of personal computers sold by Lenovo (US) Inc. of Morrisville, N.C., or a workstation computer, such as the ThinkStation®, which are sold by Lenovo (US) Inc. of Morrisville, N.C.; however, as apparent from the description herein, a client device, a server or other machine in accordance with present principles may include other features or only some of the features of the system 100. Also, the system 100 may be, e.g., a game console such as XBOX®, and/or the system 100 may include a mobile communication device such as a mobile telephone, notebook computer, and/or other portable computerized device.
- As shown in FIG. 1, the system 100 may include a so-called chipset 110. A chipset refers to a group of integrated circuits, or chips, that are designed to work together. Chipsets are usually marketed as a single product (e.g., consider chipsets marketed under the brands INTEL®, AMD®, etc.).
- In the example of FIG. 1, the chipset 110 has a particular architecture, which may vary to some extent depending on brand or manufacturer. The architecture of the chipset 110 includes a core and memory control group 120 and an I/O controller hub 150 that exchange information (e.g., data, signals, commands, etc.) via, for example, a direct management interface or direct media interface (DMI) 142 or a link controller 144. In the example of FIG. 1, the DMI 142 is a chip-to-chip interface (sometimes referred to as being a link between a “northbridge” and a “southbridge”).
- The core and memory control group 120 includes one or more processors 122 (e.g., single core or multi-core, etc.) and a memory controller hub 126 that exchange information via a front side bus (FSB) 124. As described herein, various components of the core and memory control group 120 may be integrated onto a single processor die, for example, to make a chip that supplants the “northbridge” style architecture.
- The memory controller hub 126 interfaces with memory 140. For example, the memory controller hub 126 may provide support for DDR SDRAM memory (e.g., DDR, DDR2, DDR3, etc.). In general, the memory 140 is a type of random-access memory (RAM). It is often referred to as “system memory.”
- The memory controller hub 126 can further include a low-voltage differential signaling interface (LVDS) 132. The LVDS 132 may be a so-called LVDS Display Interface (LDI) for support of a display device 192 (e.g., a CRT, a flat panel, a projector, a touch-enabled light emitting diode display or other video display, etc.). A block 138 includes some examples of technologies that may be supported via the LVDS interface 132 (e.g., serial digital video, HDMI/DVI, display port). The memory controller hub 126 also includes one or more PCI-express interfaces (PCI-E) 134, for example, for support of discrete graphics 136. Discrete graphics using a PCI-E interface has become an alternative approach to an accelerated graphics port (AGP). For example, the memory controller hub 126 may include a 16-lane (x16) PCI-E port for an external PCI-E-based graphics card (including, e.g., one or more GPUs). An example system may include AGP or PCI-E for support of graphics.
- In examples in which it is used, the I/O hub controller 150 can include a variety of interfaces. The example of FIG. 1 includes a SATA interface 151, one or more PCI-E interfaces 152 (optionally one or more legacy PCI interfaces), one or more USB interfaces 153, a LAN interface 154 (more generally a network interface for communication over at least one network such as the Internet, a WAN, a LAN, etc. under direction of the processor(s) 122), a general purpose I/O interface (GPIO) 155, a low-pin count (LPC) interface 170, a power management interface 161, a clock generator interface 162, an audio interface 163 (e.g., for speakers 194 to output audio), a total cost of operation (TCO) interface 164, a system management bus interface (e.g., a multi-master serial computer bus interface) 165, and a serial peripheral flash memory/controller interface (SPI Flash) 166, which, in the example of FIG. 1, includes BIOS 168 and boot code 190. With respect to network connections, the I/O hub controller 150 may include integrated gigabit Ethernet controller lines multiplexed with a PCI-E interface port. Other network features may operate independent of a PCI-E interface.
- The interfaces of the I/O hub controller 150 may provide for communication with various devices, networks, etc. For example, where used, the SATA interface 151 provides for reading, writing or reading and writing information on one or more drives 180 such as HDDs, SSDs or a combination thereof, but in any case the drives 180 are understood to be, e.g., tangible computer readable storage mediums that are not transitory, propagating signals. The I/O hub controller 150 may also include an advanced host controller interface (AHCI) to support one or more drives 180. The PCI-E interface 152 allows for wireless connections 182 to devices, networks, etc. The USB interface 153 provides for input devices 184 such as keyboards (KB), mice and various other devices (e.g., cameras, phones, storage, media players, etc.).
- In the example of FIG. 1, the LPC interface 170 provides for use of one or more ASICs 171, a trusted platform module (TPM) 172, a super I/O 173, a firmware hub 174, BIOS support 175 as well as various types of memory 176 such as ROM 177, Flash 178, and non-volatile RAM (NVRAM) 179. With respect to the TPM 172, this module may be in the form of a chip that can be used to authenticate software and hardware devices. For example, a TPM may be capable of performing platform authentication and may be used to verify that a system seeking access is the expected system.
- The system 100, upon power on, may be configured to execute boot code 190 for the BIOS 168, as stored within the SPI Flash 166, and thereafter process data under the control of one or more operating systems and application software (e.g., stored in system memory 140). An operating system may be stored in any of a variety of locations and accessed, for example, according to instructions of the BIOS 168.
- Additionally, the system 100 may include at least one microphone or a microphone array 193 that may provide input from the microphone/array 193 to the processor 122 based on audio that is detected, such as via a user providing audible input to the microphone/array 193 consistent with present principles. The system 100 may also include at least one camera 191 that may gather one or more images and provide the images to the processor 122. The camera 191 may be a thermal imaging camera, an infrared (IR) camera, a digital camera such as a webcam, a three-dimensional (3D) camera, and/or a camera otherwise integrated into the system 100 and controllable by the processor 122 to gather pictures/images and/or video.
- Still further, though not shown for simplicity, in some embodiments the system 100 may include a gyroscope that senses and/or measures the orientation of the system 100 and provides input related thereto to the processor 122, as well as an accelerometer that senses acceleration and/or movement of the system 100 and provides input related thereto to the processor 122. Also, the system 100 may include a GPS transceiver that is configured to communicate with at least one satellite to receive/identify geographic position information and provide the geographic position information to the processor 122. However, it is to be understood that another suitable position receiver other than a GPS receiver may be used in accordance with present principles to determine the location of the system 100.
- It is to be understood that an example client device or other machine/computer may include fewer or more features than shown on the system 100 of FIG. 1. In any case, it is to be understood at least based on the foregoing that the system 100 is configured to undertake present principles.
- Turning now to FIG. 2, example devices are shown communicating over a network 200 such as the Internet in accordance with present principles, e.g., for video conferencing as described herein. It is to be understood that each of the devices described in reference to FIG. 2 may include at least some of the features, components, and/or elements of the system 100 described above. Indeed, any of the devices disclosed herein may include at least some of the features, components, and/or elements of the system 100 described above.
- FIG. 2 shows a notebook computer and/or convertible computer 202, a desktop computer 204, a wearable device 206 such as a smart watch, a smart television (TV) 208, a smart phone 210, a tablet computer 212, and a server 214 such as an Internet server that may provide cloud storage accessible to the devices 202-212. It is to be understood that the devices 202-214 are configured to communicate with each other over the network 200 to undertake present principles.
- Now describing FIG. 3, it shows an example illustration 300 consistent with present principles. Specifically, the illustration 300 depicts a user 302 participating in a video conference that is facilitated through an end-user device 304 such as a laptop computer, desktop computer, tablet computer, smart phone, etc. The device 304 may facilitate the video conference by executing a video conferencing application locally at the device 304, with the devices of other remote participants communicating with the device 304. The video conferencing application may be, for example, Skype, Apple's FaceTime, a Google Gchat video conference, etc.
- The device 304 may have a built-in microphone 310 for receiving audible input from the user 302 to then transmit that input to the respective devices of the remote participants. The device 304 may also communicate with additional hardware such as a wireless, stand-alone microphone 312 that the user might be using to provide audible input that the device 304 may then transmit to the other devices. In some examples, the microphone 312 may include a hardware switch or depressible button 314 that may be manipulable between on and off positions to respectively mute and unmute the microphone 312. When muted, the microphone 312 either does not transmit the audible input it detects to the device 304 while still remaining powered on, or does not receive the audible input altogether (e.g., is turned off). Then, when unmuted, the microphone 312 may receive and transmit audible input to the device 304 via Bluetooth or another communication protocol, and the device 304 may then relay the audible input to the respective devices of the other participants.
- Additionally, a mute selector 313 presented on a touch-enabled display 315 of the device 304 may also be selected and deselected with touch input to respectively mute and unmute the microphone 310 via the video conferencing application itself. For example, the software mute through the video conferencing application may involve the device 304 receiving audible input via one of the microphones 310, 312 but not actually transmitting any voice data corresponding to the audible input to the respective devices of the other participants.
- Also note that a camera 316 is shown as being disposed on the device 304. The camera 316 may gather images of the scene within its field of view, which in this case includes the face of the user 302. The camera 316 may then relay those images to the respective devices of the other participants for the video conference between the user 302 and the other participants.
- As depicted in FIG. 3, when the user 302 begins speaking as illustrated by speech bubble 318, the device 304 and/or a server in communication with the device may use input from the camera 316 to determine that the user 302 is currently speaking. Based on that determination, a command may be issued by the server and/or the local processor on the device 304 (e.g., a central processing unit (CPU)) to present one or more notifications indicating that whichever microphone(s) is being used for the video conference (the microphones 310 and/or 312) is currently muted/in mute mode.
- For example, an audible notification 320 may be presented via a speaker on the device that says, “Your microphone is muted!” As another example, a visual notification 322 may be presented on the display 315 as shown so that the notification 322 is presented over top of other visual portions of the video conference, though in some examples the notification 322 may be presented to take up the full display space rather than a portion thereof.
- The visual notification 322 is shown in more detail in FIG. 4. As shown, the notification 322 may be presented as part of a graphical user interface (GUI) 400 presented on the touch-enabled display of the device 304. As also shown, the GUI 400 may include text 402 indicating that the device has determined that the user is currently speaking but that the microphone being used for video conferencing is currently in a mute mode/muted. For example, the text 402 may indicate the following: “Are you trying to speak to other conference participants? Your microphone is currently in mute mode.”
- As also shown, the GUI 400 may include a selector 404 that may be selectable to command the device 304 to take the microphone off of mute mode and/or otherwise unmute the microphone at the application level, operating system level, etc. In some examples, the GUI 400 may even include a selector 406 that may be selectable to provide input indicating that the user 302 is not trying to speak to conference participants. That input may then be used to train an artificial neural network using machine learning to make improved determinations of the user speaking to conference participants in the future.
- Note that responsive to the selector 404 being selected, in some examples the device 304 may begin streaming or otherwise transmitting, to the devices of the other conference participants, the user's audible input from that point forward as the user provides it to the microphone after selecting the selector 404.
- However, in other examples, even though the mute mode/muting may have been turned on prior to selection of the selector 404 so that audible input to the microphone was not transmitted/routed to other conference participants when spoken by the user (even with the microphone powered on), the device 304 may have still been caching or storing the audible input during that time. The audible input may have been locally cached in RAM of the device 304, and/or it may even have been streamed to and cached at a remotely-located server that is facilitating communication among the participants' devices for the video conference. Note that in some implementations, only a threshold amount of most-recent input (e.g., the last thirty seconds) may be cached in RAM and/or at the server.
- Then, when the user selects the selector 404, the device 304 or server may transmit the cached audible input to the other conference participants' devices. In so doing, the device 304 and/or server may help ensure that, although the mute mode was enabled while the cached audible input was spoken, that input may still be provided to the other participants later rather than simply being lost. Otherwise, the user 302 would have to re-speak what was already spoken or simply move on to other speech, to the detriment of the other conference participants.
- Thus, in situations where cached audible input is to be provided to the other conference participants' devices responsive to selection of the selector 404, the GUI 500 of FIG. 5 may be presented on the touch-enabled display of the device 304. However, also note that the GUI 500 might instead be presented on the touch-enabled display responsive to automatic microphone unmuting rather than selection of the selector 404, as might occur in other example implementations based on the device 304 detecting the user as currently speaking.
- In any case, the GUI 500 may include a non-text icon 502 and text 504 indicating that the microphone has been unmuted (or otherwise taken out of mute mode). The GUI 500 may also include text 506 instructing the user to wait before speaking any additional input to the microphone so that previously cached audible input from the user can be transmitted to the other conference participants and heard by them via their own respective devices before the user provides additional audible input. In some examples, the GUI 500 may even include a selector 508 to again mute the microphone or otherwise place it back in mute mode, e.g., after the user is done speaking what he or she had to say.
- Now referring to FIG. 6, it shows example logic that may be executed by a device such as the device 304 and/or the system 100 consistent with present principles. However, also note that in some examples some or all of the logic steps of FIG. 6 may be performed by a remotely-located server in communication with the device, such as the same server that might be used to relay audio/video communications between participants of a video conference consistent with present principles.
- Beginning at block 600, the device may facilitate a video conference with other devices, e.g., using a video conferencing application. For example, the device may launch the video conferencing application and/or initiate the video conference itself so that respective audible input and camera video from respective participant devices may be transmitted to the other participants in real time. From block 600 the logic may proceed to block 602.
- At block 602 the device may receive input from a camera in communication with the device, such as its built-in webcam. The logic may then proceed to decision diamond 604 where the device may determine whether a user (such as the user 302) is currently speaking as indicated in the input from the camera. To make the determination at diamond 604, the device may execute a computer vision algorithm, such as a lip reading or lip movement algorithm, a gesture recognition algorithm, a facial recognition algorithm, etc. Additionally, note that in some examples the computer vision algorithm may make use of one or more artificial neural networks of an artificial intelligence model that may be used to determine whether the user is currently speaking based on the input from the camera. Example architecture for such a model will be described below in reference to FIG. 7.
- If the device makes a negative determination at diamond 604, the logic may revert back to block 600 and proceed therefrom. However, if the device makes an affirmative determination at diamond 604, the logic may instead proceed to block 606 (or in some examples, directly to decision diamond 608). At block 606 the device may begin buffering or caching spoken input to the device's microphone as described above. From there the logic may proceed to decision diamond 608.
- At diamond 608 the device may determine whether the microphone is currently muted via a mute mode controlled by the video conferencing application itself, so that audio detected by the microphone is not provided to conference participants even if it is buffered/cached locally at the device (e.g., software mute rather than turning the microphone off). An affirmative determination at diamond 608 may cause the logic to proceed to block 612, which will be described shortly. However, first note that a negative determination at diamond 608 may instead cause the logic to proceed to decision diamond 610.
- At diamond 610 the device may determine whether the microphone is currently muted via an operating system executing at the device itself (e.g., the device 304) and/or currently muted via hardware accessible to the device (e.g., muted via the button or switch 314, or the microphone being turned off/powered down altogether). The operating system may be, for example, the device's basic input/output system (BIOS) or a guest operating system such as Microsoft's Windows, Apple's Mac OS, Linux, etc. Thus, for example, the determination at diamond 610 may include whether the microphone has been muted or a mute mode entered via a “global” microphone mute command from the user to the operating system itself (rather than to the video conferencing application specifically), so that the microphone is muted for all functions that might be executed by the operating system using the microphone independent of the video conference itself.
- A negative determination at diamond 610 may cause the logic to revert back to block 600. There, the device may continue facilitating the video conference and transmitting data indicating the user's audible input to the microphone to other conference participants consistent with present principles, since the microphone has been determined not to be muted at any of the levels discussed above (e.g., application level, operating system level, or via hardware).
- However, note that an affirmative determination at diamond 610 may instead cause the logic to proceed to block 612. At block 612 the device may present a notification at the user's device indicating that the microphone is currently muted. For example, at block 612 the device may present an audible notification such as the example notification 320 described above and/or a visual notification such as the example notification 322/GUI 400 described above. Note that if block 612 is executed by a server rather than the end-user's device, the server may transmit a command to the end-user device to present the notification at the end-user device, whereas if the end-user device itself were executing block 612 it may simply control its display and/or speaker(s) to present the visual and/or audible notification, respectively.
- From block 612 the logic may then proceed to block 614. At block 614 the device may, subsequent to presentation of the notification(s) at block 612, receive user input to unmute the microphone via the video conferencing application, the operating system, and/or the hardware. For example, the user input to unmute the microphone may be received based on selection of the selector 404 or based on manipulation of the button or switch 314 to place the microphone in an unmuted mode.
- From block 614 the logic may then proceed to block 616 where, responsive to receipt of the user input at block 614, the device may unmute the microphone. Also at block 616, the device may transmit, to the devices of the other conference participants, buffered or cached microphone data indicating audible input that was provided prior to the unmuting consistent with the description above. Additionally or alternatively, also at block 616, the device may transmit additional microphone data to the devices of the respective conference participants that indicates additional audible input provided by the user after the unmuting. After block 616 the device may receive user input to mute the microphone again, and/or, if desired, the logic may revert back to block 600 and proceed therefrom.
- Now describing FIG. 7, it shows a block diagram of example architecture for an artificial intelligence (AI) model 700 that may be used consistent with present principles to determine, based on camera input, whether a user is currently speaking. For example, the AI model 700 may be used as part of the computer vision executed to make the determination of diamond 604 described above. Thus, the AI model 700 may be maintained in the end-user's device and/or a server in communication therewith.
- As shown in FIG. 7, input video or images 702 from a camera may be input into an input layer of a lip localization neural network, which may be established by a convolutional neural network (CNN) having the input layer, an output layer, and multiple hidden layers between the input and output layers. The lip localization neural network may thus take the input video 702 as input and identify the location of the lips of the user's mouth as output from its output layer.
- The output from the output layer of the lip localization neural network may then be provided as input to an input layer of a feature extraction neural network, which may also be established by a CNN with its own input layer, output layer, and multiple hidden layers between its input and output layers. The feature extraction neural network may thus take, as input, the output from the output layer of the lip localization neural network and identify features of the user's lips at various times as output from the output layer of the feature extraction neural network.
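The hand-off from lip localization to feature extraction can be illustrated with a minimal sketch that replaces both learned stages with simple geometry. The sketch assumes, hypothetically, that the lip localization stage yields four lip landmarks per frame (`LipLandmarks`). In place of the feature extraction CNN it computes a hand-crafted mouth-opening ratio per frame. As a crude stand-in for the LSTM classifier 708 described next, it flags lip motion when that ratio fluctuates across a window of frames. All names, the synthetic `frame` helper, and the 0.05 threshold are illustrative assumptions and are not taken from this disclosure.

```python
from dataclasses import dataclass
from math import dist
from statistics import pstdev
from typing import Sequence

Point = tuple[float, float]  # (x, y) pixel coordinates


@dataclass(frozen=True)
class LipLandmarks:
    """Hypothetical per-frame output of the lip localization stage."""
    upper: Point  # midpoint of the upper lip
    lower: Point  # midpoint of the lower lip
    left: Point   # left mouth corner
    right: Point  # right mouth corner


def mouth_opening_ratio(lm: LipLandmarks) -> float:
    """Hand-crafted lip feature: vertical lip gap divided by mouth width."""
    width = dist(lm.left, lm.right)
    if width == 0:
        return 0.0  # degenerate landmarks; treat the mouth as closed
    return dist(lm.upper, lm.lower) / width


def lips_moving(frames: Sequence[LipLandmarks], threshold: float = 0.05) -> bool:
    """Crude temporal stand-in for the classifier: does the feature fluctuate?"""
    if len(frames) < 2:
        return False  # a single frame carries no motion information
    ratios = [mouth_opening_ratio(f) for f in frames]
    return pstdev(ratios) > threshold


def frame(gap: float) -> LipLandmarks:
    """Synthetic frame with a 40 px wide mouth opened by `gap` px."""
    return LipLandmarks(upper=(50, 60), lower=(50, 60 + gap), left=(30, 62), right=(70, 62))


still = [frame(4) for _ in range(10)]
talking = [frame(g) for g in (2, 12, 4, 14, 3, 11, 2, 13, 5, 10)]
print(lips_moving(still), lips_moving(talking))  # False True
```

The variance threshold can only detect that the lips move. Distinguishing speech from, say, licking the lips is the kind of judgment the disclosure leaves to a trained model.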
- The output from the output layer of the feature extraction neural network may then be provided as input to an input layer of a
classifier 708 that may be established at least in part by one or more long short-term memory (LSTM) recurrent neural networks (RNNs) that may have their own respective input layers, an output layers, and multiple hidden layers therebetween. The classifier may then use the input to its input layer to determine whether the user's lips are currently moving (e.g., in motion(s) that appear like speech) and then output the classification as data output 710 (e.g., moving or not moving, or speaking or not speaking specifically). Thedata output 710 may then be used by the device undertaking the logic ofFIG. 6 to determine whether the is currently speaking atdiamond 604 based on whether the user's lips are moving (currently speaking) or not moving (not currently speaking), or to determine whether the is currently speaking atdiamond 604 based on theoutput 710 itself if the classification that is output is specifically speaking or not speaking. - Now describing
FIG. 8 , it shows an example graphical user interface (GUI) 800 that may be presented on the display of a end-user's device to configure one or more settings of the device to operate consistent with present principles. For example, theGUI 800 may be presented on thedisplay 315 of thedevice 304 to configure one or more settings of the device related to microphone unmuting as described herein. Each of the options that will be described below may be selected by selecting the check box shown adjacent to the respective option through touch input, cursor input, etc. - As shown, the
GUI 800 may include afirst option 802 that may be selectable to enable the device to undertake present principles. For example, theoption 802 may be selected to enable a setting for the device notify a user when the user is identified as currently speaking but with the microphone currently muted. - Additionally or alternatively, the
option 802 may be selected to configure the device to undertake the other functions described above in reference toFIGS. 3-5 , to execute the logic ofFIG. 6 , and/or to use the AI model 700 as described herein. For example, theoption 802 may be selected to enable the device to perform automatic microphone unmuting responsive to determining that the user is currently speaking. - The
GUI 800 may also include an option 804 that may be selectable to enable the device to buffer or cache audible input at the device that might be received while the microphone is muted or in its mute mode as described herein. Still further, theGUI 800 may include a setting 806 with various associated options 808, 810 that may be respectively selectable to present notifications audibly at the device (option 808) and/or visually at the device (option 810) as described herein. - Moving on to
FIG. 9 , it shows example logic that may be executed by a device consistent with present principles either independently or in conjunction with the overall logic ofFIG. 6 described above. As shown, input from acamera 900 may be used at oval 902 to determine whether a user's presence has been detected. Responsive to a negative determination atoval 902, the logic may proceed to oval 904 where the device may determine whether any application is using voice input from a microphone. Responsive to a negative determination atoval 904, the logic may proceed to block 906 where the logic may end. - However, responsive to an affirmative determination at either of oval 902 or oval 904, the logic may instead proceed to
oval 908. At oval 908 a CNN for face landmark detection may be used to determine whether a user's face has been detected. Responsive to a negative determination atoval 908, the logic may revert back to block 906 as described above. However, responsive to an affirmative determination atoval 908, the logic may instead proceed tooval 910. - At
oval 910 various CNNs for mouth detection may be employed to then determine at oval 912 whether mouth movement has been detected. Responsive to a negative determination atoval 912, the logic may proceed to block 914 where the logic may end. However, responsive to an affirmative determination atoval 912, the logic may instead proceed to oval 916 where the logic may employ an artificial intelligence model to determine if mouth movement is indicative of the user actually speaking (e.g., as opposed to merely licking his or her lips, simply opening his or her mouth, etc.). - A negative determination at
oval 916 may cause the logic to proceed to block 914 as described above. However, an affirmative determination at oval 916 may instead cause the logic to proceed to oval 918, where software and/or a driver may be used to check whether microphone input has been muted at the hardware or application level. A negative determination at oval 918 may cause the logic to proceed to block 920, where the logic may end. However, an affirmative determination at oval 918 may instead cause the logic to proceed to oval 922.
- At oval 922, the user may be notified of the detected hardware and/or application level mute via a device action such as presentation of a GUI (e.g., the GUI 400 of FIG. 4) and/or presentation of an audible sound or beep (or even an automated voice, as illustrated by the speech bubble 320 of FIG. 3). From oval 922, the logic may then proceed to oval 924, where a GUI option to unmute the microphone may be presented, such as the selector 404 of FIG. 4 described above.
- Now describing FIG. 10, it illustrates a laptop computer 1000 operating consistent with present principles. As shown, at a first time T1 the laptop 1000 determines that a user is speaking but that a hardware and/or application level microphone mute has been detected. Thus, at a later time T2, the user may be notified at the laptop 1000 that the microphone mute has been detected via presentation of a GUI and/or a predetermined audible beep or sound. One such way to do so is via the visual notification box 1002 presented on the laptop's display and/or via an audible sound notification 1004 presented through its speaker(s). Also note that the notification box 1002 may be presented or overlaid on top of the active call/conference user interface responsive to the microphone mute being detected.
- FIG. 11 illustrates another example of a laptop computer 1100 operating consistent with present principles. As shown, at a first time T1 the laptop 1100 determines that a user is speaking but that a hardware and/or application level microphone mute has been detected. Thus, at a later time T2, the user may be notified that the microphone mute has been detected via a GUI presented toward the bottom of the laptop's display and/or via a predetermined audible beep or sound. One such way to do so is by presenting the icons icons FIG. 11, at time T2 the laptop 1100 may also present an audible sound notification 1106 via its speaker(s).
- Note that the icons FIG. 12 for further illustration. Also note that the visual notification box 1002 is shown in FIG. 13 for further illustration.
- Before concluding, note that in some examples a device operating consistent with present principles may automatically unmute a microphone as described herein responsive to determining that a user's mouth is currently moving. This would replace presenting a notification (such as the GUI 400) that the microphone is muted while leaving the microphone in mute mode. In these examples, based on the automatic unmuting, the device may present a different audible or visual indication that the microphone has been automatically unmuted so that the user is made aware. For example, the device may present a GUI with the following text: “Note: Your microphone has been unmuted so that conference participants can hear you.”
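The branch from oval 916 through oval 924, together with the automatic-unmute variant, can be sketched roughly as follows. This is a minimal, hypothetical sketch. It assumes that the oval 916 determination (described earlier in the document) reduces to a boolean `user_speaking` derived from camera input. The names `MicState`, `Outcome` and `run_mute_check`, and the notice strings other than the quoted unmute message, are illustrative inventions rather than anything the patent specifies.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class Outcome(Enum):
    # Hypothetical labels for where the flow ends up.
    BLOCK_914 = auto()     # negative at oval 916; block 914 is described earlier in the document
    END_920 = auto()       # negative at oval 918: no mute detected, logic ends
    OFFER_UNMUTE = auto()  # ovals 922/924: user notified, unmute option presented
    AUTO_UNMUTED = auto()  # variant: microphone unmuted automatically


@dataclass
class MicState:
    hardware_muted: bool = False
    app_muted: bool = False

    def is_muted(self) -> bool:
        # Oval 918: a mute may be applied at the hardware or the application level.
        return self.hardware_muted or self.app_muted


def run_mute_check(user_speaking: bool, mic: MicState, notices: List[str],
                   auto_unmute: bool = False) -> Outcome:
    """Hypothetical sketch of the oval 916 -> 924 branch.

    `user_speaking` is an assumed stand-in for the camera-based
    determination at oval 916 (e.g., the user's mouth is moving).
    """
    if not user_speaking:
        return Outcome.BLOCK_914
    if not mic.is_muted():
        return Outcome.END_920
    if auto_unmute:
        # Assumption: both mute levels can be cleared from software.
        mic.hardware_muted = False
        mic.app_muted = False
        notices.append("Note: Your microphone has been unmuted so that "
                       "conference participants can hear you.")
        return Outcome.AUTO_UNMUTED
    # Oval 922: notify the user (GUI, beep, or voice); oval 924: offer to unmute.
    notices.append("Your microphone is muted.")  # hypothetical notice text
    notices.append("[Unmute]")                   # hypothetical unmute selector label
    return Outcome.OFFER_UNMUTE


if __name__ == "__main__":
    notices: List[str] = []
    mic = MicState(app_muted=True)
    print(run_mute_check(True, mic, notices))  # Outcome.OFFER_UNMUTE
    print(notices)
```

The function returns an explicit outcome rather than driving a GUI directly. This keeps the branch logic testable apart from whatever notification mechanism (GUI 400, beep, or automated voice) a given device uses.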
- Also before concluding, it is to be understood that present principles may be applied in implementations other than video conferencing. For example, present principles may be applied for voice-only calls, audio-video recording, voice recognition to command a digital assistant, audible input to transcribe a text message to be sent to another person, etc.
- It may now be appreciated that present principles provide for an improved computer-based user interface that improves the functionality and ease of use of the devices disclosed herein. The disclosed concepts are rooted in computer technology for computers to carry out their functions.
- It is to be understood that whilst present principles have been described with reference to some example embodiments, these are not intended to be limiting, and that various alternative arrangements may be used to implement the subject matter claimed herein. Components included in one embodiment can be used in other embodiments in any appropriate combination. For example, any of the various components described herein and/or depicted in the Figures may be combined, interchanged or excluded from other embodiments.
Claims (24)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/727,836 US11049511B1 (en) | 2019-12-26 | 2019-12-26 | Systems and methods to determine whether to unmute microphone based on camera input |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/727,836 US11049511B1 (en) | 2019-12-26 | 2019-12-26 | Systems and methods to determine whether to unmute microphone based on camera input |
Publications (2)
Publication Number | Publication Date |
---|---|
US11049511B1 (en) | 2021-06-29 |
US20210201935A1 (en) | 2021-07-01 |
Family ID: 76546494
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/727,836 Active US11049511B1 (en) | 2019-12-26 | 2019-12-26 | Systems and methods to determine whether to unmute microphone based on camera input |
Country Status (1)
Country | Link |
---|---|
US (1) | US11049511B1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11405584B1 (en) * | 2021-03-25 | 2022-08-02 | Plantronics, Inc. | Smart audio muting in a videoconferencing system |
US11416831B2 (en) * | 2020-05-21 | 2022-08-16 | HUDDL Inc. | Dynamic video layout in video conference meeting |
US11509493B1 (en) | 2021-06-14 | 2022-11-22 | Motorola Mobility Llc | Electronic device that enables host toggling of presenters from among multiple remote participants in a communication session |
US20220398064A1 (en) * | 2021-06-14 | 2022-12-15 | Motorola Mobility Llc | Electronic device with imaging based mute control |
US11743065B2 (en) | 2021-06-14 | 2023-08-29 | Motorola Mobility Llc | Electronic device that visually monitors hand and mouth movements captured by a muted device of a remote participant in a video communication session |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2022107950A (en) * | 2021-01-12 | 2022-07-25 | 富士フイルムビジネスイノベーション株式会社 (FUJIFILM Business Innovation Corp.) | Information processing device and program |
CN113873195B (en) * | 2021-08-18 | 2023-04-18 | 荣耀终端有限公司 (Honor Device Co., Ltd.) | Video conference control method, device and storage medium |
Family Cites Families (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8639214B1 (en) * | 2007-10-26 | 2014-01-28 | Iwao Fujisaki | Communication device |
US8649494B2 (en) * | 2008-08-05 | 2014-02-11 | International Business Machines Corporation | Participant alerts during multi-person teleconferences |
US7848511B2 (en) * | 2008-09-30 | 2010-12-07 | Avaya Inc. | Telecommunications-terminal mute detection |
US20140229866A1 (en) * | 2008-11-24 | 2014-08-14 | Shindig, Inc. | Systems and methods for grouping participants of multi-user events |
US8488745B2 (en) * | 2009-06-17 | 2013-07-16 | Microsoft Corporation | Endpoint echo detection |
US8620653B2 (en) * | 2009-06-18 | 2013-12-31 | Microsoft Corporation | Mute control in audio endpoints |
CN102404131B (en) * | 2010-07-29 | 2015-03-25 | 株式会社理光 (Ricoh Company, Ltd.) | Communication terminal, communication management system, communication system, and sound input part registering method |
US8878678B2 (en) * | 2012-05-29 | 2014-11-04 | Cisco Technology, Inc. | Method and apparatus for providing an intelligent mute status reminder for an active speaker in a conference |
US9595271B2 (en) * | 2013-06-27 | 2017-03-14 | Getgo, Inc. | Computer system employing speech recognition for detection of non-speech audio |
US9071692B2 (en) * | 2013-09-25 | 2015-06-30 | Dell Products L.P. | Systems and methods for managing teleconference participant mute state |
US9215543B2 (en) * | 2013-12-03 | 2015-12-15 | Cisco Technology, Inc. | Microphone mute/unmute notification |
US9940944B2 (en) * | 2014-08-19 | 2018-04-10 | Qualcomm Incorporated | Smart mute for a communication device |
US20170006395A1 (en) * | 2015-06-30 | 2017-01-05 | Motorola Solutions, Inc. | Bendable microphone |
US20170171286A1 (en) * | 2015-12-15 | 2017-06-15 | Videodesk Sa | Methods and devices for validating a video connection or other types of communication sessions over a computer network |
US11282537B2 (en) * | 2017-06-09 | 2022-03-22 | International Business Machines Corporation | Active speaker detection in electronic meetings for providing video from one device to plurality of other devices |
US9967520B1 (en) * | 2017-06-30 | 2018-05-08 | Ringcentral, Inc. | Method and system for enhanced conference management |
US20190014410A1 (en) * | 2017-07-07 | 2019-01-10 | Qualcomm Incorporated | Notifying of a mismatch between an audio jack and an audio socket |
US10701470B2 (en) * | 2017-09-07 | 2020-06-30 | Light Speed Aviation, Inc. | Circumaural headset or headphones with adjustable biometric sensor |
CN110663021B (en) * | 2017-11-06 | 2024-02-02 | 谷歌有限责任公司 (Google LLC) | Method and system for paying attention to presence subscribers |
US10776073B2 (en) * | 2018-10-08 | 2020-09-15 | Nuance Communications, Inc. | System and method for managing a mute button setting for a conference call |
US10652679B1 (en) * | 2019-02-28 | 2020-05-12 | International Business Machines Corporation | Undesirable noise detection and management |
2019-12-26: US16/727,836 (US11049511B1), Active
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11416831B2 (en) * | 2020-05-21 | 2022-08-16 | HUDDL Inc. | Dynamic video layout in video conference meeting |
US11488116B2 (en) | 2020-05-21 | 2022-11-01 | HUDDL Inc. | Dynamically generated news feed |
US11537998B2 (en) | 2020-05-21 | 2022-12-27 | HUDDL Inc. | Capturing meeting snippets |
US11405584B1 (en) * | 2021-03-25 | 2022-08-02 | Plantronics, Inc. | Smart audio muting in a videoconferencing system |
US11509493B1 (en) | 2021-06-14 | 2022-11-22 | Motorola Mobility Llc | Electronic device that enables host toggling of presenters from among multiple remote participants in a communication session |
US20220398064A1 (en) * | 2021-06-14 | 2022-12-15 | Motorola Mobility Llc | Electronic device with imaging based mute control |
US11604623B2 (en) * | 2021-06-14 | 2023-03-14 | Motorola Mobility Llc | Electronic device with imaging based mute control |
US11743065B2 (en) | 2021-06-14 | 2023-08-29 | Motorola Mobility Llc | Electronic device that visually monitors hand and mouth movements captured by a muted device of a remote participant in a video communication session |
Also Published As
Publication number | Publication date |
---|---|
US11049511B1 (en) | 2021-06-29 |
Similar Documents
Publication | Title |
---|---|
US11049511B1 (en) | Systems and methods to determine whether to unmute microphone based on camera input |
US11196869B2 (en) | Facilitation of two or more video conferences concurrently |
US10103699B2 (en) | Automatically adjusting a volume of a speaker of a device based on an amplitude of voice input to the device |
US11184560B1 (en) | Use of sensor input to determine video feed to provide as part of video conference |
US10922862B2 (en) | Presentation of content on headset display based on one or more condition(s) |
US20160373884A1 (en) | Determination of device at which to present audio of telephonic communication |
US10897599B1 (en) | Facilitation of video conference based on bytes sent and received |
US20190251961A1 (en) | Transcription of audio communication to identify command to device |
US10073671B2 (en) | Detecting noise or object interruption in audio video viewing and altering presentation based thereon |
US20180324703A1 (en) | Systems and methods to place digital assistant in sleep mode for period of time |
US20200311619A1 (en) | Systems and methods to suggest room swap for meeting |
US9807499B2 (en) | Systems and methods to identify device with which to participate in communication of audio data |
US10163455B2 (en) | Detecting pause in audible input to device |
US11937014B2 (en) | Permitting devices to change settings related to outbound audio/video streamed from another device as part of video conference |
US11694574B2 (en) | Alteration of accessibility settings of device based on characteristics of users |
US11057549B2 (en) | Techniques for presenting video stream next to camera |
US20230005224A1 (en) | Presenting real world view during virtual reality presentation |
US10645517B1 (en) | Techniques to optimize microphone and speaker array based on presence and location |
US20230298578A1 (en) | Dynamic threshold for waking up digital assistant |
US11138862B2 (en) | Systems and methods to electronically indicate whether conference room is in use based on sensor input |
US10770036B2 (en) | Presentation of content on left and right eye portions of headset |
US20230319121A1 (en) | Presentation of part of transcript based on detection of device not presenting corresponding audio |
US20230300250A1 (en) | Selectively providing audio to some but not all virtual conference participants represented in a same virtual space |
US11076112B2 (en) | Systems and methods to present closed captioning using augmented reality |
US11935538B2 (en) | Headset boom with infrared lamp(s) and/or sensor(s) |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: LENOVO (SINGAPORE) PTE. LTD., SINGAPORE. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: SEETHALER, KENNETH; CAVENAUGH, ADAM JEROME; FUJII, KAZUO; AND OTHERS; SIGNING DATES FROM 20191220 TO 20191224; REEL/FRAME: 051373/0189 |
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
AS | Assignment | Owner name: LENOVO PC INTERNATIONAL LIMITED, HONG KONG. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: LENOVO (SINGAPORE) PTE LTD; REEL/FRAME: 060638/0160. Effective date: 20210701 |