WO2023113994A1 - Human presence sensor for client devices - Google Patents

Human presence sensor for client devices

Info

Publication number
WO2023113994A1
Authority
WO
WIPO (PCT)
Prior art keywords
imagery
computing device
module
human presence
image sensor
Prior art date
Application number
PCT/US2022/051138
Other languages
English (en)
Inventor
Megha MALPANI
Jon Napper
Alan Green
Aneesha GOVIL
Stuart LANGLEY
Ken HOETMER
Christopher IGO
Fei Wu
Jakub MLYNARCZYK
Evan BENN
Edward O'CALLAGHAN
Andrew Mcrae
David LATTIMORE
Dan CALLAGHAN
Eddy Chen
Boris LEE
Tim CALLAHAN
Guoxing Zhao
Rachael MORGAN
Michael MARTIS
Sitar HAREL
Ryosuke Matsumoto
Original Assignee
Google Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/985,275 external-priority patent/US20230196836A1/en
Application filed by Google Llc filed Critical Google Llc
Priority to EP22836369.3A priority Critical patent/EP4405916A1/fr
Priority to CN202280075951.8A priority patent/CN118251707A/zh
Publication of WO2023113994A1 publication Critical patent/WO2023113994A1/fr


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • client computing devices such as laptops, tablets and netbooks are used in a variety of settings, including at home, at school, in the office, at coffee shops, airports, etc.
  • a user may want privacy from prying eyes, thus it can be useful to have the screen automatically dim or hide certain private information when others are present.
  • This requires knowing whether a person is in front of their device, whether the person has left their device, and whether another person in the vicinity is looking at the user's screen.
  • this can involve continuously using the client device’s camera and processing the imagery by the computer’s processing system.
  • this can consume significant operating system, memory and other processing resources, which is undesirable especially when the device is not coupled to an external power source.
  • it may also be undesirable to have the device’s camera continually capturing imagery, as this may raise privacy concerns.
  • a dedicated, low power, low resolution camera e.g., a monochrome sensor
  • a self-contained processing module that processes the imagery using one or more targeted machine learning (ML) models.
  • one or more signals may be sent to the operating system or other component of the client device so that various functions can be performed.
  • the human presence sensor discussed herein has wide applicability in a variety of different situations to enhance the user experience. For instance, in some situations it can be used to speed up the login process, to avoid dimming the screen when the person is reading a long document, to hide certain information when someone else nearby is also looking at the screen, or to lock the device when the user leaves. Knowing that the image is not stored and not accessible to the main processor can provide security and peace of mind to the user. Additionally, knowing how presence information is used (or not used) can provide transparency and a sense of security as well.
  • a computing device includes: a processing module including one or more processors; memory configured to store data and instructions associated with an operating system of the computing device; an optional user interface module configured to receive input from a user of the computing device; an optional display module having a display interface, the display module being configured to present information to the user; and a human presence sensor module
  • the human presence sensor module includes: an image sensor configured to capture imagery within a field of view of the image sensor; local (dedicated) memory configured to store one or more machine learning models, the one or more machine learning models each being trained to identify whether one or more persons are present in the imagery; and local processing such as a dedicated processing module including at least one processing device configured to process the imagery received from the image sensor using the one or more machine learning models to determine whether one or more persons are present in the imagery. Imagery captured by the image sensor of the human presence sensor module is not disseminated outside of the human presence sensor module.
  • In response to detection that one or more persons are present in the imagery, the human presence sensor module is configured to issue a signal to the processing module of the computing device, such that the processing module responds to the signal by executing one or more instructions associated with the operating system of the computing device.
  • the human presence sensor module further includes a module controller operatively coupled to the image sensor, the dedicated memory and the dedicated processing module.
  • the module controller is configured to receive a notification from the dedicated processing module about the presence of the one or more persons in the imagery, and to issue the signal to the processing module of the computing device.
  • the image sensor may be further configured to: detect motion between sequential images; and to issue a wake on approach signal to the module controller in order to enable the module controller to cause one or more components of the human presence sensor module to wake up from a low power mode.
  • the image sensor is further configured to detect motion between sequential images, and the dedicated processing module is configured to start processing the imagery in response to the detection of motion.
  • the one or more machine learning models may comprise a first machine learning model trained to detect the presence of a single person in the imagery, and a second machine learning model trained to detect the presence of at least two people in the imagery.
  • the machine learning models may further include a model to detect at least a portion of a human face, a model to detect a human torso, a model to detect a human arm, or a model to detect a human hand.
  • the signal to the processing module of the computing device is an interrupt, and the interrupt causes a process of the computing device to wake the computing device from a suspend mode or a standby mode.
  • the signal to the processing module of the computing device is an interrupt
  • the interrupt causes a process of the computing device to initiate face authentication using imagery other than the imagery obtained by the image sensor of the human presence sensor module.
  • the computing device further comprises a display module having a display interface, the display module being communicatively coupled to the processing module and being configured to present information to the user.
  • the signal to the processing module of the computing device is an interrupt, and the interrupt causes a process of the computing device to display information on the display module.
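  • By way of a rough, non-limiting illustration of the signal path recited above, the following Python sketch shows how an operating-system-side handler might map signals from the human presence sensor module to the actions mentioned (waking from suspend, starting face authentication, locking, or surfacing a notification). The signal names and the callback interface are assumptions made for illustration only; they are not identifiers taken from this disclosure.

```python
from enum import Enum, auto

class HpsSignal(Enum):
    """Illustrative signal/command types an HPS module might raise (names assumed)."""
    HUMAN_DETECTED = auto()
    NO_HUMAN_DETECTED = auto()
    SECOND_PERSON_DETECTED = auto()

def handle_hps_signal(signal: HpsSignal, os_actions) -> None:
    """Map an HPS interrupt or command to operating-system actions.

    `os_actions` is any object exposing wake_from_suspend(),
    start_face_authentication(), lock_device() and show_notification();
    these callbacks stand in for real OS services.
    """
    if signal is HpsSignal.HUMAN_DETECTED:
        os_actions.wake_from_suspend()           # wake on approach
        os_actions.start_face_authentication()   # uses a camera other than the HPS image sensor
    elif signal is HpsSignal.NO_HUMAN_DETECTED:
        os_actions.lock_device()                 # lock on leave
    elif signal is HpsSignal.SECOND_PERSON_DETECTED:
        os_actions.show_notification("Someone else may be viewing your screen")
```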
  • a computer-implemented method for a computing device having a human presence sensor module comprises: capturing, by an image sensor of the human presence sensor module, imagery within a field of view of the image sensor, wherein the imagery captured by the image sensor of the human presence sensor module is restricted to the human presence sensor module (and thus not disseminated to another part of the computing device); retrieving from memory of the human presence sensor module, by at least one processing device of the human presence sensor module, one or more machine learning models, the one or more machine learning models each being trained to identify whether one or more persons are present in the imagery; processing, by the at least one processing device of the human presence sensor module, the imagery received from the image sensor using the one or more machine learning models to determine whether one or more persons are present in the imagery; and upon detection that one or more persons are present in the imagery, the human presence sensor module issuing a signal to a processing module of the computing device so that the computing device can respond to that presence by performing one or more actions.
  • the method may further comprise, in response to detection of the presence of the one or more persons, causing the computing device to wake on arrival of a person within the field of view of the image sensor.
  • the method may further comprise, in response to detection of a person leaving the field of view of the image sensor, causing the computing device to lock so that authentication is required to access one or more programs of the computing device.
  • the method may further comprise, in response to detection of a person leaving the field of view of the image sensor, at least one of muting a microphone of the computing device or turning off a camera of the computing device, wherein the camera is not the image sensor of the human presence sensor module.
  • the method may further comprise, in response to detection of the presence of at least two persons in the imagery, performing at least one of issuing a notification to a user of the computing device or blocking one or more notifications from being presented to the user.
  • the method may further include enabling a privacy filter on a display of the computing device.
  • the method may further comprise, in response to detection of the presence of one person in the imagery, performing gesture detection based on additional imagery captured by the image sensor of the human presence sensor module. Alternatively or additionally, in response to detection of the presence of one person in the imagery, the method may further include performing gaze tracking based on additional imagery captured by the image sensor of the human presence sensor module. Alternatively or additionally, in response to detection of the presence of one person in the imagery, the method may further include performing dynamic beamforming to cancel background noise based on additional imagery captured by the image sensor of the human presence sensor module.
  • the method may further comprise detecting, by the image sensor, motion between sequential images of the captured imagery, and causing one or more components of the human presence sensor module to wake up from a low power mode in response to detecting the motion.
  • the signal to the processing module of the computing device is an interrupt
  • the interrupt may cause a process of the computing device to initiate face authentication using imagery other than the imagery obtained by the image sensor of the human presence sensor module.
  • FIGs. 1A-D illustrate examples involving human presence sensing in accordance with aspects of the technology.
  • FIG. 2 illustrates a block diagram of an example client device in accordance with aspects of the technology.
  • FIG. 3 illustrates a functional diagram of an example client device in accordance with aspects of the technology.
  • FIG. 4 illustrates an example scenario of image evaluation by a human presence sensor module in accordance with aspects of the technology.
  • FIGs. 5A-B illustrate an example scenario in accordance with aspects of the technology.
  • FIGs. 6A-B illustrate an example scenario in accordance with aspects of the technology.
  • Fig. 6C illustrates a workflow in accordance with this scenario.
  • FIGs. 7A-B illustrate an example scenario in accordance with aspects of the technology.
  • FIGs. 8A-B illustrate an example scenario in accordance with aspects of the technology.
  • FIGs. 9A-B illustrate an example scenario in accordance with aspects of the technology.
  • FIGs. 10A-B illustrate an example scenario in accordance with aspects of the technology.
  • FIGs. 11A-B illustrate an example scenario in accordance with aspects of the technology.
  • Fig. 12 illustrates an example scenario in accordance with aspects of the technology.
  • Fig. 13 illustrates an example user notification in accordance with aspects of the technology.
  • FIGs. 14A-B illustrate a system for use with aspects of the technology.
  • Fig. 15 illustrates a method in accordance with aspects of the technology.
  • a self-contained human presence processing module is able to efficiently detect whether a person is at or near a given client device. This is done using a minimum amount of resources that are segregated from the rest of the processing system of the client device. This allows imagery captured by a dedicated sensor to be evaluated by one or more ML models so that the human presence sensor can signal to the operating system or other part of the client device whether one or more actions are to be performed. Imagery captured by the dedicated sensor need not be saved locally by the processing module, and such imagery is not transmitted from the processing module to another part of the client device. This promotes security and privacy while enabling a rich suite of UX features to be provided by the client device, using a minimum amount of system resources.
  • Fig. 1A illustrates an example 100 showing a client device 102, such as a laptop computer.
  • display 104 is displaying a screen saver 106 because the client device is not actively being used.
  • the client device includes a keyboard 108, one or more trackpads or mousepads 110, and a microphone 112 as different user inputs.
  • An integrated webcam 114 can be used for videoconferences, interactive gaming, etc.
  • Indicator 116 such as an LED, may be illuminated to alert a user whenever the integrated webcam is in use.
  • the client device also includes a camera or other imaging device 118 that is part of a human presence sensor.
  • the imaging device 118 may be positioned along a top bezel of the client device. In some examples the imaging device may be located in a different position along the client device. For instance, if it is a tablet or other device that is intended to be rotated in different orientations during use, then the imaging device may be positioned along a side of the housing. Here, there may be no indicator associated with the imaging device 118.
  • the human presence sensor evaluates one or more images obtained by the imaging device according to one or more ML models implemented by a processing module of the human presence sensor. Then, as shown in Fig. 1B, upon determination that a person is present, the human presence sensor sends a signal to the operating system of the client device, causing a change from the screen saver to a login screen 122.
  • Fig. 1C illustrates another example 150 in which a person 152 is logged in and using the client device.
  • the person 152 may be using a videoconference program 154 to interact with one or more other people, for instance to discuss a growth projection on a spreadsheet as shown on the display.
  • the integrated webcam is used to capture video or still imagery for display in the program as shown at 156.
  • what is displayed on the screen may be sensitive or personal (e.g., a growth projection of someone’s investment portfolio).
  • Fig. 1D illustrates another example 150 in which a person 152 is logged in and using the client device.
  • when another person 158 is identified by the human presence sensor as being nearby (and possibly looking toward the screen), the human presence sensor sends a notification to the operating system, the videoconference program or another part of the client device.
  • the operating system or the program may cause the display screen to dim as shown at 160.
  • specific information being displayed such as the spreadsheet, notifications, icons or other objects, may be hidden upon the notification about the other nearby person.
  • the system may give an indication to the user (for example showing an alert in a UI window) that someone is watching. In this way, the human presence sensor can help provide a feeling of security to the user regarding content that was being displayed.
  • Fig. 2 illustrates a block diagram of an example client device 200, such as a laptop computer, tablet PC, netbook, an in-home device such as a smart display, or the like.
  • the client device includes a processing module 202 having one or more computer processors such as a central processing unit 204 and/or graphics processors 206, as well as memory module 208 configured to store instructions 210 and data 212.
  • the processors may or may not operate in parallel, and may include ASICs, controllers and other types of hardware circuitry.
  • the processors are configured to receive information from a user through user interface module 214, and to present information to the user on one or more display devices of the display module 216 having a display interface.
  • User interface module 214 may receive commands from a user via user inputs and convert them for submission to a given processor.
  • the user interface module may link to a web browser (not shown).
  • the user inputs may include one or more of a touch screen, keypad, mousepad and/or touchpad, stylus, microphone, or other types of input devices.
  • the display module 216 may comprise appropriate circuitry for driving the display device to present graphical and other information to the user.
  • the graphical information may be generated by the graphics processor(s) 206, while CPU 204 manages overall operation of the client device 200.
  • the graphical information may display responses to user queries on the display module 216.
  • the processing module may run a browser application or other service using instructions and data stored in memory module 208, and present information associated with the browser application or other service to the user via the display module 216.
  • the memory module may include a database or other storage for browser information, location information, etc.
  • Memory module 208 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units.
  • the memory module 208 may include, for example, flash memory and/or NVRAM, and may be embodied as a hard-drive or memory card. Alternatively, the memory module 208 may also include removable media (e.g., DVD, CD-ROM or USB thumb drive).
  • One or more regions of the memory module 208 may be write-capable while other regions may comprise read-only (or otherwise write-protected) memories.
  • a computer program product is tangibly embodied in an information carrier.
  • Although Fig. 2 functionally illustrates the processor(s), memory module, and other elements of client device 200 as being within the same overall block, such components may or may not be stored within the same physical housing.
  • some or all of the instructions and data may be stored on an information carrier that is a removable storage medium (e.g., optical drive, high-density tape drive or USB drive) and others stored within a read-only computer chip.
  • the data 212 may be retrieved, stored or modified by the processors in accordance with the instructions 210.
  • the data may be stored in computing device registers, in a relational database as a table having a plurality of different fields and records, XML documents or flat files.
  • the data may also be formatted in any computing device-readable format
  • the instructions 210 may be any set of instructions to be executed directly (such as machine code) or indirectly (such as scripts) by the processor(s).
  • the instructions may be stored as computing device code on the computing device-readable medium.
  • the terms “instructions” and “programs” may be used interchangeably herein.
  • the instructions may be stored in object code format for direct processing by the processor(s), or in any other computing device language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance.
  • the client device 200 includes a communication module 218 for communicating with other devices and systems, including other client devices, servers and databases.
  • the communication module 218 includes a wireless transceiver; alternatively or additionally, the module may include a wired transceiver.
  • the client device 200 may communicate with other remote devices via the communication module 218 using various configurations and protocols, including short range communication protocols such as near-field communication (NFC), BluetoothTM, BluetoothTM Low Energy (BLE) or other ad-hoc networks, the Internet, intranets, virtual private networks, wide area networks, local networks, private networks using communication protocols proprietary to one or more companies, Ethernet, WiFi and HTTP, and combinations of the foregoing.
  • the example client device 200 as shown includes one or more position and orientation sensors 220.
  • the position and orientation sensors 220 are configured to determine the position and orientation of one or more parts of the client computing device 200.
  • these components may include a GPS receiver to determine the device's latitude, longitude and/or altitude as well as an accelerometer, gyroscope or another direction/speed detection device such as an inertial measurement unit (IMU).
  • The client device 200 may also include one or more camera(s) 222 for capturing still images and recording video streams, such as the integrated webcam as discussed above, speaker(s) 224 and a power module 226.
  • Actuators to provide tactile feedback or other information to the user, as well as a security chip such as to prevent tampering with bios or other firmware updates (not shown) may also be incorporated into the client device 200.
  • the client device also includes a human presence sensor module 228.
  • this module includes an image sensor 230.
  • local processing such as a dedicated processing module 232, dedicated (local) memory 234, and a module controller 236.
  • the image sensor is a dedicated low power, low resolution camera, which may provide greyscale or color (e.g., RGB) imagery that has a size (in pixels) of 320 x 240, 300 x 300 or similar size (e.g., +/- 20%). During operation, imagery may be taken once every 2-10 seconds (or more or less).
  • the dedicated processing module of the local processing may comprise an FPGA or other processing device capable of processing imagery received from the image sensor in real time using one or more ML models.
  • This memory may be flash memory (e.g., SPI flash memory configured for efficiency with the FPGA).
  • the flash memory may have several megabytes of storage for the models and no more than 1 MB of onboard RAM for performing image processing using the model(s).
  • the imagery may be restricted to the dedicated memory during processing, without dissemination to other parts of the client device.
  • The human presence sensor module 228 is configured to operate using as little power as possible, for instance on the order of 100 mW or less. Power usage can be minimized in several ways, including putting the local memory into a low power mode whenever possible. Being able to more quickly and accurately dim the screen using the approaches discussed herein can save additional power.
  • the module 228 may use 5-10 mW in a "Wake on Approach" mode (such as in the example of Figs. 1A-B). This may be achieved by turning off the local processing while its usage is not required.
  • the module 228 may either rely on movement detection incorporated into the image sensor to wake up the local processing or by starting the local processing approximately once every 1-15 seconds to check whether it is needed.
  • imagery obtained by the image sensor is not stored in the local memory after processing. Regardless of whether any imagery is maintained by the human presence sensor module, it is not transmitted to another part of the client device and would not be used as imagery for a webcam.
  • the module controller may be, for example, a microcontroller or other processing unit configured to manage operation of the human presence sensor and to interface with the processing module 202 or other part of the client device external to the human presence sensor.
  • Fig. 3 illustrates an example 300 of the logical architecture of the client device 200 with the human presence sensor module 228.
  • the module controller 236 is operatively connected to the image sensor 230, local processing by dedicated processing module 232 and dedicated (local) memory 234.
  • the module controller is able to turn the dedicated processing module (local processing) and the local memory on and off, and can update the local memory as needed, such as to add new ML models or update existing models.
  • the module controller couples to the image sensor via an I2C interface 301, while it couples to the local processing (and/or local memory) via an SPI interface.
  • the module controller may be responsible for ensuring only trusted code runs on the human presence sensor module while the client device is in secure mode, for instance by writing the contents of the memory and verifying it.
  • the module controller may also be responsible for managing power states and communicating configuration and status from and to the client device operating system.
  • the module controller may employ a daemon that is responsible for booting the human presence sensor module into a known-good state each time it is powered on. Once it is booted, the daemon can configure functions of the local processing (e.g., person detection, second person detection, etc.).
  • the image sensor is configured to output imagery to the local processing and may send motion detection information, but not imagery, to the module controller.
  • the module may default to a very low power state in which it is just looking for motion.
  • the other components can power up to determine if there is a person in view. If so, the local processing will start doing human presence detection to see if the device should be woken up fully. If not, then the system can go back to low power motion sensing.
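  • As a sketch of the duty cycling just described (low-power motion sensing, powering up for a presence check, then either reporting presence or returning to low power), consider the following Python outline. The class and method names, and the 5-second polling interval, are assumptions for illustration rather than details of the disclosed firmware.

```python
import time

class HpsStateMachine:
    """Illustrative low-power duty cycle: motion sensing -> presence check -> report or sleep."""

    LOW_POWER = "low_power_motion_sensing"
    CHECKING = "running_presence_models"

    def __init__(self, image_sensor, local_processing, module_controller):
        self.image_sensor = image_sensor            # exposes motion_detected() and capture()
        self.local_processing = local_processing    # exposes power_up(), power_down(), detect_person(img)
        self.module_controller = module_controller  # exposes notify(command)
        self.state = self.LOW_POWER

    def step(self):
        if self.state == self.LOW_POWER:
            # In the lowest-power mode only the image sensor's motion detection runs.
            if self.image_sensor.motion_detected():
                self.local_processing.power_up()
                self.state = self.CHECKING
        elif self.state == self.CHECKING:
            image = self.image_sensor.capture()
            if self.local_processing.detect_person(image):
                # Result is forwarded to the host with minimal additional processing.
                self.module_controller.notify("HUMAN_DETECTED")
            else:
                self.local_processing.power_down()
                self.state = self.LOW_POWER

    def run(self, poll_interval_s=5.0):
        # Polling every few seconds keeps average power in the low tens of milliwatts or less.
        while True:
            self.step()
            time.sleep(poll_interval_s)
```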
  • the local processing may temporarily store data in the local memory when running the one or more ML models on the received imagery.
  • the models may be configured as, e.g.,
  • the local processing is configured to send commands and/or data to the module controller.
  • commands sent to the microcontroller can include: (1) Human Detected; (2) No Human Detected; or (3) Second (or additional) Person Detected, etc. These commands can be forwarded to the operating system of the computing device with minimal additional processing.
  • the module controller is operatively coupled to an operating system 302 of the client device. For instance, this may be done using an I2C or SPI bus interface, which may pass through a hinge of the client device (such as on a laptop computer).
  • the module controller can issue interrupts 304 or send commands, results or other signals 306 via a bus 307, which may be used by the operating system.
  • a specific app or program, e.g., a login app or a videoconference program
  • interrupts can indicate that a person is present or some other condition in the environment detectable by the human presence sensor module.
  • An interrupt can be used to wake the computing device from a suspend or standby mode, e.g., to initiate face authentication or to display information such as notifications or weather.
  • one general mode of operation is for the human presence sensor module to send the results of inferences of one or more models executed by the local processing to the operating system, and to allow one or more processes of the operating system to interpret those results and respond or otherwise proceed accordingly.
  • the operating system may logically include, as shown, a kernel 308, a human presence sensor daemon 310, firmware 312 and one or more routines or other processes 314 such as to control power to the display device or other devices.
  • The human presence sensor daemon 310 is a software daemon responsible for coordinating communication between the human presence sensor module 228 and the processes 314.
  • the kernel may communicate with the routines or other processes via a system bus 315, such as a D-Bus, for inter-process communication.
  • a security component 316 such as a security chip
  • the security chip provides firmware write protection to both the operating system and the human presence sensor module, and provides updated and correct firmware for the microcontroller 236 and dedicated processing module 232.
  • the security component 316 may communicate with the human presence sensor module via the bus 307 or other link.
  • the local processing may employ one or more ML models, which are stored in the local memory.
  • the models may include a first model for detecting whether any person is present, and a second model for detecting whether there are any other people in the vicinity, as this may indicate the need for the operating system or a specific program running on the client device to take a privacy-related action.
  • the presence detection can trigger the system to present a password or other login screen in a "wake on approach" mode.
  • the first model may trigger the system to dim the display screen(s) to save power and/or to lock access to the client device.
  • this may trigger the system to take an action such as to dim the screen or to blur, deemphasize or otherwise hide selected content from being presented on the screen (e.g., potentially sensitive information, status notifications about email or other messages, etc.).
  • there may be another person that is interacting with the user, such as collaborating on a spreadsheet or sitting in on a videoconference. In these types of situations, dimming the screen or similar actions may not be suitable, although the system may pause or limit the display of personal notifications to the user when the other (authorized) person is present.
  • the models implemented in the human presence sensor module are configured to detect human faces or other parts of a person in images. For example, the head might be mostly above the screen, but the person’s torso, arm or other portion of their body might be visible. Thus, while a cat or other pet may approach the client device, the models are designed so that the system does not react to that presence (e.g., a pet lock mode). Because one aspect involves a self-contained presence detection system effectively walled off from other parts of the client device (without sending the obtained imagery to those other parts) and another aspect is a goal to keep power usage as low as possible, the image processing is bound by tight constraints. This can include limited memory for storage (e.g., read-only memory (ROM)) and limited random access memory (RAM).
  • ROM read-only memory
  • RAM random access memory
  • the models may also factor in one or more of the following: user position with respect to the camera, facial hair and/or different hair styles, facial expressions, whether glasses or accessories are being worn (e.g., earbuds or headphones), variations in lighting, variations in backdrop (e.g., office or classroom setting, indoors versus outdoors, etc.).
  • the model(s) needs to detect when there is a person using the device with another person potentially looking at their screen.
  • these cases may be addressed by separate models, although alternatively a single model may be employed. For instance, a model that detects or counts the number of faces in an image would be suitable.
  • the model must reliably detect faces up to about 2-3 meters away from the camera with approximately a 10-pixel face width.
  • the model(s) would also meet the following requirements.
  • the model size may be constrained to be less than 1MB.
  • Each model may employ, by way of example, a convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory (LSTM) network or combination thereof.
  • CNN convolutional neural network
  • RNN recurrent neural network
  • LSTM long short-term memory
  • the model may be formatted in order to run compactly on a microcontroller with limited memory (e.g., on the order of a few hundred kilobytes).
  • the model plus any post-processing may be required to run at > 5 Hz.
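  • To make these constraints concrete, the PyTorch sketch below shows one possible shape of a compact binary presence classifier whose weights fit far below a 1 MB budget. The layer sizes are illustrative guesses and are not the model architecture of this disclosure.

```python
import torch
import torch.nn as nn

class TinyPresenceNet(nn.Module):
    """Illustrative compact CNN for 'person present?' classification on greyscale input."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1),    # e.g., 160x160 -> 80x80
            nn.ReLU(inplace=True),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),   # 80x80 -> 40x40
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 40x40 -> 20x20
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, 1)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return torch.sigmoid(self.classifier(x))  # probability in [0, 1]

model = TinyPresenceNet()
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params} (~{n_params / 1024:.1f} KB at int8)")  # far below 1 MB
```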
  • ML models are able to reduce a large amount of data (e.g., brightness values of thousands of pixels) into a small amount of information (e.g., a probability that the picture contains a cat or a person).
  • models perform a sequence of operations that produce more and more compact representations of the information contained in the data.
  • they often expand the number of dimensions, making the data less compact and hence use more RAM before reducing them again.
  • a sequence of operations may convert an input 160x160 RGB image represented as 76,800 8-bit integer values to a 40x40x8 tensor represented as 12,800 8-bit integers.
  • the process would expand and reduce the number of channels ("depth") of the image twice: (i) first by expanding the image from 3 channels to 16, then reducing it to 8 channels, and (ii) then by expanding the 8 channels to 48, before reducing it to 8 again.
  • Such operations may require a significant amount of memory (e.g., over 350 KB) because they each convert between 80x80x8 and 80x80x48 activation buffers.
  • Thus, it can be beneficial for a constrained system such as the human presence sensor herein to modify the processing so that certain operations use less memory. This may be done by refactoring those operations into multiple sub-operations, each of which works on a portion (a "tile") of the buffer.
  • the input data can be split into tiles by rows of input. Such an approach may reduce the memory requirement to below 250 KB.
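  • The buffer sizes quoted above can be checked with simple arithmetic, and the row-tiling idea can be prototyped in a few lines of NumPy. The 20-row tile height below is an assumed value chosen only to show how the peak activation memory of an expand-then-reduce step drops when it is processed in strips.

```python
import numpy as np

# Peak activation memory if the expand/reduce step is done on the full image:
full_in  = 80 * 80 * 8    # 51,200 bytes (8-bit activations)
full_mid = 80 * 80 * 48   # 307,200 bytes
print(f"full-frame peak ~{(full_in + full_mid) / 1024:.0f} KB")  # ~350 KB

# Processing the same step 20 rows at a time keeps only one strip of the
# expanded (48-channel) tensor alive at once.
tile_rows = 20            # assumed tile height
tile_in  = tile_rows * 80 * 8
tile_mid = tile_rows * 80 * 48
print(f"tiled peak ~{(full_in + tile_in + tile_mid) / 1024:.0f} KB")  # well below 250 KB

def expand_reduce(x, w_expand, w_reduce):
    """Toy 1x1-convolution style channel expansion then reduction."""
    mid = np.maximum(x @ w_expand, 0)      # 8 -> 48 channels
    return np.maximum(mid @ w_reduce, 0)   # 48 -> 8 channels

rng = np.random.default_rng(0)
x = rng.random((80, 80, 8), dtype=np.float32)
w1 = rng.random((8, 48), dtype=np.float32)
w2 = rng.random((48, 8), dtype=np.float32)

# Tiled execution over row strips produces the same result as the full pass.
out = np.concatenate(
    [expand_reduce(x[r:r + tile_rows], w1, w2) for r in range(0, 80, tile_rows)], axis=0)
assert np.allclose(out, expand_reduce(x, w1, w2))
```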
  • The human presence sensor module may be configured to check for presence when the computing device is in a certain configuration or orientation. For instance, this may occur when the device lid for a laptop is open in a clamshell mode, or when a convertible device is in “tent” mode. In contrast, when the lid is closed or the device is in tablet mode, the system may not check for presence.
  • Generating suitable models that can be processed in a memory-constrained manner can be accomplished in different ways. For instance, one could train models with a tiled architecture, ensuring weights are shared appropriately. Or, existing trained models could be post-processed to tile them accordingly.
  • the system may perform a dropout process in which some selected percentage (e.g., 5% - 50%) of output nodes in a layer (of the CNN, for instance) are randomly ignored, as this can help prevent overtraining of the model and can improve generalization to different types of human faces.
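  • In most frameworks the dropout just described is a single layer; the sketch below uses a 25% rate merely as a value inside the 5%-50% range mentioned above, not as a disclosed hyperparameter.

```python
import torch.nn as nn

# Illustrative classifier head with dropout between layers; during training a
# randomly chosen fraction of activations is zeroed on each step, which helps
# prevent overtraining and improves generalization across different faces.
head = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.25),   # 25% of outputs ignored per step (a value in the 5%-50% range)
    nn.Linear(64, 1),
    nn.Sigmoid(),
)
```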
  • Different data sets may be used to train the model(s).
  • the models may be trained as discussed in “Visual Wake Words Dataset” by Chowdhery et al., published June 12, 2019, which is incorporated herein by reference in its entirety.
  • the model training may include one or more types of image augmentation to help generate robust models. This can include scaling face size (e.g., to help identify children and adults or identify whether someone is near or far), translating faces, clipping faces, synthesizing multi-face images, and/or blurring out selected face details.
  • if padding is required for the image due to the translation, one can either repeat the last row/column or reflect the image to fill (being careful not to remove/reflect other faces).
  • the training may also involve adding a “synthetic” second person. Here, two images containing faces are chosen. One of the faces is then smoothly blended into the other image. This could incorporate a region that includes part of the body as well. The blend should look as realistic as possible so the ML model cannot learn that images with blended faces are always a second person. Faces may also be blended into images without any faces in them to help with this as well.
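  • The translation padding and second-person blending steps can be prototyped with NumPy as below; the padding modes and the blend weight are ordinary image-processing choices assumed for illustration, not parameters of the training pipeline described here.

```python
import numpy as np

def translate_with_padding(img, dx, dy, mode="edge"):
    """Shift a greyscale image by (dx, dy); fill the exposed border by repeating
    the last row/column ('edge') or by reflection ('reflect')."""
    h, w = img.shape
    padded = np.pad(img, ((abs(dy), abs(dy)), (abs(dx), abs(dx))), mode=mode)
    y0 = abs(dy) - dy
    x0 = abs(dx) - dx
    return padded[y0:y0 + h, x0:x0 + w]

def blend_second_face(base_img, face_patch, top_left, alpha=0.9):
    """Smoothly blend a face crop from another image into `base_img` to
    synthesize a two-person training example."""
    out = base_img.astype(np.float32).copy()
    y, x = top_left
    fh, fw = face_patch.shape
    region = out[y:y + fh, x:x + fw]
    out[y:y + fh, x:x + fw] = alpha * face_patch + (1 - alpha) * region
    return out.astype(base_img.dtype)

# Example usage on a synthetic 240x320 greyscale frame.
frame = np.zeros((240, 320), dtype=np.uint8)
face = np.full((40, 30), 200, dtype=np.uint8)
shifted = translate_with_padding(frame, dx=12, dy=-8)
two_person = blend_second_face(frame, face, top_left=(100, 250))
```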
  • the Presence0 model may be configured as a binary classifier that outputs a value in [0, 1], where values close to 1 indicate high likelihood of a person in the image.
  • the input imagery from the image sensor may be 320 x 240 grayscale images or other greyscale images of similar size (e.g., 300 x 300). As noted above, color images may also be obtained from the image sensor.
  • Fig. 4 illustrates an example scenario 400 for image evaluation by the human presence sensor module in view of the above.
  • the image sensor captures an image, which may be a greyscale image.
  • the image may be on the order of, e.g., 320 x 240 or 300 x 300, or a similar size of moderate or low resolution.
  • one or more pre-processing operations may be performed by the image sensor or by the local processing.
  • the image sensor or local processing may make adjustments to the image exposure, gain (analog and/or digital gain) or other image parameters, crop the image, convert from a color (e.g., RGB) image to a greyscale image if the image sensor generates color imagery (e.g., an image sensor having a Bayer filter), etc.
  • Preprocessing may additionally or alternatively include detecting when images are unusable (e.g., all black or fully saturated).
  • Image pre-processing can additionally or alternatively include the image sensor extracting motion information from the image, e.g., based on a comparison of the image pixels to those of one or more images taken immediately prior to the current image.
  • the extracted motion information would be sent to the module controller as indicated by the dotted downward arrow 405 from the image pre-processing block 404.
  • image sensor parameters, such as to account for different lighting conditions, may be calibrated at initial setup (e.g., each time the human presence sensor module is turned on or each time the image sensor is initialized), and may also be adjusted in between image captures. For instance, during a first image capture there may be no one in the room and the lights are off.
  • the image capturing process itself may occur continuously every X milliseconds, such as every 100-500 milliseconds (or more or less), so long as the human presence sensor module is operating.
  • operation of the module may involve the user of the client device affirmatively granting permission.
  • the local processing applies the one or more ML models to the raw or pre-processed image.
  • the models are maintained in local memory and during processing data is temporarily stored in, e.g., RAM, of the local memory, thereby ensuring that all processing of the imagery is segregated from the other components of the client device.
  • Another model may be trained to detect people wearing masks, or whose faces are otherwise partly obscured (such as when a person is not directly facing the image sensor). In one scenario, at least 30% of the face may need to be visible at the edge of the image to detect the presence of a person.
  • output from the applied models may be an indication that one or more people are present.
  • an interrupt, commands, result or other signal may be issued by the local processing and/or the module controller so that the operating system or other part of the client device may perform one or more actions in response to the presence detection.
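  • Putting the pieces of this scenario together, the following sketch outlines the capture, pre-process, inference and signal flow of Fig. 4 in Python. The helper names, the black/saturation thresholds and the model call are assumptions; the actual models reside in the module's local memory as described above.

```python
import numpy as np

def is_unusable(img, low=5, high=250, frac=0.98):
    """Reject frames that are nearly all black or fully saturated."""
    dark = np.mean(img <= low)
    bright = np.mean(img >= high)
    return dark >= frac or bright >= frac

def preprocess(raw):
    """Convert an RGB frame to greyscale and normalize; greyscale sensors skip the first step."""
    grey = raw.mean(axis=-1) if raw.ndim == 3 else raw
    return grey.astype(np.float32) / 255.0

def evaluate_frame(raw, presence_model, controller):
    """One pass of the human presence pipeline, kept entirely inside the sensor module."""
    if is_unusable(raw):
        return
    img = preprocess(raw)
    people = presence_model(img)            # e.g., 0, 1 or 2+ persons detected
    if people >= 2:
        controller.notify("SECOND_PERSON_DETECTED")
    elif people == 1:
        controller.notify("HUMAN_DETECTED")
    else:
        controller.notify("NO_HUMAN_DETECTED")
    # The raw frame is discarded here; nothing leaves the module except the command.
```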
  • the human presence detection information generated by the firewalled module can be used in a wide variety of applications and scenarios, as discussed in detail below.
  • One scenario, illustrated in Figs. 1A-B, involves “wake on arrival”, where the client device wakes up and displays a password or other lock screen when user presence is detected.
  • the human presence sensor module e.g., 228 in Fig. 2
  • the human presence sensor module would constantly be running when the client device is asleep.
  • the user sits down at their client device (e.g., a laptop) to start work for the day.
  • the human presence sensor module detects the arrival of the user and automatically wakes up the client device and causes a lock screen to be displayed for the user to sign in without the need for the person to touch the client device, as shown in Fig. 5B.
  • Another scenario, illustrated in Fig. 6A, involves “lock on leave”: locking the client device when human presence is no longer detected.
  • the user may step away from their client device to take a call.
  • the system causes the client device to lock for security and can dim or turn off the screen to conserve power as shown in Fig. 6B (or display a screen saver).
  • How long the system waits to lock the computer and dim/turn off the display may vary, for instance based on system power saving settings and/or user preferences.
  • the screen may dim immediately after no presence is detected or after some limited time (e.g., 1-30 seconds, or longer).
  • the computer may then lock after dimming or some other amount of time (e.g., after 1-5 minutes, or more or less). Upon lockout the screen may turn off (or display a screen saver).
  • Fig. 6C illustrates an example workflow 600 for a quick dim process (e.g., due to the absence of a user) as part of a lock on leave evaluation.
  • the present state of the screen may be stored in a state controller that may be part of module controller 236 of the human presence sensor 228 of Fig. 2.
  • the present state may be updated by an HPS service signal according to results generated by dedicated processing module 232.
  • a dimming process 606 or an undimming process 608 will start. Assuming the current state evaluated at block 604 is that the screen is not dimmed, the process proceeds to block 606. Here, an evaluation is made at block 610 as to whether a duration since a last user action is greater than a first threshold. This threshold may correspond to a time in which screen dimming is imminent. If the duration does not exceed the first threshold, then the process proceeds to block 612. Here, if the duration since the last user action exceeds a second threshold, then the process proceeds to block 614 where dimming commences.
  • the process also proceeds to block 614 so that the dimming can commence.
  • This dimming can be due to inactivity, and may involve the screen gradually dimming over several seconds or more, or immediately dimming or completely turning off. If the duration does not exceed the second threshold, then the process will timeout at block 616. The system can then subsequently re-evaluate the present state starting at block 602.
  • the undimming process within block 608 involves first evaluating whether there has been any recent user activity at block 618. If user activity has been detected, then at block 620 the screen is undimmed. However, if no user activity has been detected, then at block 622 the system evaluates whether the duration since the last detected activity is less than a third threshold. By way of example, this threshold may be on the order of several minutes or longer, e.g., at least 3-6 minutes. Here, if the duration is less, then the screen may be undimmed at block 624 by the HPS module. If the duration is greater, then the process times out at block 616 and the evaluation can begin again at present state block 602.
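  • The branch structure of the Fig. 6C workflow reduces to a handful of comparisons. The sketch below approximates blocks 610-624 with assumed threshold values (the description above does not fix them), and is only an illustration of the control flow.

```python
def dim_decision(seconds_since_user_action, user_absent,
                 standard_dim_s=300, quick_dim_s=10):
    """Approximates blocks 610-616: should dimming commence on this pass?

    Threshold values are assumptions; the description names only a first and
    second threshold, with the quick-dim path applying once the user has left.
    """
    if seconds_since_user_action > standard_dim_s:
        return "dim"        # standard dim due to inactivity (blocks 610 -> 614)
    if user_absent and seconds_since_user_action > quick_dim_s:
        return "dim"        # quick dim because presence is no longer detected (612 -> 614)
    return "timeout"        # block 616: re-evaluate later


def undim_decision(recent_user_activity, seconds_since_last_activity,
                   user_present, undim_window_s=240):
    """Approximates blocks 618-624: should the screen be undimmed on this pass?"""
    if recent_user_activity:
        return "undim"      # block 620
    if user_present and seconds_since_last_activity < undim_window_s:
        return "undim"      # block 624: HPS-driven undim for a returning user
    return "timeout"        # block 616


# Example passes through the two halves of the workflow.
assert dim_decision(seconds_since_user_action=15, user_absent=True) == "dim"
assert undim_decision(recent_user_activity=False,
                      seconds_since_last_activity=30, user_present=True) == "undim"
```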
  • the system may implement quick dimming with quick locking. For instance, if a user has stopped typing, a first timer may begin (for screen locking). Here, should the presence sensor detect that the user has moved away from the computing device, a second timer (for quick dimming) may also begin. Then, after the quick dimming timer exceeds its threshold (e.g., 5-30 seconds), the screen would dim because of the user’s absence.
  • then, once the screen locking timer exceeds its own threshold, the screen would be locked.
  • if the presence sensor then detects that the user has returned, the screen would undim (e.g., according to block 624).
  • An alternative or complementary option to the quick dim process is delayed dimming based on user presence. For instance, if the duration since the last detected user presence is greater than the threshold for a quick dim (e.g., on the order of 5-20 seconds), then a quick dim process can occur. However, if the user is present all the time, but there has been no relevant activity (e.g., the user is not interacting with the computing device), then eventually a standard dim process can happen.
  • the threshold for such a process may be on the order of 10-20 minutes, or more or less.
  • “Mute on leave” is yet another scenario. For instance, during a video call or gaming session, when no presence is detected, the (lack of) presence signal would cause the operating system, app or other element of the client device to mute the microphone and turn off the webcam (while the image sensor of the presence module remains active).
  • Fig. 7A illustrates where the user steps away from the client device, and Fig. 7B illustrates muting the microphone and turning off the webcam while a videoconferencing app is still active.
  • the microphone may be muted and the webcam turned off immediately after no presence is detected or after some limited time (e.g., 1-30 seconds). This functionality may occur before or otherwise in conjunction with a lock on leave action.
  • Figs. 8A-B illustrate a “shoulder-surf" situation, in which the system notifies the user of the client device when another person is detected.
  • this may involve an unexpected face directed at the display screen.
  • the human presence sensor module detects that there are two faces directed at the screen.
  • recognition of the faces themselves would not be performed.
  • an interrupt or other signal may be passed to the device’s operating system in order to authenticate the user, e.g., according to a facial recognition process.
  • the user may be working on confidential information when a curious onlooker takes an interest in the work.
  • Based on the second face detection signal from the presence module, as illustrated in Fig. 8B, a notification is displayed in this example that tells the user their work may be being viewed (e.g., “Looks like someone is interested in your work.”).
  • the system may notify the user by inserting an icon into the status bar, which symbolizes that someone else is looking at the screen.
  • a ripple or other animation may be shown around the shelf icon or other part of the GUI to catch the attention of the user.
  • the GUI may raise the shelf for 3 seconds (or more or less) as a notification about the other person appears.
  • the GUI may show a background notification explaining the detection and presenting a button to trigger a dim screen action and/or a button that will take the user to settings for human presence sensing.
  • the system may dim the screen or block certain notifications or other information when others are looking at the screen. For instance, the contents of email messages, instant messages, calendar events, chat boxes, video or still images, audio and/or other information may be hidden or otherwise masked. This masking may be accompanied by a corresponding notification.
  • the user may be given the option of approving or rejecting the masking.
  • the baseline action taken when a second person is detected can be minimal, e.g., just an icon notifying the user in conjunction with masking the user's private notifications.
  • the system may provide different options that the user can choose between, such as 1) just getting a small icon notification in the settings bar, 2) masking all app notifications (not including system ones such as a low battery warning or application crash notification), and 3) dimming the screen, which may be considered the most invasive intervention of these three options.
  • Figs. 9A-B illustrate another scenario involving an auto-enable privacy filter.
  • the privacy filter may be applied either when a camera is detected or when multiple people are detected. For instance, as shown in Fig. 9A, a passerby may decide to take a photo of the coffee shop that the user is working in. Here, upon detection and the corresponding signal, the system may automatically apply a privacy filter on some or all of the screen.
  • the privacy filter may include any visual changes that would make it harder to read the screen if the person is not directly in front of it.
  • a privacy filter 902 is shown applied to the left half of the display. In this situation, an ML model could be employed to identify different types of cameras or mobile devices that likely have cameras (e.g., mobile phones, tablets, head-mounted wearable computers, etc.).
  • Figs. 10A-B illustrate a situation involving a “next slide gesture”.
  • the user can wave their hand in a particular direction (e.g., left to right or right to left) in order to have a presentation proceed to a next slide or go back a slide, scroll through a document, webpage, window, tab, workspace or e-book, etc.
  • Another gesture use case involves control for audio content, as shown in Fig. 10B. For instance, when music is playing via an app, the user can use gestures to start and pause a song, skip to the next song, or restart a song.
  • gestures can be used to fast forward or rewind such as skipping forward or backward by, e.g., 10, 20 or 30 seconds at a time, move to the next segment, etc. Gesturing can also be used to turn the volume up or down.
  • gestures can be used for video control.
  • the user can gesture to start or pause a video, rewind or fast forward, change the volume and/or adjust the video in some way (e.g., change the brightness level or the aspect ratio).
  • the user may gesture to control the microphone to mute it or unmute it, which can avoid the necessity of using a mouse, touch screen or physical key on the client device.
  • gesturing can be employed to cause the app to start or stop transcribing what the person says.
  • the system can be used to turn sign language into subtitles or text captions in an app or other program, or to otherwise use sign language as input to control operation of an app or a component of the computing system.
  • a gesture such as raising a hand can be used to notify other participants that the user has something to contribute (even when they cannot see each other).
  • the user may gesture to swipe away a notification presented on the client device, or even open or otherwise act on the notification.
  • Pinch and zoom gestures can be used to zoom in or out for displayed content and/or to maximize/minimize windows.
  • Gestures may also be supported for swipes between virtual desks/tabs in the UI.
  • the system may support customizable gestures that enable users to set certain gestures to trigger existing shortcuts, e.g., those most common to their specific needs.
  • the image capture rate of the image sensor of the human presence sensor module may be adjusted.
  • other on-device sensors e.g., an RF gesture detector, an acoustic sensor or the webcam or other camera of the client device
  • the system may support “primary face authentication”.
  • With primary face authentication, if the device is asleep, when user presence is detected and face authentication is enabled, the presence sensor module may send an interrupt or other signal so that a process associated with the operating system can begin attempting to authenticate the user, such as via facial recognition. Pairing presence sensing and recognition in a wake on approach process can result in no-touch login by the user.
  • a further aspect involves gaze tracking, in which the system knows where a user is looking (e.g., at a webcam, somewhere on the display screen, off screen, right/left of screen, behind the screen or looking past the client device entirely, etc.).
  • this can support a suite of assistive reading features, such as increasing the font size of text while reading, surfacing predictive definitions for words the user spends a long time looking at (e.g., when the user’s gaze lingers on a word or phrase for at least X seconds, such as 3-5 seconds or more), providing a visual cue such as a line reader or highlighting that moves with the user’s eyes to help them focus, automatic scrolling when reading a long document, and/or automatically masking or otherwise deemphasizing a finished part of a document or other material to reduce distraction.
  • gaze tracking can be used to present selected content on the display that the user is currently viewing.
  • the system can detect where the user’s attention is focused, which may or may not be towards a display. Here, the system may blur the display when the user is looking away to protect privacy. Detecting when the user is not paying attention to one or more screens enables the system to throttle the frame rate of the display modules of those screens to preserve power.
  • the system may throttle content which is deemed "uninteresting" because the user has not looked at it for a certain period of time (e.g., at least 15-20 seconds or more), e.g., by dropping animation framerate, restricting CPU clock frequencies, using only certain processing cores, etc.
  • the system may nudge the user to focus if it detects that the user’s attention is divided (e.g., the user keeps glancing at their mobile phone instead of looking at the display(s) of the computing system).
  • the system can nudge the user to focus if it detects that the user’s attention has been diverted (e.g., if they were looking at the screen while using an app, but have glanced away for more than 15-30 seconds while still seated in front of the client device).
  • the system can estimate the strength of the attention in order to deliver important or prioritized messages when the user's attention is determined to exceed a threshold (e.g., it is estimated with 90% confidence that the user is focused on the display, so present a notification at that time about an urgent message).
  • the attention can be used to support apps with particular use cases, such as taking a photo for a driver’s license application or to use as an avatar for an app.
  • gaze detection can be a useful input for certain features, such as palm rejection (e.g., when the user’s palm inadvertently rests on a trackpad of the client device), smart dimming, touchpad autocorrection, etc.
  • combinations of gesturing and gaze detection can enhance system operation.
  • the system can have a mode that uses both gaze tracking and a gesture to control the computing device.
  • Fig. 11A illustrates a situation where the system is able to detect the user’s hand pose.
  • the system may track specific parts of the hand as shown by points 1100. This may be employed, by way of example, in pointing extrapolation as shown in Fig. 11B.
  • the presence sensor can detect their hand/finger (or pen, stylus, etc.) and interpolate where the user is pointing on the display. Based on this information, the OS or app can then highlight or illustrate (e.g., via a “laser-point” line with a dot) the object on the display being pointed at.
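A rough sketch of the pointing-extrapolation step, reduced to 2D for brevity: extend the wrist-to-fingertip ray until it crosses the display plane. The coordinate convention and the planar simplification are assumptions; a real implementation would work from calibrated 3D keypoints such as the tracked points 1100.

```python
# Illustrative 2D pointing extrapolation (side view, screen plane at y = 0).
def extrapolate_point(wrist, fingertip, screen_y=0.0):
    """Return the x coordinate where the pointing ray meets the screen plane."""
    (x0, y0), (x1, y1) = wrist, fingertip
    dy = y1 - y0
    if abs(dy) < 1e-6:
        return None                      # ray parallel to the screen plane
    t = (screen_y - y0) / dy
    if t <= 0:
        return None                      # pointing away from the screen
    return x0 + t * (x1 - x0)

# Example: hand about 40 cm from the screen, finger tilted toward the right.
hit_x = extrapolate_point(wrist=(0.00, 0.45), fingertip=(0.05, 0.40))
print(f"highlight the object near x = {hit_x:.2f} m" if hit_x else "no screen hit")
```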
  • this can provide a virtual pointer when the user is presenting, or when the user is commenting on a slide, doc or other material during an interaction with other (remote) participants.
  • gaze detection may be employed to move a pointer on the screen to whatever display the user is looking at.
  • the presence detector is configured to identify whether a person is there. In one example this can include identifying cats, dogs or other household pets (or even children), for instance using one or more specific ML models. Upon this type of detection, the system may cause keyboard or mouse/trackpad inputs to be disabled. However, other functionality such as playing an audio book or showing a video or movie on the client device may continue to be enabled.
  • An example of dynamic beamforming is shown in Fig. 12.
  • beamforming allows for background noise to be cancelled out when on a call by focusing an area of microphone input to a specific location.
  • the client device can identify when someone moves in its vicinity and dynamically update where the beam is directed so that their speech is not disrupted. For instance, the human presence sensor module would determine the angle and distance to the user. This can involve detecting face location and face size in the image.
  • having one or more additional image sensors can be used to provide a stereo image for more robust pose determination of the user relative to the client device.
  • An array or other set of microphones can use this positional information to perform spatial filtering, such as to suppress unwanted background noises.
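As a rough illustration of the face-to-beam-steering step described above, the following sketch derives a bearing from the face position in the frame and the corresponding inter-microphone delay for a simple two-microphone delay-and-sum beamformer. The field of view, image width, microphone spacing and camera/microphone geometry are assumed example values, not figures from this disclosure.

```python
# Illustrative beam steering from a detected face position.
import math

SPEED_OF_SOUND = 343.0      # m/s
IMAGE_WIDTH_PX = 320
HORIZONTAL_FOV_DEG = 60.0
MIC_SPACING_M = 0.10        # assumed distance between the two microphones

def bearing_from_face(face_center_x_px):
    """Approximate the talker's bearing (degrees) from the face position in the frame."""
    offset = (face_center_x_px - IMAGE_WIDTH_PX / 2) / (IMAGE_WIDTH_PX / 2)
    return offset * (HORIZONTAL_FOV_DEG / 2)

def delay_between_mics(bearing_deg):
    """Time-of-arrival difference used to steer a delay-and-sum beamformer."""
    return MIC_SPACING_M * math.sin(math.radians(bearing_deg)) / SPEED_OF_SOUND

bearing = bearing_from_face(face_center_x_px=240)   # face to the right of center
print(f"steer beam to {bearing:+.1f} deg, inter-mic delay {delay_between_mics(bearing)*1e6:.1f} us")
```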
  • Another scenario involves presenting notifications to others in active apps. For instance, on calls (e.g., audio calls, or video-muted calls), if a person steps away from the client device based on presence detection, that information may be used to trigger a response in the app, such as an indication to the video call service so participants in a large meeting can know not to ask the person questions.
  • This can be particularly useful in enterprise or educational settings, especially if teachers or professors want to know their students are present in low-bandwidth settings where video may be turned off.
  • This feature may be enabled as a user privacy selection in the operating system or a feature in the app itself, such as when the user joins a videoconference.
  • The presence information may be employed to turn the user interface (including a screen saver) into a useful “surface”, such as by providing health and wellness suggestions.
  • one aspect is to detect a person in the room and then turn the screen into a useful screen saver.
  • Another aspect is to support eye strain and wellness features upon detection that a person has been at their computer for a long time.
  • the user interface may present a reminder for the user to focus their eyes away from the display at timed intervals, blink a few times, close their eyes or perform other actions to rest their eyes.
  • the system may dim the screen when the user is resting their eyes, or refrain from dimming the screen so long as the person is present in front of the device and is engaged with it.
  • a reminder may be provided for the user to stand up and stretch or walk away from the computer for a minute or two. Other reminders could involve posture information (“don’t hunch your shoulders”) or something else to cause a brief break in the routine (“Smile!”).
  • Another scenario involves “3D windows”, in which the user interface can adapt to positional (e.g., X/Y/Z) coordinates based on where/how the user is situated relative to the client device, which may be determined with the aid of other sensors of the client device (e.g., close range radar sensor, acoustical sensors, webcam, etc.). Such information may be passed through to games for vision orientation.
  • presence detection information is used to trigger bandwidth management.
  • the system can automatically reduce quality while the user is away and switch back to a default quality when one or more users are present.
  • the video or streaming service may be paused while the user's presence is not detected.
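A small sketch of how presence changes might drive these quality and pause decisions for a streaming session is given below. The controller class, quality labels and the pausable flag are hypothetical, used only to illustrate lowering quality (or pausing) while nobody is detected in front of the device.

```python
# Illustrative presence-driven bandwidth management for a streaming session.
class StreamController:
    def __init__(self, default_quality="1080p", away_quality="240p"):
        self.default_quality = default_quality
        self.away_quality = away_quality
        self.quality = default_quality
        self.paused = False

    def on_presence_change(self, user_present, pausable=False):
        if user_present:
            self.paused = False
            self.quality = self.default_quality   # restore default quality
        elif pausable:
            self.paused = True                    # e.g., a movie can simply pause
        else:
            self.quality = self.away_quality      # e.g., a live stream keeps running

ctrl = StreamController()
ctrl.on_presence_change(user_present=False)
print(ctrl.quality, ctrl.paused)   # -> 240p False
ctrl.on_presence_change(user_present=True)
print(ctrl.quality, ctrl.paused)   # -> 1080p False
```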
  • Display brightness can rapidly degrade battery life.
  • the display can be dimmed to a minimum level and restored to the previous state once the user approaches. This could also be applied to other services running in the background that could impact battery life.
  • the system can use gaze tracking to save battery life by selectively dimming certain display areas.
  • gaze tracking can be employed to dim areas of the display screen(s) peripheral to the gaze direction.
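A toy sketch of such gaze-based selective dimming follows: each display region keeps full brightness only while the gaze point falls in or near it. The region geometry, margin and brightness levels are example assumptions.

```python
# Illustrative per-region brightness based on the current gaze point.
def region_brightness(regions, gaze_xy, full=1.0, dim=0.3, margin=50):
    """Map each named region to a brightness level based on the gaze point."""
    gx, gy = gaze_xy
    levels = {}
    for name, (x, y, w, h) in regions.items():
        inside = (x - margin <= gx <= x + w + margin and
                  y - margin <= gy <= y + h + margin)
        levels[name] = full if inside else dim
    return levels

regions = {
    "primary_window": (0, 0, 1280, 800),
    "secondary_display": (1280, 0, 1920, 1080),
}
print(region_brightness(regions, gaze_xy=(640, 400)))
# -> {'primary_window': 1.0, 'secondary_display': 0.3}
```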
  • Another beneficial scenario for presence detection involves dynamic volume control.
  • the volume during a call or while playing a game could increase or decrease depending on how far the user steps away from the client device.
  • Distance estimation may be performed by the local processing, with or without supplemental information from other onboard sensors (e.g., acoustic or close-in radar sensors, or imagery from a webcam to help provide a depth of field).
  • the size of the person may affect the distance estimation, so information from prior detections, such as when the user is sitting in front of the device, can be employed to estimate how far they have moved from it.
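A hedged sketch of the face-size distance heuristic mentioned above, paired with a simple distance-to-volume mapping, is shown below. The pinhole-style relation, the calibration taken while the user sits at a known distance, and all constants are illustrative assumptions rather than values from this disclosure.

```python
# Illustrative distance-from-face-size heuristic feeding dynamic volume control.
def calibrate(face_width_px_at_ref, ref_distance_m):
    """Return the constant k for the distance ~= k / face_width relation."""
    return face_width_px_at_ref * ref_distance_m

def estimate_distance(face_width_px, k):
    return k / face_width_px if face_width_px > 0 else float("inf")

def volume_for_distance(distance_m, base=0.4, per_meter=0.15, max_vol=1.0):
    """Raise the volume as the user steps farther away from the device."""
    return min(max_vol, base + per_meter * distance_m)

k = calibrate(face_width_px_at_ref=120, ref_distance_m=0.6)   # seated calibration
for width_px in (120, 60, 30):                                 # user walking away
    d = estimate_distance(width_px, k)
    print(f"face {width_px}px -> ~{d:.1f} m -> volume {volume_for_distance(d):.2f}")
```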
  • the presence detection can be used to let a logged-in user know if anyone attempted to touch their computer while they were away from it.
  • the system may take a picture or video whenever someone approaches the computer, temporarily store it in local memory, and then use it to notify the authorized user.
  • imagery may be shown on the display screen.
  • the imagery may be stored in an encrypted format.
  • the imagery may be transmitted (e.g., via email) to the user or the user may be notified via a text message, phone call, chat or other instant message. In the situation where the imagery is sent off-device, this may only occur upon authorization of the user, with or without encryption of the transmitted imagery.
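A sketch of this away-time monitoring flow follows; the capture call, local storage and owner notification are hypothetical stand-ins rather than interfaces described in this disclosure.

```python
# Illustrative "someone approached my computer while I was away" monitor.
import time

class AwayMonitor:
    def __init__(self, capture_fn, notify_fn):
        self.capture_fn = capture_fn
        self.notify_fn = notify_fn
        self.owner_away = False
        self.pending_snapshots = []

    def on_owner_left(self):
        self.owner_away = True

    def on_person_approached(self):
        if self.owner_away:
            # Temporarily keep the snapshot in local memory.
            self.pending_snapshots.append((time.time(), self.capture_fn()))

    def on_owner_returned(self):
        self.owner_away = False
        if self.pending_snapshots:
            self.notify_fn(self.pending_snapshots)   # e.g., show on the display
        self.pending_snapshots = []

monitor = AwayMonitor(capture_fn=lambda: b"<jpeg bytes>",
                      notify_fn=lambda s: print(f"{len(s)} approach event(s) while away"))
monitor.on_owner_left()
monitor.on_person_approached()
monitor.on_owner_returned()
```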
  • Presence sensing can be very beneficial for accessibility (e.g., “a11y”) features. For instance, when a user is detected but no interaction has taken place, especially when the lock screen is presented or the machine is first out of the box, the presence information may trigger the system to enable various a11y features to see if they unblock the user.
  • the UI may display and/or provide audio stating "We noticed you are trying to set up the computer, do you want to turn on voice control?".
  • the system could enable voice control features to aid users with motor impairments to completely control their device with voice. It can sometimes be a challenge to always have the computer listening, in that the user may have to toggle the feature off if they want to talk to someone else in the room. But using the presence sensor technology, the operating system or specific apps can stop listening to commands whenever the user turns away from the client device.
  • Presence sensing information can provide hints to let users know if they're centered within the image frame or not, if they are facing front or to the side, have their head tilted, etc. Audible, visual and/or haptic feedback can guide the person to properly align themselves in the frame.
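A sketch of converting face-detection output into such framing hints is given below, assuming the detector reports a face bounding box and a head-yaw estimate; the thresholds and the sign convention for yaw are illustrative assumptions.

```python
# Illustrative framing hints from a detected face box and yaw angle.
def framing_hints(frame_w, face_box, yaw_deg, center_tol=0.15, yaw_tol=20):
    """face_box is (x, y, w, h) in pixels; yaw_deg != 0 means the head is turned."""
    hints = []
    face_cx = face_box[0] + face_box[2] / 2
    offset = (face_cx - frame_w / 2) / frame_w
    if offset > center_tol:
        hints.append("move left to center yourself in the frame")
    elif offset < -center_tol:
        hints.append("move right to center yourself in the frame")
    if abs(yaw_deg) > yaw_tol:
        hints.append("turn to face the camera")
    return hints or ["framing looks good"]

print(framing_hints(frame_w=320, face_box=(200, 60, 90, 90), yaw_deg=25))
```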
  • the presence detection information can be used by the system to select (or not select) certain authentication or verification inputs. By way of example, the system may not show a captcha if no one is present.
  • the presence detection technology may require user authorization before presence detection is enabled. This may include providing information about the technology, including how imagery may be used or stored, and enabling it upon receipt of authorization.
  • Fig. 13 illustrates one example of information that may be presented to the user prior to enabling presence detection.
  • there may be no indicator associated with the imaging device (which would otherwise always be on as the presence sensor operates).
  • an icon or other indicator may be provided in a system tray, in a popup window, on the UI desktop, etc., to show the status of the presence sensing technology.
  • the user may elect to turn off the presence sensing technology for a particular timeframe (e.g., 5-10 minutes, an hour, all day), when using a particular app or other program (e.g., when preparing a book report or term paper), or upon a particular condition or situation.
  • a user may be provided with controls allowing the user to make an election as to both if and when systems, programs, or features described herein may enable collection of user information (e.g., imagery), and if the user is sent content or communications from a server.
  • certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed.
  • a user’s identity may be treated so that no personally identifiable information can be determined for the user.
  • the user may have control over what information is collected about the user, how that information is used, and what information is provided to the user.
  • FIGs. 14A and 14B are pictorial and functional diagrams, respectively, of an example system 1400 that includes a plurality of computing devices and databases connected via a network.
  • computing device(s) 1402 may be a cloud-based server system that provides or otherwise supports one or more apps, games or other programs.
  • Database 1404 may store app/game data, user profile information, or other information.
  • the server system may access the databases via network 1406.
  • Client devices may include one or more of a desktop computer 1408, a laptop or tablet PC 1410 and in-home devices such as smart display 1412. Other client devices may include a personal communication device such as a mobile phone or PDA 1414 or a wearable device 1416 such as a smartwatch, etc. Another example client device is a large screen display or interactive whiteboard 1418, such as might be used in a classroom, conference room, auditorium or other collaborative gathering space where multiple users may be present.
  • computing device 1402 may include one or more server computing devices having a plurality of computing devices, e.g., a load balanced server farm or cloud computing system, that exchange information with different nodes of a network for the purpose of receiving, processing and transmitting the data to and from other computing devices.
  • computing device 1402 may include one or more server computing devices that are capable of communicating with any of the computing devices 1408-1418 via the network 1406. This may be done as part of hosting one or more collaborative apps (e.g., a videoconferencing program, an interactive spreadsheet app or a multiplayer game) or services (e.g., a movie streaming service or interactive game show where viewers can provide comments or other feedback).
  • each of the computing devices 1402 and 1408-1418 may include one or more processors, memory, data and instructions.
  • the memory stores information accessible by the one or more processors, including instructions and data that may be executed or otherwise used by the processor(s).
  • the memory may be of any type capable of storing information accessible by the processor(s), including a computing device-readable medium.
  • the memory is a non-transitory medium such as a hard-drive, memory card, optical disk, solid-state, etc. Systems may include different combinations of the foregoing, whereby different portions of the instructions and data are stored on different types of media.
  • the instructions may be any set of instructions to be executed directly (such as machine code) or indirectly (such as scripts) by the processor(s).
  • the instructions may be stored as computing device code on the computing device-readable medium.
  • the terms “instructions”, “modules” and “programs” may be used interchangeably herein.
  • the instructions may be stored in object code format for direct processing by the processor, or in any other computing device language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance.
  • the processors may be any conventional processors, such as commercially available CPUs.
  • each processor may be a dedicated device such as an ASIC, graphics processing unit (GPU), tensor processing unit (TPU) or other hardware-based processor.
  • Although Fig. 14B functionally illustrates the processors, memory, and other elements of a given computing device as being within the same block, such devices may actually include multiple processors, computing devices, or memories that may or may not be stored within the same physical housing.
  • the memory may be a hard drive or other storage media located in a housing different from that of the processor(s), for instance in a cloud computing system of server 1402. Accordingly, references to a processor or computing device will be understood to include references to a collection of processors or computing devices or memories that may or may not operate in parallel.
  • the computing devices may include all of the components normally used in connection with a computing device such as the processor and memory described above as well as a user interface subsystem for receiving input from a user and presenting information to the user (e.g., text, imagery and/or other graphical elements).
  • the user interface subsystem may include one or more user inputs (e.g., at least one front (user) facing camera, a mouse, keyboard, touch screen and/or microphone) and one or more display devices that are operable to display information (e.g., text, imagery and/or other graphical elements). Other output devices, such as speaker(s), may also provide information to users.
  • each client device (e.g., any or all of 1408-1418)
  • the user-related computing devices may communicate with a back-end computing system (e.g., server 1402) via one or more networks, such as network 1406.
  • the user-related computing devices may also communicate with one another without also communicating with a back-end computing system.
  • the network 1406, and intervening nodes, may include various configurations and protocols including short range communication protocols such as Bluetooth™, Bluetooth LE™, the Internet, World Wide Web, intranets, virtual private networks, wide area networks, local networks, private networks using communication protocols proprietary to one or more companies, Ethernet, WiFi and HTTP, and various combinations of the foregoing.
  • Such communication may be facilitated by any device capable of transmitting data to and from other computing devices, such as modems and wireless interfaces.
  • Fig. 15 illustrates a method 1500 for a computing device having a human presence sensor module in accordance with aspects of the technology.
  • the method includes capturing, by an image sensor of the human presence sensor module, imagery within a field of view of the image sensor.
  • the imagery captured by the image sensor of the human presence sensor module is restricted to the human presence sensor module (e.g., for temporary storage during processing), and is not disseminated outside of the human presence sensor module to another part of the computing device.
  • the method includes retrieving from memory of the human presence sensor module, by at least one processing device of the human presence sensor module, one or more machine learning models.
  • The one or more machine learning models are each trained to identify whether one or more persons are present in the imagery.
  • the method includes processing, by the at least one processing device of the human presence sensor module, the imagery received from the image sensor using the one or more machine learning models to determine whether one or more persons are present in the imagery.
  • the method includes, upon detection that one or more persons are present in the imagery, the human presence sensor module issuing a signal to an operating system of the computing device so that the computing device can respond to that presence by performing one or more actions.
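The following self-contained mock ties these method steps together: capture a low-resolution frame inside the module, run an on-module person detector, and pass only the boolean result to the operating system, never the imagery. The frame source, the detector logic, and the signaling call are placeholders; a real module would run a trained, quantized model held in its own memory on its own processing device.

```python
# Highly simplified mock of the presence sensor module loop (illustrative only).
import random

def capture_low_res_frame():
    """Stand-in for the module's low-resolution monochrome image sensor."""
    return [[random.randint(0, 255) for _ in range(160)] for _ in range(120)]

def person_detector(frame):
    """Stand-in for the on-module ML model; returns True if a person is inferred."""
    return sum(map(sum, frame)) % 2 == 0     # placeholder logic, not a real model

def signal_operating_system(present):
    """Only the detection result leaves the module, not the frame itself."""
    print("presence interrupt ->", present)

def sensor_module_tick():
    frame = capture_low_res_frame()          # capture imagery within the field of view
    present = person_detector(frame)         # process imagery with the ML model(s)
    if present:
        signal_operating_system(True)        # issue a signal to the operating system
    del frame                                # imagery never leaves the module

for _ in range(3):
    sensor_module_tick()
```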

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention concerns technology providing a computing device (200) having a human presence sensor module (228). An image sensor (230) of the human presence sensor module captures imagery, which is not disseminated outside of the human presence sensor module to another part of the computing device (1502). One or more machine learning models, each trained to identify whether one or more persons are present in the imagery, are retrieved from the memory (234) of the human presence sensor module (1502). The imagery received from the image sensor is processed using the one or more machine learning models in order to determine whether one or more persons are present in the imagery (1506). Upon detecting that one or more persons are present in the imagery, the human presence sensor module sends a signal to the operating system of the computing device so that the latter can respond to that presence by performing one or more actions (1508).
PCT/US2022/051138 2021-12-17 2022-11-29 Capteur de présence humaine pour dispositifs clients WO2023113994A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22836369.3A EP4405916A1 (fr) 2021-12-17 2022-11-29 Capteur de présence humaine pour dispositifs clients
CN202280075951.8A CN118251707A (zh) 2021-12-17 2022-11-29 用于客户端设备的人类存在传感器

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202163290768P 2021-12-17 2021-12-17
US63/290,768 2021-12-17
US17/985,275 2022-11-11
US17/985,275 US20230196836A1 (en) 2021-12-17 2022-11-11 Human Presence Sensor for Client Devices

Publications (1)

Publication Number Publication Date
WO2023113994A1 true WO2023113994A1 (fr) 2023-06-22

Family

ID=84820348

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/051138 WO2023113994A1 (fr) 2021-12-17 2022-11-29 Capteur de présence humaine pour dispositifs clients

Country Status (2)

Country Link
EP (1) EP4405916A1 (fr)
WO (1) WO2023113994A1 (fr)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210382542A1 (en) * 2019-03-13 2021-12-09 Huawei Technologies Co., Ltd. Screen wakeup method and apparatus

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210382542A1 (en) * 2019-03-13 2021-12-09 Huawei Technologies Co., Ltd. Screen wakeup method and apparatus

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AAKANKSHA CHOWDHERY ET AL: "Visual Wake Words Dataset", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 12 June 2019 (2019-06-12), XP081381226 *
SEDIGH GHAMARI ET AL: "Quantization-Guided Training for Compact TinyML Models", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 10 March 2021 (2021-03-10), XP081909226 *

Also Published As

Publication number Publication date
EP4405916A1 (fr) 2024-07-31

Similar Documents

Publication Publication Date Title
US11726324B2 (en) Display system
US11727093B2 (en) Setting and terminating restricted mode operation on electronic devices
US11671697B2 (en) User interfaces for wide angle video conference
US20220103758A1 (en) User interfaces for media capture and management
Kane et al. Bonfire: a nomadic system for hybrid laptop-tabletop interaction
US10555116B2 (en) Content display controls based on environmental factors
US11663309B2 (en) Digital identification credential user interfaces
US9094539B1 (en) Dynamic device adjustments based on determined user sleep state
US8957847B1 (en) Low distraction interfaces
US9317113B1 (en) Gaze assisted object recognition
US8549418B2 (en) Projected display to enhance computer device use
US20230262317A1 (en) User interfaces for wide angle video conference
US20140354531A1 (en) Graphical user interface
US11893214B2 (en) Real-time communication user interface
EP2394235A2 (fr) Système basé sur la vidéo et contribuant à la confidentialité
US20230319413A1 (en) User interfaces for camera sharing
CN112214112A (zh) 参数调节方法及装置
US20230196836A1 (en) Human Presence Sensor for Client Devices
CN111596760A (zh) 操作控制方法、装置、电子设备及可读存储介质
CN116210217A (zh) 用于视频会议的方法和装置
US9645789B1 (en) Secure messaging
CN114556270A (zh) 放大用户界面的眼睛注视控制
US11907357B2 (en) Electronic devices and corresponding methods for automatically performing login operations in multi-person content presentation environments
US20230254448A1 (en) Camera-less representation of users during communication sessions
WO2023113994A1 (fr) Capteur de présence humaine pour dispositifs clients

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22836369

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022836369

Country of ref document: EP

Effective date: 20240426

WWE Wipo information: entry into national phase

Ref document number: 202280075951.8

Country of ref document: CN

Ref document number: 202447038251

Country of ref document: IN

NENP Non-entry into the national phase

Ref country code: DE