WO2021061551A1 - Method and device for processing camera images - Google Patents

Method and device for processing camera images

Info

Publication number
WO2021061551A1
WO2021061551A1 (PCT/US2020/051746)
Authority
WO
WIPO (PCT)
Prior art keywords
substitute
image
person
implementations
biometric measurements
Application number
PCT/US2020/051746
Other languages
French (fr)
Original Assignee
Qsinx Management Llc
Application filed by Qsinx Management Llc
Publication of WO2021061551A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0861 Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/131 Protocols for games, networked simulations or virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Definitions

  • the present disclosure relates to techniques for processing and filtering camera images.
  • Implementations of computer-generated experiences involve the use of image sensors such as cameras. These sensors may capture images of a user, for example for purposes of determining the user’s movements and translating those movements into an immersive experience. In doing so, these sensors may also incidentally capture images that may be perceived as unnecessary.
  • Figure 1 is a block diagram of an example operating architecture in accordance with some implementations.
  • Figure 2 is a block diagram of an example controller in accordance with some implementations.
  • Figure 3 is a block diagram of an example electronic device in accordance with some implementations.
  • Figure 4A illustrates a first setting based on a scene camera of a device.
  • Figure 4B illustrates a second, processed, setting based on a scene camera of a device.
  • Figure 5 is a flowchart representation of a processing technique in accordance with some implementations.
  • a method is performed at a device including one or more processors, non-transitory memory, and an image sensor.
  • the method includes capturing, using the image sensor, a captured image of a person.
  • the method includes detecting, in the captured image, a portion of the person, wherein the portion is associated with a first plurality of measurements.
  • the method includes generating, from the captured image, a modified image by replacing the portion with a substitute portion associated with a second plurality of measurements different than the first plurality of measurements.
  • the method includes storing, in the non-transitory memory, the modified image.
  • an electronic device comprises one or more processors working with non-transitory memory.
  • the non-transitory memory stores one or more programs of executable instructions that are executed by the one or more processors.
  • the executable instructions carry out the techniques and processes described herein.
  • a computer (readable) storage medium has instructions that, when executed by one or more processors of an electronic device, cause the electronic device to perform, or cause performance, of any of the techniques and processes described herein.
  • the computer (readable) storage medium is non-transitory.
  • a device includes one or more processors, a non-transitory memory, and means for performing or causing performance of the techniques and processes described herein.
  • Physical settings are those in the world where people can sense and/or interact without use of electronic systems.
  • a room is a physical setting that includes physical elements, such as, physical chairs, physical desks, physical lamps, and so forth. A person can sense and interact with these physical elements of the physical setting through direct touch, taste, sight, smell, and hearing.
  • an extended reality (XR) setting refers to a computer-produced environment that is partially or entirely generated using computer-produced content. While a person can interact with the XR setting using various electronic systems, this interaction utilizes various electronic sensors to monitor the person’s actions, and translates those actions into corresponding actions in the XR setting. For example, if an XR system detects that a person is looking upward, the XR system may change its graphics and audio output to present XR content in a manner consistent with the upward movement. XR settings may incorporate laws of physics to mimic physical settings.
  • Concepts of XR include virtual reality (VR) and augmented reality (AR).
  • VR virtual reality
  • AR augmented reality
  • Concepts of XR also include mixed reality (MR), which is sometimes used to refer to the spectrum of realities between physical settings (but not including physical settings) at one end and VR at the other end.
  • Concepts of XR also include augmented virtuality (AV), in which a virtual or computer-produced setting integrates sensory inputs from a physical setting. These inputs may represent characteristics of a physical setting. For example, a virtual object may be displayed in a color captured, using an image sensor, from the physical setting. As another example, an AV setting may adopt current weather conditions of the physical setting.
  • MR mixed reality
  • AV augmented virtuality
  • Some electronic systems for implementing XR operate with an opaque display and one or more imaging sensors for capturing video and/or images of a physical setting.
  • when a system captures images of a physical setting and displays a representation of the physical setting on an opaque display using the captured images, the displayed images are called a video pass-through.
  • Some electronic systems for implementing XR operate with an optical see-through display that may be transparent or semi-transparent (and optionally with one or more imaging sensors). Such a display allows a person to view a physical setting directly through the display, and allows for virtual content to be added to the person’s field-of-view by superimposing the content over an optical pass-through of the physical setting.
  • Some electronic systems for implementing XR operate with a projection system that projects virtual objects onto a physical setting.
  • the projector may present a holograph onto a physical setting, or may project imagery onto a physical surface, or may project onto the eyes (e.g., retina) of a person, for example.
  • Electronic systems providing XR settings can have various form factors.
  • a smartphone or a tablet computer may incorporate imaging and display components to present an XR setting.
  • a head-mountable system may include imaging and display components to present an XR setting.
  • These systems may provide computing resources for generating XR settings, and may work in conjunction with one another to generate and/or present XR settings.
  • a smartphone or a tablet can connect with a head-mounted display to present XR settings.
  • a computer may connect with home entertainment components or vehicular systems to provide an on-window display or a heads-up display.
  • Electronic systems displaying XR settings may utilize display technologies such as LEDs, OLEDs, QD-LEDs, liquid crystal on silicon, a laser scanning light source, a digital light projector, or combinations thereof.
  • Display technologies can employ substrates, through which light is transmitted, including light waveguides, holographic substrates, optical reflectors and combiners, or combinations thereof.
  • an electronic device comprises one or more processors working with non-transitory memory.
  • the non-transitory memory stores one or more programs of executable instructions that are executed by the one or more processors.
  • a computer (readable) storage medium has instructions that, when executed by one or more processors of an electronic device, cause the electronic device to perform, or cause performance, of any of the techniques and processes described herein.
  • the computer (readable) storage medium is non-transitory.
  • a device includes one or more processors, a non-transitory memory, and means for performing or causing performance of the techniques and processes described herein.
  • An electronic device that is presenting an immersive setting, such as an XR setting, to a user may capture images using image sensors. Exemplary uses of these images include recreating representations of physical objects in an XR setting, and translating a user’s physical movements into an avatar’s movement in the XR setting. However, it is possible that some of these images could unintentionally and incidentally include information that a user would rather not input into an XR setting. These could range from unnecessary imagery within a field of view to more sensitive information such as a user’s physical attributes.
  • FIG. 1 is a block diagram of an example operating architecture 100 in accordance with some implementations. While pertinent features are shown, those of ordinary skill in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example implementations disclosed herein. To that end, as a non-limiting example, the operating architecture 100 includes an electronic device 120.
  • the electronic device 120 is configured to present XR content to a user.
  • the electronic device 120 includes a suitable combination of software, firmware, and/or hardware.
  • the electronic device 120 presents, via a display 122, XR content to the user while the user is physically present within a physical setting 105 that includes a table 107 within the field-of-view 111 of the electronic device 120.
  • the user holds the electronic device 120 in his/her hand(s).
  • the electronic device 120 is configured to display a virtual object (e.g., a virtual box 109) and to enable video pass-through of the physical setting 105 (e.g., including a representation 117 of the table 107) on a display 122.
  • a virtual object e.g., a virtual box 109
  • video pass-through of the physical setting 105 e.g., including a representation 117 of the table 107
  • the controller 110 is configured to manage and coordinate presentation of XR content for the user.
  • the controller 110 includes a suitable combination of software, firmware, and/or hardware.
  • the controller 110 is described in greater detail below with respect to Figure 2.
  • the controller 110 is a computing device that is local or remote relative to the physical setting 105.
  • the controller 110 is a local server located within the physical setting 105.
  • the controller 110 is a remote server located outside of the physical setting 105 (e.g., a cloud server, central server, etc.).
  • the controller 110 is communicatively coupled with the electronic device 120 via one or more wired or wireless communication channels 144 (e.g., BLUETOOTH, IEEE 802.11x, IEEE 802.16x, IEEE 802.3x, etc.). In another example, the controller 110 is included within the enclosure of the electronic device 120.
  • wired or wireless communication channels 144 e.g., BLUETOOTH, IEEE 802.11x, IEEE 802.16x, IEEE 802.3x, etc.
  • the electronic device 120 is configured to present the XR content to the user.
  • the electronic device 120 includes a suitable combination of software, firmware, and/or hardware.
  • the electronic device 120 is described in greater detail below with respect to Figure 3.
  • the functionalities of the controller 110 are provided by and/or combined with the electronic device 120.
  • the electronic device 120 presents XR content to the user while the user is virtually and/or physically present within the physical setting 105.
  • the user wears the electronic device 120 on his/her head.
  • the electronic device includes a head-mounted system (HMS), head-mounted device (HMD), or head-mounted enclosure (HME).
  • HMS head-mounted system
  • HMD head-mounted device
  • HME head-mounted enclosure
  • the electronic device 120 includes one or more XR displays provided to display the XR content.
  • the electronic device 120 encloses the field-of-view of the user.
  • the electronic device 120 is a handheld device (such as a smartphone or tablet) configured to present XR content, and rather than wearing the electronic device 120, the user holds the device with a display directed towards the field-of-view of the user and a camera directed towards the physical setting 105.
  • the handheld device can be placed within an enclosure that can be worn on the head of the user.
  • the electronic device 120 is replaced with an XR chamber, enclosure, or room configured to present XR content in which the user does not wear or hold the electronic device 120.
  • the physical setting 105 includes a person other than the user and the camera captures one or more images of the physical setting 105 including the person.
  • the one or more images may include one or more physiques of the person such as an iris pattern, a fingerprint, or a facial shape.
  • FIG. 2 is a block diagram of an example of the controller 110 in accordance with some implementations. While certain specific features are illustrated, those skilled in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity, and so as not to obscure more pertinent aspects of the implementations disclosed herein.
  • the controller 110 includes one or more processing units 202 (e.g., microprocessors, application-specific integrated-circuits (ASICs), field-programmable gate arrays (FPGAs), graphics processing units (GPUs), central processing units (CPUs), processing cores, and/or the like), one or more input/output (I/O) devices 206, one or more communication interfaces 208 (e.g., universal serial bus (USB), FIREWIRE, THUNDERBOLT, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, global system for mobile communications (GSM), code division multiple access (CDMA), time division multiple access (TDMA), global positioning system (GPS), infrared (IR), BLUETOOTH, ZIGBEE, and/or the like type interface), one or more programming (e.g., I/O) interfaces 210, a memory 220, and one or more communication buses 204 for interconnecting these and various other components.
  • the one or more communication buses 204 include circuitry that interconnects and controls communications between system components.
  • the one or more I/O devices 206 include at least one of a keyboard, a mouse, a touchpad, a joystick, one or more microphones, one or more speakers, one or more image sensors, one or more displays, and/or the like.
  • the memory 220 includes high-speed random-access memory, such as dynamic random-access memory (DRAM), static random-access memory (SRAM), double-data-rate random-access memory (DDR RAM), or other random-access solid-state memory devices.
  • the memory 220 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
  • the memory 220 optionally includes one or more storage devices remotely located from the one or more processing units 202.
  • the memory 220 comprises a non-transitory computer readable storage medium.
  • the memory 220 or the non-transitory computer readable storage medium of the memory 220 stores the following programs, modules and data structures, or a subset thereof including an optional operating system 230 and an XR content module 240.
  • the operating system 230 includes procedures for handling various basic system services and for performing hardware dependent tasks.
  • the XR content module 240 is configured to manage and coordinate presentation of XR content for one or more users (e.g., a single set of XR content for one or more users, or multiple sets of XR content for respective groups of one or more users).
  • the XR content module 240 includes a data obtaining unit 242, a tracking unit 244, a coordination unit 246, and a data transmitting unit 248.
  • the data obtaining unit 242 is configured to obtain data (e.g., presentation data, interaction data, sensor data, location data, etc.) from at least the electronic device 120 of Figure 1.
  • data e.g., presentation data, interaction data, sensor data, location data, etc.
  • the data obtaining unit 242 includes instructions and/or logic therefor, and heuristics and metadata therefor.
  • the tracking unit 244 is configured to map the physical setting 105 and to track the position/location of at least the electronic device 120 with respect to the physical setting 105 of Figure 1. To that end, in various implementations, the tracking unit 244 includes instructions and/or logic therefor, and heuristics and metadata therefor.
  • the coordination unit 246 is configured to manage and coordinate the presentation of XR content to the user by the electronic device 120.
  • the coordination unit 246 includes instructions and/or logic therefor, and heuristics and metadata therefor.
  • the data transmitting unit 248 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the electronic device 120.
  • data e.g., presentation data, location data, etc.
  • the data transmitting unit 248 includes instructions and/or logic therefor, and heuristics and metadata therefor.
  • the data obtaining unit 242, the tracking unit 244, the coordination unit 246, and the data transmitting unit 248 are shown as residing on a single device (e.g., the controller 110), it should be understood that in other implementations, any combination of the data obtaining unit 242, the tracking unit 244, the coordination unit 246, and the data transmitting unit 248 may be located in separate computing devices.
  • Figure 2 is intended more as functional description of the various features that may be present in a particular implementation as opposed to a structural schematic of the implementations described herein.
  • items shown separately could be combined and some items could be separated.
  • some functional modules shown separately in Figure 2 could be implemented in a single module and the various functions of single functional blocks could be implemented by one or more functional blocks in various implementations.
  • the actual number of modules and the division of particular functions and how features are allocated among them will vary from one implementation to another and, in some implementations, depends in part on the particular combination of hardware, software, and/or firmware chosen for a particular implementation.
  • FIG. 3 is a block diagram of an example of the electronic device 120 in accordance with some implementations. While certain specific features are illustrated, those skilled in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity, and so as not to obscure more pertinent aspects of the implementations disclosed herein.
  • the electronic device 120 includes one or more processing units 302 (e.g., microprocessors, ASICs, FPGAs, GPUs, CPUs, processing cores, and/or the like), one or more input/output (I/O) devices and sensors 306, one or more communication interfaces 308 (e.g., USB, FIREWIRE, THUNDERBOLT, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, GSM, CDMA, TDMA, GPS, IR, BLUETOOTH, ZIGBEE, and/or the like type interface), one or more programming (e.g., I/O) interfaces 310, one or more XR displays 312, one or more optional interior- and/or exterior-facing image sensors 314, a memory 320, and one or more communication buses 304 for interconnecting these and various other components.
  • processing units 302 e.g., microprocessors, ASICs, FPGAs, GPUs, CPUs, processing cores, and/or the like
  • the one or more communication buses 304 include circuitry that interconnects and controls communications between system components.
  • the one or more I/O devices and sensors 306 include at least one of an inertial measurement unit (IMU), an accelerometer, a gyroscope, a thermometer, one or more physiological sensors (e.g., blood pressure monitor, heart rate monitor, blood oxygen sensor, blood glucose sensor, etc.), one or more microphones 307A, one or more speakers 307B, a haptics engine, one or more depth sensors (e.g., a structured light, a time-of-flight, or the like), and/or the like.
  • IMU inertial measurement unit
  • an accelerometer e.g., an accelerometer
  • a gyroscope
  • a thermometer
  • physiological sensors e.g., blood pressure monitor, heart rate monitor, blood oxygen sensor, blood glucose sensor, etc.
  • the one or more XR displays 312 are configured to display XR content to the user.
  • the one or more XR displays 312 correspond to holographic, digital light processing (DLP), liquid-crystal display (LCD), liquid-crystal on silicon (LCoS), organic light-emitting field-effect transistor (OLET), organic light-emitting diode (OLED), surface-conduction electron-emitter display (SED), field-emission display (FED), quantum-dot light-emitting diode (QD-LED), micro-electro-mechanical system (MEMS), and/or the like display types.
  • DLP digital light processing
  • LCD liquid-crystal display
  • LCoS liquid-crystal on silicon
  • OLET organic light-emitting field-effect transistor
  • OLED organic light-emitting diode
  • SED surface-conduction electron-emitter display
  • FED field-emission display
  • QD-LED quantum-dot light-emitting diode
  • the one or more XR displays 312 correspond to diffractive, reflective, polarized, holographic, etc. waveguide displays.
  • the electronic device 120 includes a single XR display.
  • the electronic device 120 includes an XR display for each eye of the user.
  • the one or more XR displays 312 are capable of presenting MR and VR content.
  • the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the face of the user that includes the eyes of the user (and may be referred to as an eye-tracking camera). In some implementations, the one or more image sensors 314 are configured to be forward-facing so as to obtain image data that corresponds to the physical setting as would be viewed by the user if the electronic device 120 was not present (and may be referred to as a scene camera).
  • the one or more optional image sensors 314 can include one or more RGB cameras (e.g., with a complementary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor), one or more infrared (IR) cameras, one or more event-based cameras, and/or the like.
  • CMOS complementary metal-oxide-semiconductor
  • CCD charge-coupled device
  • IR infrared
  • the memory 320 includes high-speed random-access memory, such as DRAM, SRAM, DDR RAM, or other random-access solid-state memory devices.
  • the memory 320 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
  • the memory 320 optionally includes one or more storage devices remotely located from the one or more processing units 302.
  • the memory 320 comprises a non-transitory computer readable storage medium.
  • the memory 320 or the non-transitory computer readable storage medium of the memory 320 stores the following programs, modules and data structures, or a subset thereof including an optional operating system 330 and an XR presentation module 340.
  • the operating system 330 includes procedures for handling various basic system services and for performing hardware dependent tasks.
  • the XR presentation module 340 is configured to present XR content to the user via the one or more XR displays 312 and/or the I/O devices and sensors 306 (such as the one or more speakers 307B).
  • the XR presentation module 340 includes a data obtaining unit 342, an XR content presenting unit 344, and a data transmitting unit 346.
  • the data obtaining unit 342 is configured to obtain data (e.g., presentation data, interaction data, sensor data, location data, etc.) from at least the controller 110 of Figure 1. To that end, in various implementations, the data obtaining unit 342 includes instructions and/or logic therefor, and heuristics and metadata therefor.
  • the XR content presenting unit 344 is configured to present XR content to a user. In various implementations, the XR content presenting unit 344 presents XR content including a person where one or more physiques of the person have been obscured. To that end, in various implementations, the XR content presenting unit 344 includes instructions and/or logic therefor, and heuristics and metadata therefor.
  • the data transmitting unit 346 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the controller 110.
  • data e.g., presentation data, location data, etc.
  • the data transmitting unit 346 includes instructions and/or logic therefor, and heuristics and metadata therefor.
  • the data obtaining unit 342, the XR content presenting unit 344, and the data transmitting unit 346 are shown as residing on a single device (e.g., the electronic device 120 of Figure 1), it should be understood that in other implementations, any combination of the data obtaining unit 342, the XR content presenting unit 344, and the data transmitting unit 346 may be located in separate computing devices.
  • Figure 3 is intended more as a functional description of the various features that could be present in a particular implementation as opposed to a structural schematic of the implementations described herein. Items shown separately could be combined and some items could be separated. For example, some functional modules shown separately in Figure 3 could be implemented in a single module and the various functions of single functional blocks could be implemented by one or more functional blocks in various implementations. The actual number of modules and the division of particular functions and how features are allocated among them will vary from one implementation to another and, in some implementations, depends in part on the particular combination of hardware, software, and/or firmware chosen for a particular implementation.
  • Figure 4A illustrates a first XR setting 400 based on a physical setting surveyed by an image sensor of a device.
  • the image sensor is part of a device that is used by the user and includes a display that displays the first XR setting 400.
  • the user is physically present in the physical setting.
  • the image sensor is part of a remote device (such as a drone or robotic avatar) that transmits images from the image sensor to a local device that is worn by the user and includes a display that displays the first XR setting 400.
  • the image sensor of the device may incidentally capture images showing a person’s eyes, hand, or face.
  • the device processes the portion of the image including the person’s eyes, hand, or face. For example, in various implementations, the device removes information, such as blurring the portion of the image, adding noise to the portion of the image, or replacing the portion of the image. In various implementations, the amount of information removed is sufficient to defeat replication for authentication purposes, but minor enough to be overlooked by the user so as to not distract from the user experience.
  • the first XR setting 400 includes a plurality of objects, including one or more physical elements (e.g., a table 412, a lamp 414, and a person 460) and one or more virtual objects (e.g., a virtual box 422).
  • each object is displayed at a location in the first XR setting 400, e.g., at a location defined by three coordinates in a three-dimensional (3D) XR coordinate system.
  • 3D three- dimensional
  • the objects are moved on the display of the device, but retain their location in the first XR setting 400.
  • certain virtual objects are displayed at locations on the display such that when the user moves in the first XR setting 400, the objects are stationary on the display on the device.
  • the person 460 has eyes 461 having an iris pattern, a face 462 having a set of facial dimensions, and hands 463 having fingerprints.
  • the iris pattern is characterized by a plurality of biometric measurements.
  • the facial dimensions constitute a plurality of biometric measurements.
  • the facial shapes constitute a plurality of biometric measurements.
  • the fingerprints are characterized by a plurality of biometric measurements.
  • Figure 4B illustrates a second XR setting 450 based on the physical setting of Figure 4A.
  • the second XR setting 450 includes a plurality of objects, including one or more physical elements (e.g., the table 412, the lamp 414, and the person 460) and one or more virtual objects (e.g., a virtual box 422, substitute eyes 471 displayed over the eyes 461 of the person 460, a substitute face 472 displayed over the face 462 of the person 460, and substitute fingerprints 473 displayed over the fingerprints 463 of the person 460).
  • the substitute eyes 471 include a substitute iris pattern that is different than the iris pattern of the eyes 461 of the person 460.
  • the substitute iris pattern is characterized by a plurality of biometric measurements that are different than the plurality of biometric measurements that characterize the iris pattern of the person 460.
  • the substitute eyes 471 are selected from a set of predetermined substitute eyes, each having a particular iris pattern.
  • the substitute eyes 471 are selected as those from the set of predetermined substitute eyes that most closely match the eyes 461 of the person, e.g., based on shape, size, or color (a sketch of such a closest-match selection appears at the end of this section).
  • the substitute eyes 471 are a blurred version of the eyes 461 of the person.
  • the substitute face 472 is characterized by a plurality of substitute facial dimensions that are different than the facial dimensions of the face 462 of the person 460.
  • the plurality of substitute facial dimensions constitute a plurality of biometric measurements that are different than the plurality of biometric measurements that characterize the face 462 of the person 460.
  • the substitute facial dimensions are determined based on adding random noise to the facial dimensions of the face 462 of the person 460.
  • the random noise is of sufficient strength such that the substitute facial dimensions would be rejected by an authentication system, but not of sufficient strength such that the substitute face 472 is noticeably different to the user than the face 462 of the person 460.
  • the substitute fingerprints 473 are characterized by a plurality of biometric measurements that are different than the plurality of biometric measurements of the fingerprints 463 of the person 460.
  • the substitute fingerprints 473 are a set of default fingerprints.
  • the second XR setting 450 includes one or more graphical user interface affordances indicating that a substitution has occurred.
  • the affordance may indicate the physical attribute that is being substituted (e.g., eyes, face, fingerprint).
  • Figure 5 is a flowchart representation of a method 500 of reducing physique information in an image in accordance with some implementations.
  • the method 500 is performed by a device with one or more processors, non-transitory memory, and a scene camera (e.g., the electronic device 120 of Figure 3).
  • the method 500 is performed by processing logic, including hardware, firmware, software, or a combination thereof.
  • the method 500 is performed by a processor executing instructions (e.g., code) stored in a non-transitory computer-readable medium (e.g., a memory).
  • the method 500 begins, in block 510, with the device capturing, using the image sensor, a captured image of a person.
  • the method 500 is described herein as being performed on a single person in the captured image, it is to be appreciated that the method 500 may be performed on multiple people in the captured image. Further, although the method 500 is described herein as being performed on a single captured image of a person, it is to be appreciated that the method 500 may be performed on multiple captured images of the person, e.g., for each frame of captured video data.
  • the method 500 continues, in block 520, with the device detecting, in the captured image, a biometric identifier (or physical attribute) of the person, wherein the biometric identifier is associated with a first plurality of biometric measurements.
  • the biometric identifier is at least one of a fingerprint, an iris pattern, or a face.
  • the biometric identifier of the person is detected using any object detection algorithm, such as semantic segmentation.
  • detecting the biometric identifier includes determining the first plurality of biometric measurements.
  • the biometric identifier is a haptic of the user, defined by how the user (or various portions thereof) moves over time.
  • the first plurality of biometric measurements includes a relative position or velocity of a body part of the user.
  • the biometric identifier is based on the captured image and a previously captured image.
  • the method 500 is described herein as being performed on a single detected biometric identifier of a person, it is to be appreciated that the method 500 may be performed on multiple biometric identifiers of a person in the captured image.
  • the method 500 continues, in block 530, with the device generating, from the captured image, a modified image by replacing the biometric identifier with a substitute biometric identifier (or replacing the physical attribute with a substitute attribute) associated with a second plurality of biometric measurements different than the first plurality of biometric measurements.
  • the first plurality of biometric measurements is sufficient to authenticate the person using an authentication system or authentication device, however, the second plurality of biometric measurements is insufficient to authenticate the person using the authentication system or authentication device.
  • the modified image is imperceptibly different to a user of the device as compared to the captured image.
  • this is true because the substitute biometric identifier imperceptibly differs from the biometric identifier.
  • this is true because the substitute biometric identifier is photorealistic.
  • the substitute biometric identifier is a default substitute biometric identifier. Thus, in various implementations, the substitute biometric identifier is not based on the biometric identifier. In various implementations, the substitute biometric identifier is selected from a plurality of predetermined substitute biometric identifiers, e.g., based on a similarity to the biometric identifier. In various implementations, the substitute biometric identifier is a blurred (or otherwise filtered) version of the biometric identifier. Thus, in various implementations, the substitute biometric identifier is based on the biometric identifier.
  • the substitute biometric identifier is generated by adding noise to the first plurality of biometric measurements to generate the second plurality of biometric measurements (a sketch of such a noise-based substitution, including the trace described below, appears at the end of this section).
  • the substitute biometric identifier is based on the first plurality of biometric measurements.
  • the substitute biometric identifier includes a trace that indicates that the modified image includes the substitute biometric identifier (e.g., as a replacement for the biometric identifier).
  • the substitute biometric identifier includes a trace that indicates that the captured image has been modified to generate the modified image.
  • an authentication system or authentication device can detect the trace and deny authentication when presented with the substitute biometric identifier.
  • where the substitute biometric identifier is a default substitute biometric identifier or is selected from a plurality of predetermined substitute biometric identifiers, each of those substitute biometric identifiers may be known to the authentication system or authentication device.
  • the noise added to the first plurality of biometric measurements may spell out such a trace (e.g., the noise includes setting the least significant bit of each of the first plurality of biometric measurements to zero).
  • the method 500 continues, in block 540, with the device storing, in the non-transitory memory, the modified image.
  • the device stores the modified image without storing the captured image.
  • a device that is displaying the modified image also displays an indication of the portion of the image that has been substituted, for example, a graphical user interface affordance can indicate that the fingerprints have been modified.
  • the modified image includes a substitution indicator indicating that the physical attribute has been replaced.
  • the method 500 includes a similar procedure to reduce biometric identifiers within a person’s voice.
  • the method 500 includes receiving audio data and detecting, in the audio data, a voice of the person, wherein the voice is associated with a third plurality of biometric measurements.
  • the method includes generating, from the audio data, modified audio data by replacing the voice with a substitute voice associated with a fourth plurality of biometric measurements different than the third plurality of biometric measurements and storing, in the non-transitory memory, the modified audio data (a crude sketch of such a voice substitution appears at the end of this section).
  • the techniques discussed above aim to reduce the unintended and incidental production of information in extended reality settings.
  • This information may also include other features, such as facial features, that may be perceived as biometric, or otherwise more sensitive, in nature.
  • the present disclosure recognizes that the use of camera imagery in XR reflects a balance of competing desires and concerns. For example, users may wish to share their portraits, as is witnessed by the proliferation of social media platforms. Some users, however, may wish to share less. Expectations of how much information a device should programmatically reduce may thus vary with user sensitivities and social norms.
  • the present disclosure also contemplates embodiments that help users understand how extensions of their XR selves can be perceived by others.
  • the present technology can inform and allow users to determine how much their XR avatars should reflect their real life motion and/or appearance in the physical world.
  • Options to “opt in” or “opt out” of the use of, for example, camera imagery to affect the display of an avatar can be used to reduce information shared.
  • the present technology can also inform and allow recipients to understand when they are viewing information that has been modified, for example, when a fingerprint is being obscured. It is an aim of the present disclosure to, in a transparent manner, reduce the likelihood that certain unintentional and incidental information is shared over XR.
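The closest-match selection described for the substitute eyes 471 can be pictured as a nearest-neighbour lookup over measurement vectors. The following Python sketch is illustrative only: the measurement values, the set of predetermined substitutes, and the Euclidean distance metric are assumptions of the sketch rather than details taken from the disclosure.

```python
import numpy as np

# Hypothetical predetermined substitutes: each entry pairs an image patch with the
# measurements (e.g., shape, size, color statistics) that characterize it. None of
# these names or values come from the patent; they exist only to make the sketch run.
PREDETERMINED_SUBSTITUTES = [
    {"patch": np.zeros((32, 64, 3), dtype=np.uint8), "measurements": np.array([11.0, 27.0, 0.42])},
    {"patch": np.full((32, 64, 3), 128, dtype=np.uint8), "measurements": np.array([12.5, 25.0, 0.31])},
]

def select_substitute(detected_measurements: np.ndarray) -> np.ndarray:
    """Return the patch of the predetermined substitute whose measurements are
    closest (Euclidean distance) to the measurements detected in the captured image."""
    best = min(
        PREDETERMINED_SUBSTITUTES,
        key=lambda s: np.linalg.norm(s["measurements"] - detected_measurements),
    )
    return best["patch"]

# Example: pick a substitute for a detected eye region characterized by three
# illustrative measurements (shape, size, color).
substitute_patch = select_substitute(np.array([11.8, 26.1, 0.35]))
```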
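The bounded-noise substitution and the least-significant-bit trace mentioned above can be combined in a single step. The sketch below assumes integer-valued facial measurements, a hypothetical noise budget, and a toy authentication tolerance; none of these constants are specified by the disclosure, and the matcher stands in for whatever authentication system a real deployment would face.

```python
import numpy as np

rng = np.random.default_rng()

def perturb_measurements(measurements: np.ndarray, budget: int = 6) -> np.ndarray:
    """Add small random noise to integer biometric measurements, then clear the
    least-significant bit of every value as a detectable trace of modification.

    The noise is at least +/-2 per measurement, intended to exceed a matcher's
    tolerance while leaving the rendered substitute visually unchanged."""
    noise = rng.integers(2, budget + 1, size=measurements.shape)
    signs = rng.choice([-1, 1], size=measurements.shape)
    perturbed = measurements + signs * noise
    return perturbed & ~1  # zeroed least-significant bits form the trace

def has_trace(measurements: np.ndarray) -> bool:
    """An authentication system could detect the all-even trace and refuse to
    authenticate when presented with these measurements."""
    return bool(np.all(measurements % 2 == 0))

def would_authenticate(stored: np.ndarray, presented: np.ndarray, tolerance: int = 1) -> bool:
    """Toy matcher: accept only if every measurement is within `tolerance`."""
    return bool(np.all(np.abs(stored - presented) <= tolerance))

original = np.array([182, 64, 97, 41])        # illustrative facial dimensions
substitute = perturb_measurements(original)

assert has_trace(substitute)                   # trace is present
assert would_authenticate(original, original)  # the first measurements would pass
assert not would_authenticate(original, substitute)  # the second would not
```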
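For the audio variant, one crude way to produce a substitute voice with different measurable characteristics is to resample the waveform, which shifts pitch (and tempo). This is only a stand-in for the substitute-voice step; a practical system would more likely use dedicated voice conversion, and the 6% shift used here is an arbitrary assumption.

```python
import numpy as np

def substitute_voice(samples: np.ndarray, shift: float = 1.06) -> np.ndarray:
    """Crudely alter voice characteristics by resampling the waveform.

    Playing the returned samples at the original sample rate raises pitch (and
    speeds up speech) by `shift`, so voice measurements taken from the modified
    audio no longer match the original speaker while the speech stays intelligible."""
    original_positions = np.arange(len(samples))
    new_positions = np.arange(0, len(samples), shift)  # read the waveform slightly faster
    return np.interp(new_positions, original_positions, samples.astype(np.float64))

# Example with one second of a synthetic 220 Hz tone standing in for recorded speech.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
voice = np.sin(2 * np.pi * 220 * t)
modified_voice = substitute_voice(voice)
```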

Abstract

In one implementation, a technique is performed at a device including a processor, non-transitory memory, and an image sensor. The technique includes capturing, using the image sensor, a captured image of a person. The technique includes detecting, in the captured image, a portion of the person, wherein the portion is associated with a first plurality of measurements. The technique includes generating, from the captured image, a modified image by replacing the portion with a substitute portion associated with a second plurality of measurements different than the first plurality of measurements. The technique includes storing, in the non-transitory memory, the modified image.

Description

METHOD AND DEVICE FOR PROCESSING CAMERA IMAGES
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Patent App. No. 62/904,971, filed on September 24, 2019, which is hereby incorporated by reference in its entirety.
TECHNICAL FIELD
[0002] The present disclosure relates to techniques for processing and filtering camera images.
BACKGROUND
[0003] Implementations of computer-generated experiences involve the use of image sensors such as cameras. These sensors may capture images of a user, for example for purposes of determining the user’s movements and translating those movements into an immersive experience. In doing so, these sensors may also incidentally capture images that may be perceived as unnecessary.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] So that the present disclosure can be understood by those of ordinary skill in the art, a more detailed description may be had by reference to aspects of some illustrative implementations, some of which are shown in the accompanying drawings.
[0005] Figure 1 is a block diagram of an example operating architecture in accordance with some implementations.
[0006] Figure 2 is a block diagram of an example controller in accordance with some implementations.
[0007] Figure 3 is a block diagram of an example electronic device in accordance with some implementations.
[0008] Figure 4A illustrates a first setting based on a scene camera of a device.
[0009] Figure 4B illustrates a second, processed, setting based on a scene camera of a device.
[0010] Figure 5 is a flowchart representation of a processing technique in accordance with some implementations.
[0011] In accordance with common practice the various features illustrated in the drawings may not be drawn to scale. Accordingly, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. In addition, some of the drawings may not depict all of the components of a given system, method or device. Finally, like reference numerals may be used to denote like features throughout the specification and figures.
SUMMARY
[0012] In various implementations, a method is performed at a device including one or more processors, non-transitory memory, and an image sensor. The method includes capturing, using the image sensor, a captured image of a person. The method includes detecting, in the captured image, a portion of the person, wherein the portion is associated with a first plurality of measurements. The method includes generating, from the captured image, a modified image by replacing the portion with a substitute portion associated with a second plurality of measurements different than the first plurality of measurements. The method includes storing, in the non-transitory memory, the modified image.
[0013] In some embodiments, an electronic device comprises one or more processors working with non-transitory memory. In some embodiments, the non-transitory memory stores one or more programs of executable instructions that are executed by the one or more processors. In some embodiments, the executable instructions carry out the techniques and processes described herein. In some embodiments, a computer (readable) storage medium has instructions that, when executed by one or more processors of an electronic device, cause the electronic device to perform, or cause performance, of any of the techniques and processes described herein. The computer (readable) storage medium is non-transitory. In some embodiments, a device includes one or more processors, a non-transitory memory, and means for performing or causing performance of the techniques and processes described herein.
DESCRIPTION
[0014] Various examples of electronic systems and techniques for using such systems in relation to various extended reality (XR) technologies are described.
[0015] Physical settings are those in the world where people can sense and/or interact without use of electronic systems. For example, a room is a physical setting that includes physical elements, such as, physical chairs, physical desks, physical lamps, and so forth. A person can sense and interact with these physical elements of the physical setting through direct touch, taste, sight, smell, and hearing.
[0016] In contrast to a physical setting, an extended reality (XR) setting refers to a computer-produced environment that is partially or entirely generated using computer-produced content. While a person can interact with the XR setting using various electronic systems, this interaction utilizes various electronic sensors to monitor the person’s actions, and translates those actions into corresponding actions in the XR setting. For example, if an XR system detects that a person is looking upward, the XR system may change its graphics and audio output to present XR content in a manner consistent with the upward movement. XR settings may incorporate laws of physics to mimic physical settings.
[0017] Concepts of XR include virtual reality (VR) and augmented reality (AR).
Concepts of XR also include mixed reality (MR), which is sometimes used to refer to the spectrum of realities between physical settings (but not including physical settings) at one end and VR at the other end. Concepts of XR also include augmented virtuality (AV), in which a virtual or computer-produced setting integrates sensory inputs from a physical setting. These inputs may represent characteristics of a physical setting. For example, a virtual object may be displayed in a color captured, using an image sensor, from the physical setting. As another example, an AV setting may adopt current weather conditions of the physical setting.
[0018] Some electronic systems for implementing XR operate with an opaque display and one or more imaging sensors for capturing video and/or images of a physical setting. In some implementations, when a system captures images of a physical setting, and displays a representation of the physical setting on an opaque display using the captured images, the displayed images are called a video pass-through. Some electronic systems for implementing XR operate with an optical see-through display that may be transparent or semi-transparent (and optionally with one or more imaging sensors). Such a display allows a person to view a physical setting directly through the display, and allows for virtual content to be added to the person’s field-of-view by superimposing the content over an optical pass-through of the physical setting. Some electronic systems for implementing XR operate with a projection system that projects virtual objects onto a physical setting. The projector may present a holograph onto a physical setting, or may project imagery onto a physical surface, or may project onto the eyes (e.g., retina) of a person, for example.
[0019] Electronic systems providing XR settings can have various form factors. A smartphone or a tablet computer may incorporate imaging and display components to present an XR setting. A head-mountable system may include imaging and display components to present an XR setting. These systems may provide computing resources for generating XR settings, and may work in conjunction with one another to generate and/or present XR settings. For example, a smartphone or a tablet can connect with a head-mounted display to present XR settings. As another example, a computer may connect with home entertainment components or vehicular systems to provide an on-window display or a heads-up display. Electronic systems displaying XR settings may utilize display technologies such as LEDs, OLEDs, QD-LEDs, liquid crystal on silicon, a laser scanning light source, a digital light projector, or combinations thereof. Display technologies can employ substrates, through which light is transmitted, including light waveguides, holographic substrates, optical reflectors and combiners, or combinations thereof. In some embodiments, an electronic device comprises one or more processors working with non-transitory memory. In some embodiments, the non-transitory memory stores one or more programs of executable instructions that are executed by the one or more processors. In some embodiments, the executable instructions carry out the techniques and processes described herein. In some embodiments, a computer (readable) storage medium has instructions that, when executed by one or more processors of an electronic device, cause the electronic device to perform, or cause performance, of any of the techniques and processes described herein. The computer (readable) storage medium is non-transitory. In some embodiments, a device includes one or more processors, a non-transitory memory, and means for performing or causing performance of the techniques and processes described herein.
[0020]
[0021] Numerous details are described in order to provide a thorough understanding of the example implementations shown in the drawings. However, the drawings merely show some example aspects of the present disclosure and are therefore not to be considered limiting. Those of ordinary skill in the art will appreciate that other effective aspects and/or variants do not include all of the specific details described herein. Moreover, well-known systems, methods, components, devices, and circuits have not been described in exhaustive detail so as not to obscure more pertinent aspects of the example implementations described herein.
[0022] An electronic device that is presenting an immersive setting, such as an XR setting, to a user may capture images using image sensors. Exemplary uses of these images include recreating representations of physical objects in an XR setting, and translating a user’s physical movements into an avatar’s movement in the XR setting. However, it is possible that some of these images could unintentionally and incidentally include information that a user would rather not input into an XR setting. These could range from unnecessary imagery within a field of view to more sensitive information such as a user’s physical attributes.
[0023] There may be advantages to systemically blocking certain kinds of content, such as exposure of a user’s physique or physical attributes that may be exploited towards biometric identification, from dissemination. Despite advancements in authentication techniques, such as sophisticated fingerprint scanners that reject photographic replicas of fingers, over-exposure may still bring about concerns. In some embodiments, images captured by a device are modified such that portions suggestive of biometric identification are obscured. In some embodiments, discrete modifications are employed so as to maintain the fidelity of the user experience, such that the XR representation of a user remains consistent with the physical user.
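One way to realize the obscuring described in paragraph [0023] is to blur, and lightly perturb, only the detected region of the frame, leaving the rest untouched. The sketch below uses OpenCV and NumPy; the bounding box, kernel size, and noise level are illustrative assumptions, not values taken from the disclosure.

```python
import cv2
import numpy as np

rng = np.random.default_rng()

def obscure_region(image, box):
    """Return a copy of `image` in which the region `box` = (x, y, w, h) is blurred
    and lightly noised, removing fine biometric detail while keeping the overall
    appearance of the region."""
    x, y, w, h = box
    modified = image.copy()
    region = modified[y:y + h, x:x + w]
    blurred = cv2.GaussianBlur(region, (9, 9), 0)        # smooth away fine texture
    noise = rng.normal(0, 4, size=blurred.shape)          # mild pixel-level noise
    noisy = np.clip(blurred.astype(np.float64) + noise, 0, 255).astype(image.dtype)
    modified[y:y + h, x:x + w] = noisy
    return modified

# Example: obscure an assumed 80x80 fingertip region of a captured frame and store
# only the modified image, in the spirit of block 540 described above.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
processed = obscure_region(frame, (300, 200, 80, 80))
cv2.imwrite("modified_frame.png", processed)
```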
[0024] Figure 1 is a block diagram of an example operating architecture 100 in accordance with some implementations. While pertinent features are shown, those of ordinary skill in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example implementations disclosed herein. To that end, as a non-limiting example, the operating architecture 100 includes an electronic device 120.
[0025] In some implementations, the electronic device 120 is configured to present XR content to a user. In some implementations, the electronic device 120 includes a suitable combination of software, firmware, and/or hardware. According to some implementations, the electronic device 120 presents, via a display 122, XR content to the user while the user is physically present within a physical setting 105 that includes a table 107 within the field-of- view 111 of the electronic device 120. As such, in some implementations, the user holds the electronic device 120 in his/her hand(s). In some implementations, the electronic device 120 is configured to display a virtual object (e.g., a virtual box 109) and to enable video pass-through of the physical setting 105 (e.g., including a representation 117 of the table 107) on a display 122.
[0026] In some implementations, the controller 110 is configured to manage and coordinate presentation of XR content for the user. In some implementations, the controller 110 includes a suitable combination of software, firmware, and/or hardware. The controller 110 is described in greater detail below with respect to Figure 2. In some implementations, the controller 110 is a computing device that is local or remote relative to the physical setting 105. For example, the controller 110 is a local server located within the physical setting 105. In another example, the controller 110 is a remote server located outside of the physical setting 105 (e.g., a cloud server, central server, etc.). In some implementations, the controller 110 is communicatively coupled with the electronic device 120 via one or more wired or wireless communication channels 144 (e.g., BLUETOOTH, IEEE 802.11x, IEEE 802.16x, IEEE 802.3x, etc.). In another example, the controller 110 is included within the enclosure of the electronic device 120.
[0027] In some implementations, the electronic device 120 is configured to present the XR content to the user. In some implementations, the electronic device 120 includes a suitable combination of software, firmware, and/or hardware. The electronic device 120 is described in greater detail below with respect to Figure 3. In some implementations, the functionalities of the controller 110 are provided by and/or combined with the electronic device 120.
[0028] According to some implementations, the electronic device 120 presents XR content to the user while the user is virtually and/or physically present within the physical setting 105. In some implementations, the user wears the electronic device 120 on his/her head. For example, in some implementations, the electronic device includes a head-mounted system (HMS), head-mounted device (HMD), or head-mounted enclosure (HME). As such, the electronic device 120 includes one or more XR displays provided to display the XR content. For example, in various implementations, the electronic device 120 encloses the field-of-view of the user. In some implementations, the electronic device 120 is a handheld device (such as a smartphone or tablet) configured to present XR content, and rather than wearing the electronic device 120, the user holds the device with a display directed towards the field-of-view of the user and a camera directed towards the physical setting 105. In some implementations, the handheld device can be placed within an enclosure that can be worn on the head of the user. In some implementations, the electronic device 120 is replaced with an XR chamber, enclosure, or room configured to present XR content in which the user does not wear or hold the electronic device 120. In various implementations, the physical setting 105 includes a person other than the user and the camera captures one or more images of the physical setting 105 including the person. The one or more images may include one or more physiques of the person such as an iris pattern, a fingerprint, or a facial shape.
[0029] Figure 2 is a block diagram of an example of the controller 110 in accordance with some implementations. While certain specific features are illustrated, those skilled in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity, and so as not to obscure more pertinent aspects of the implementations disclosed herein. To that end, as a non-limiting example, in some implementations the controller 110 includes one or more processing units 202 (e.g., microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), graphics processing units (GPUs), central processing units (CPUs), processing cores, and/or the like), one or more input/output (I/O) devices 206, one or more communication interfaces 208 (e.g., universal serial bus (USB), FIREWIRE, THUNDERBOLT, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, global system for mobile communications (GSM), code division multiple access (CDMA), time division multiple access (TDMA), global positioning system (GPS), infrared (IR), BLUETOOTH, ZIGBEE, and/or the like type interface), one or more programming (e.g., I/O) interfaces 210, a memory 220, and one or more communication buses 204 for interconnecting these and various other components.
[0030] In some implementations, the one or more communication buses 204 include circuitry that interconnects and controls communications between system components. In some implementations, the one or more I/O devices 206 include at least one of a keyboard, a mouse, a touchpad, a joystick, one or more microphones, one or more speakers, one or more image sensors, one or more displays, and/or the like.
[0031] The memory 220 includes high-speed random-access memory, such as dynamic random-access memory (DRAM), static random-access memory (SRAM), double-data-rate random-access memory (DDR RAM), or other random-access solid-state memory devices. In some implementations, the memory 220 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 220 optionally includes one or more storage devices remotely located from the one or more processing units 202. The memory 220 comprises a non-transitory computer readable storage medium. In some implementations, the memory 220 or the non-transitory computer readable storage medium of the memory 220 stores the following programs, modules and data structures, or a subset thereof including an optional operating system 230 and an XR content module 240.
[0032] The operating system 230 includes procedures for handling various basic system services and for performing hardware dependent tasks. In some implementations, the XR content module 240 is configured to manage and coordinate presentation of XR content for one or more users (e.g., a single set of XR content for one or more users, or multiple sets of XR content for respective groups of one or more users). To that end, in various implementations, the XR content module 240 includes a data obtaining unit 242, a tracking unit 244, a coordination unit 246, and a data transmitting unit 248.
[0033] In some implementations, the data obtaining unit 242 is configured to obtain data (e.g., presentation data, interaction data, sensor data, location data, etc.) from at least the electronic device 120 of Figure 1. To that end, in various implementations, the data obtaining unit 242 includes instructions and/or logic therefor, and heuristics and metadata therefor.
[0034] In some implementations, the tracking unit 244 is configured to map the physical setting 105 and to track the position/location of at least the electronic device 120 with respect to the physical setting 105 of Figure 1. To that end, in various implementations, the tracking unit 244 includes instructions and/or logic therefor, and heuristics and metadata therefor.
[0035] In some implementations, the coordination unit 246 is configured to manage and coordinate the presentation of XR content to the user by the electronic device 120. To that end, in various implementations, the coordination unit 246 includes instructions and/or logic therefor, and heuristics and metadata therefor.
[0036] In some implementations, the data transmitting unit 248 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the electronic device 120. To that end, in various implementations, the data transmitting unit 248 includes instructions and/or logic therefor, and heuristics and metadata therefor.
[0037] Although the data obtaining unit 242, the tracking unit 244, the coordination unit 246, and the data transmitting unit 248 are shown as residing on a single device (e.g., the controller 110), it should be understood that in other implementations, any combination of the data obtaining unit 242, the tracking unit 244, the coordination unit 246, and the data transmitting unit 248 may be located in separate computing devices.
[0038] Moreover, Figure 2 is intended more as functional description of the various features that may be present in a particular implementation as opposed to a structural schematic of the implementations described herein. As recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. For example, some functional modules shown separately in Figure 2 could be implemented in a single module and the various functions of single functional blocks could be implemented by one or more functional blocks in various implementations. The actual number of modules and the division of particular functions and how features are allocated among them will vary from one implementation to another and, in some implementations, depends in part on the particular combination of hardware, software, and/or firmware chosen for a particular implementation.
[0039] Figure 3 is a block diagram of an example of the electronic device 120 in accordance with some implementations. While certain specific features are illustrated, those skilled in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity, and so as not to obscure more pertinent aspects of the implementations disclosed herein. To that end, as a non-limiting example, in some implementations the electronic device 120 includes one or more processing units 302 (e.g., microprocessors, ASICs, FPGAs, GPUs, CPUs, processing cores, and/or the like), one or more input/output (I/O) devices and sensors 306, one or more communication interfaces 308 (e.g., USB, FIREWIRE, THUNDERBOLT, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, GSM, CDMA, TDMA, GPS, IR, BLUETOOTH, ZIGBEE, and/or the like type interface), one or more programming (e.g., I/O) interfaces 310, one or more XR displays 312, one or more optional interior- and/or exterior-facing image sensors 314, a memory 320, and one or more communication buses 304 for interconnecting these and various other components.
[0040] In some implementations, the one or more communication buses 304 include circuitry that interconnects and controls communications between system components. In some implementations, the one or more I/O devices and sensors 306 include at least one of an inertial measurement unit (IMU), an accelerometer, a gyroscope, a thermometer, one or more physiological sensors (e.g., blood pressure monitor, heart rate monitor, blood oxygen sensor, blood glucose sensor, etc.), one or more microphones 307A, one or more speakers 307B, a haptics engine, one or more depth sensors (e.g., a structured light, a time-of-flight, or the like), and/or the like.
[0041] In some implementations, the one or more XR displays 312 are configured to display XR content to the user. In some implementations, the one or more XR displays 312 correspond to holographic, digital light processing (DLP), liquid-crystal display (LCD), liquid-crystal on silicon (LCoS), organic light-emitting field-effect transistor (OLET), organic light-emitting diode (OLED), surface-conduction electron-emitter display (SED), field-emission display (FED), quantum-dot light-emitting diode (QD-LED), micro-electro-mechanical system (MEMS), and/or the like display types. In some implementations, the one or more XR displays 312 correspond to diffractive, reflective, polarized, holographic, etc. waveguide displays. For example, the electronic device 120 includes a single XR display. In another example, the electronic device 120 includes an XR display for each eye of the user. In some implementations, the one or more XR displays 312 are capable of presenting MR and VR content.
[0042] In some implementations, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the face of the user that includes the eyes of the user (and may be referred to as an eye-tracking camera). In some implementations, the one or more image sensors 314 are configured to be forward-facing so as to obtain image data that corresponds to the physical setting as would be viewed by the user if the electronic device 120 was not present (and may be referred to as a scene camera). The one or more optional image sensors 314 can include one or more RGB cameras (e.g., with a complementary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor), one or more infrared (IR) cameras, one or more event-based cameras, and/or the like.
[0043] The memory 320 includes high-speed random-access memory, such as DRAM,
SRAM, DDR RAM, or other random-access solid-state memory devices. In some implementations, the memory 320 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 320 optionally includes one or more storage devices remotely located from the one or more processing units 302. The memory 320 comprises a non-transitory computer readable storage medium. In some implementations, the memory 320 or the non-transitory computer readable storage medium of the memory 320 stores the following programs, modules and data structures, or a subset thereof including an optional operating system 330 and an XR presentation module 340.
[0044] The operating system 330 includes procedures for handling various basic system services and for performing hardware dependent tasks. In some implementations, the XR presentation module 340 is configured to present XR content to the user via the one or more XR displays 312 and/or the I/O devices and sensors 306 (such as the one or more speakers 307B). To that end, in various implementations, the XR presentation module 340 includes a data obtaining unit 342, an XR content presenting unit 344, and a data transmitting unit 346.
[0045] In some implementations, the data obtaining unit 342 is configured to obtain data (e.g., presentation data, interaction data, sensor data, location data, etc.) from at least the controller 110 of Figure 1. To that end, in various implementations, the data obtaining unit 342 includes instructions and/or logic therefor, and heuristics and metadata therefor.
[0046] In some implementations, the XR content presenting unit 344 is configured to present XR content to a user. In various implementations, the XR content presenting unit 344 presents XR content including a person where one or more physiques of the person have been obscured. To that end, in various implementations, the XR content presenting unit 344 includes instructions and/or logic therefor, and heuristics and metadata therefor.
[0047] In some implementations, the data transmitting unit 346 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the controller 110. To that end, in various implementations, the data transmitting unit 346 includes instructions and/or logic therefor, and heuristics and metadata therefor.
[0048] Although the data obtaining unit 342, the XR content presenting unit 344, and the data transmitting unit 346 are shown as residing on a single device (e.g., the electronic device 120 of Figure 1), it should be understood that in other implementations, any combination of the data obtaining unit 342, the XR content presenting unit 344, and the data transmitting unit 346 may be located in separate computing devices.
[0049] Moreover, Figure 3 is intended more as a functional description of the various features that could be present in a particular implementation as opposed to a structural schematic of the implementations described herein. Items shown separately could be combined and some items could be separated. For example, some functional modules shown separately in Figure 3 could be implemented in a single module and the various functions of single functional blocks could be implemented by one or more functional blocks in various implementations. The actual number of modules and the division of particular functions and how features are allocated among them will vary from one implementation to another and, in some implementations, depends in part on the particular combination of hardware, software, and/or firmware chosen for a particular implementation.
[0050] Figure 4A illustrates a first XR setting 400 based on a physical setting surveyed by an image sensor of a device. In various implementations, the image sensor is part of a device that is used by the user and includes a display that displays the first XR setting 400. Thus, in various implementations, the user is physically present in the physical setting. In various implementations, the image sensor is part of a remote device (such as a drone or robotic avatar) that transmits images from the image sensor to a local device that is worn by the user and includes a display that displays the first XR setting 400.
[0051] In various implementations, the image sensor of the device may incidentally capture images showing a person's eyes, hand, or face. In various implementations, the device processes the portion of the image including the person's eyes, hand, or face. For example, in various implementations, the device removes information, such as by blurring the portion of the image, adding noise to the portion of the image, or replacing the portion of the image. In various implementations, the amount of information removed is sufficient to defeat replication for authentication purposes, but minor enough to be overlooked by the user so as to not distract from the user experience.
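By way of a non-limiting illustration, the following sketch shows one way such region-level processing might be implemented. It is a minimal example assuming 8-bit images handled with NumPy and OpenCV; the helper name obscure_region and the specific filter parameters are assumptions chosen for exposition, not part of the disclosure.

import cv2
import numpy as np

def obscure_region(image: np.ndarray, region: tuple, mode: str = "blur") -> np.ndarray:
    """Return a copy of `image` with the (x, y, w, h) region blurred, noised, or replaced."""
    x, y, w, h = region
    out = image.copy()
    patch = out[y:y + h, x:x + w]
    if mode == "blur":
        # Gaussian blur removes the fine detail needed for biometric matching.
        out[y:y + h, x:x + w] = cv2.GaussianBlur(patch, (31, 31), 0)
    elif mode == "noise":
        # Additive noise perturbs the patch while largely preserving its appearance.
        noise = np.random.normal(0, 8, patch.shape)
        out[y:y + h, x:x + w] = np.clip(patch.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    elif mode == "replace":
        # Replacement swaps in unrelated content, here a uniform fill of the patch's mean color.
        out[y:y + h, x:x + w] = np.mean(patch, axis=(0, 1)).astype(np.uint8)
    return out

In each branch, the strength of the operation (kernel size, noise level) can be tuned so that replication for authentication is defeated while the change remains visually minor, consistent with the paragraph above.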
[0052] The first XR setting 400 includes a plurality of objects, including one or more physical elements (e.g., a table 412, a lamp 414, and a person 460) and one or more virtual objects (e.g., a virtual box 422). In various implementations, each object is displayed at a location in the first XR setting 400, e.g., at a location defined by three coordinates in a three- dimensional (3D) XR coordinate system. Accordingly, when the user moves in the first XR setting 400 (e.g., changes either position and/or orientation), the objects are moved on the display of the device, but retain their location in the first XR setting 400. In various implementations, certain virtual objects are displayed at locations on the display such that when the user moves in the first XR setting 400, the objects are stationary on the display on the device.
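As a purely illustrative sketch of the distinction drawn above, a world-locked object's on-display position can be recomputed from the device pose each frame, whereas a display-locked object ignores the pose. The simple pinhole-projection helpers below are assumptions for exposition, not the rendering pipeline of any particular implementation.

import numpy as np

def project_world_locked(world_point: np.ndarray, view: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Project a 3D point in XR world coordinates to normalized screen coordinates."""
    clip = proj @ view @ np.append(world_point, 1.0)  # world -> camera -> clip space
    return clip[:2] / clip[3]                         # perspective divide

def project_display_locked(screen_point: np.ndarray) -> np.ndarray:
    """A display-locked object keeps the same screen position regardless of device pose."""
    return screen_point

As the user moves, the view matrix changes, so world-locked objects shift on the display while retaining their location in the first XR setting 400; display-locked objects remain stationary on the display.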
[0053] The person 460 has eyes 461 having an iris pattern, a face 462 having a set of facial dimensions, and hands 463 having fingerprints. In various implementations, the iris pattern is characterized by a plurality of biometric measurements. In various implementations, the facial dimensions constitute a plurality of biometric measurements. In various implementations, the facial shapes constitute a plurality of biometric measurements. In various implementations, the fingerprints are characterized by a plurality of biometric measurements.
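One possible, and purely assumed, way to represent such a plurality of biometric measurements in software is as a typed feature vector per identifier, which later steps can compare or perturb; the disclosure does not mandate any particular encoding.

from dataclasses import dataclass
import numpy as np

@dataclass
class BiometricMeasurements:
    kind: str            # e.g., "iris", "face", or "fingerprint"
    values: np.ndarray   # measurement vector, e.g., facial dimensions

    def distance(self, other: "BiometricMeasurements") -> float:
        # Euclidean distance between measurement vectors of the same kind.
        return float(np.linalg.norm(self.values - other.values))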
[0054] Figure 4B illustrates a second XR setting 450 based on the physical setting of
Figure 4A with biometric identifiers of the person 460 obscured. The second XR setting 450 includes a plurality of objects, including one or more physical elements (e.g., the table 412, the lamp 414, and the person 460) and one or more virtual objects (e.g., a virtual box 422, substitute eyes 471 displayed over the eyes 461 of the person 460, a substitute face 472 displayed over the face 462 of the person 460, and substitute fingerprints 473 displayed over the fingerprints 463 of the person 460).
[0055] The substitute eyes 471 include a substitute iris pattern that is different than the iris pattern of the eyes 461 of the person 460. In particular, the substitute iris pattern is characterized by a plurality of biometric measurements that are different than the plurality of biometric measurements that characterize the iris pattern of the person 460. In various implementations, the substitute eyes 471 are selected from a set of predetermined substitute eyes, each having a particular iris pattern. The substitute eyes 471 are selected as those from the set of predetermined substitute eyes that most closely match the eyes 461 of the person 460, e.g., based on shape, size, or color. In various implementations, the substitute eyes 471 are a blurred version of the eyes 461 of the person.
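A minimal sketch of such a closest-match selection follows; the coarse features used here (patch size and mean color) and the helper names are assumptions chosen only to illustrate picking the nearest predetermined substitute.

import numpy as np

def appearance_features(patch: np.ndarray) -> np.ndarray:
    # Coarse appearance descriptor: patch height, width, and mean color (assumes a color patch).
    h, w = patch.shape[:2]
    mean_color = patch.reshape(-1, patch.shape[-1]).mean(axis=0)
    return np.concatenate(([h, w], mean_color))

def select_substitute(detected_patch: np.ndarray, candidates: list) -> np.ndarray:
    """Return the predetermined substitute patch closest in size and color to the detected eyes."""
    target = appearance_features(detected_patch)
    distances = [np.linalg.norm(appearance_features(c) - target) for c in candidates]
    return candidates[int(np.argmin(distances))]

Because every candidate carries its own predetermined iris pattern, the selected substitute necessarily has biometric measurements different from those of the person 460, while still resembling the original eyes in coarse appearance.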
[0056] The substitute face 472 is characterized by a plurality of substitute facial dimensions that are different than the facial dimensions of the face 462 of the person 460. The plurality of substitute facial dimensions constitute a plurality of biometric measurements that are different than the plurality of biometric measurements that characterize the face 462 of the person 460. In various implementations, the substitute facial dimensions are determined based on adding random noise to the facial dimensions of the face 462 of the person 460. In various implementations, the random noise is of sufficient strength such that the substitute facial dimensions would be rejected by an authentication system, but not of sufficient strength such that the substitute face 472 is noticeably different to the user than the face 462 of the person 460.
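The following is a hedged sketch of that idea: noise is scaled to fall between an assumed authentication tolerance and an assumed visual-perception limit. Both thresholds are hypothetical parameters introduced only for illustration and would in practice be determined empirically for the particular authentication system and display.

from typing import Optional
import numpy as np

def substitute_dimensions(dimensions: np.ndarray,
                          auth_tolerance: float = 0.5,    # matcher rejects deviations beyond this (assumed)
                          perception_limit: float = 2.0,  # user would notice deviations beyond this (assumed)
                          rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Perturb facial dimensions enough to fail authentication but not enough to be noticeable."""
    rng = rng or np.random.default_rng()
    direction = rng.normal(size=dimensions.shape)
    direction /= np.linalg.norm(direction)
    # Choose a perturbation magnitude strictly inside the (tolerance, limit) band.
    magnitude = rng.uniform(auth_tolerance * 1.1, perception_limit * 0.9)
    return dimensions + magnitude * direction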
[0057] The substitute fingerprints 473 are characterized by a plurality of biometric measurements that are different than the plurality of biometric measurements of the fingerprints 463 of the person 460. In various implementations, the substitute fingerprints 473 are a set of default fingerprints. In some embodiments, the second XR setting 450 includes one or more graphical user interface affordances indicating that a substitution has occurred. Optionally, the affordance may indicate the physical attribute that is being substituted (e.g., eyes, face, fingerprint).
[0058] Figure 5 is a flowchart representation of a method 500 of reducing physique information in an image in accordance with some implementations. In various implementations, the method 500 is performed by a device with one or more processors, non-transitory memory, and a scene camera (e.g., the electronic device 120 of Figure 3). In some implementations, the method 500 is performed by processing logic, including hardware, firmware, software, or a combination thereof. In some implementations, the method 500 is performed by a processor executing instructions (e.g., code) stored in a non-transitory computer-readable medium (e.g., a memory).
[0059] The method 500 begins, in block 510, with the device capturing, using the image sensor, a captured image of a person. Although the method 500 is described herein as being performed on a single person in the captured image, it is to be appreciated that the method 500 may be performed on multiple people in the captured image. Further, although the method 500 is described herein as being performed on a single captured image of a person, it is to be appreciated that the method 500 may be performed on multiple captured images of the person, e.g., for each frame of captured video data.
[0060] The method 500 continues, in block 520, with the device detecting, in the captured image, a biometric identifier (or physical attribute) of the person, wherein the biometric identifier is associated with a first plurality of biometric measurements. In various implementations, the biometric identifier is at least one of a fingerprint, an iris pattern, or a face. In various implementations, the biometric identifier of the person is detected using any object detection algorithm, such as semantic segmentation. In various implementations, detecting the biometric identifier includes determining the first plurality of biometric measurements.
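As a non-authoritative sketch of block 520, an off-the-shelf face detector can serve as the object detection algorithm; the disclosure does not require this particular detector, and a semantic-segmentation model could equally be used. The OpenCV Haar-cascade detector below is shown only as one readily available option.

import cv2

_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face_regions(image_bgr):
    """Return (x, y, w, h) boxes for faces found in the captured image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    boxes = _face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(map(int, box)) for box in boxes]

Determining the first plurality of biometric measurements from the detected region (for example, measuring facial dimensions within each box) would be a further step performed by a separate model, not shown here.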
[0061] In various implementations, the biometric identifier is a haptic of the user, defined by how the user (or various portions thereof) moves over time. For example, in various implementations, the first plurality of biometric measurements includes a relative position or velocity of a body part of the user. Accordingly, in various implementations, the biometric identifier is based on the captured image and a previously captured image.
[0062] Although the method 500 is described herein as being performed on a single detected biometric identifier of a person, it is to be appreciated that the method 500 may be performed on multiple biometric identifiers of a person in the captured image.
[0063] The method 500 continues, in block 530, with the device generating, from the captured image, a modified image by replacing the biometric identifier with a substitute biometric identifier (or replacing the physical attribute with a substitute attribute) associated with a second plurality of biometric measurements different than the first plurality of biometric measurements.
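Tying blocks 510-540 together, a compact and purely illustrative pipeline might look as follows; it reuses the hypothetical helpers sketched earlier (detect_face_regions, obscure_region) and assumes the blur strategy, none of which is mandated by the disclosure.

import cv2

def process_captured_image(image_bgr, output_path: str) -> None:
    modified = image_bgr.copy()
    for region in detect_face_regions(modified):              # block 520: detect the physical attribute
        modified = obscure_region(modified, region, "blur")   # block 530: replace it with a substitute
    cv2.imwrite(output_path, modified)                        # block 540: store the modified image only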
[0064] In various implementations, the first plurality of biometric measurements is sufficient to authenticate the person using an authentication system or authentication device; however, the second plurality of biometric measurements is insufficient to authenticate the person using the authentication system or authentication device.
[0065] Nevertheless, in various implementations, the modified image is imperceptibly different to a user of the device as compared to the captured image. In various implementations, this is true because the substitute biometric identifier imperceptibly differs from the biometric identifier. In various implementations, this is true because the substitute biometric identifier is photorealistic.
[0066] In various implementations, the substitute biometric identifier is a default substitute biometric identifier. Thus, in various implementations, the substitute biometric identifier is not based on the biometric identifier. In various implementations, the substitute biometric identifier is selected from a plurality of predetermined substitute biometric identifiers, e.g., based on a similarity to the biometric identifier. In various implementations, the substitute biometric identifier is a blurred (or otherwise filtered) version of the biometric identifier. Thus, in various implementations, the substitute biometric identifier is based on the biometric identifier. In various implementations, the substitute biometric identifier is generated by adding noise to the first plurality of biometric measurements to generate the second plurality of biometric measurements. Thus, in various implementations, the substitute biometric identifier is based on the first plurality of biometric measurements.
[0067] In various implementations, the substitute biometric identifier includes a trace that indicates that the modified image includes the substitute biometric identifier (e.g., as a replacement for the biometric identifier). Thus, in various implementations, the substitute biometric identifier includes a trace that indicates that the captured image has been modified to generate the modified image.
[0068] In various implementations, an authentication system or authentication device can detect the trace and deny authentication when presented with the substitute biometric identifier. For example, when the substitute biometric identifier is a default substitute biometric identifier or selected from a plurality of predetermined substitute biometric identifiers, each of those substitute biometric identifiers may be known to the authentication system or authentication device. As another example, the noise added to the first plurality of biometric measurements may spell out such a trace (e.g., the noise includes setting the least significant bit of each of the first plurality of biometric measurements to zero).
[0069] The method 500 continues, in block 540, with the device storing, in the non-transitory memory, the modified image. In various implementations, the device stores the modified image without storing the captured image. In various implementations, a device that is displaying the modified image also displays an indication of the portion of the image that has been substituted; for example, a graphical user interface affordance can indicate that the fingerprints have been modified. Accordingly, in various implementations, the modified image includes a substitution indicator indicating that the physical attribute has been replaced.
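Returning to the least-significant-bit trace example of paragraph [0068], a minimal sketch is given below. It assumes the measurements have been quantized to integers; the function names and the tolerance used by the illustrative matcher are assumptions, and a real system would use a more robust trace than all-even values (which could occur by chance).

import numpy as np

def embed_trace(measurements: np.ndarray) -> np.ndarray:
    """Clear the least significant bit of each quantized measurement."""
    return measurements.astype(np.int64) & ~1

def has_trace(measurements: np.ndarray) -> bool:
    """Treat all-zero least significant bits as evidence of substitution."""
    return bool(np.all((measurements.astype(np.int64) & 1) == 0))

def authenticate(measurements: np.ndarray, enrolled: np.ndarray, tolerance: float = 3.0) -> bool:
    # Deny authentication outright when a substitution trace is present.
    if has_trace(measurements):
        return False
    return float(np.linalg.norm(measurements - enrolled)) <= tolerance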
[0070] In various implementations, the method 500 includes a similar procedure to reduce biometric identifiers within a person’s voice. Thus, in various implementations, the method 500 includes receiving audio data and detecting, in the audio data, a voice of the person, wherein the voice is associated with a third plurality of biometric measurements. The method includes generating, from the audio data, modified audio data by replacing the voice with a substitute voice associated with a fourth plurality of biometric measurements different than the third plurality of biometric measurements and storing, in the non-transitory memory, the modified audio data.
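By way of a rough, non-authoritative illustration only, even a simple resampling of the waveform alters the pitch characteristics that contribute to speaker-identifying measurements; an actual implementation would more plausibly use a dedicated voice-conversion model. The shift factor below is an arbitrary assumed value.

import numpy as np

def shift_voice(samples: np.ndarray, factor: float = 1.12) -> np.ndarray:
    """Resample the waveform by `factor`, raising the pitch (and shortening the duration)."""
    new_positions = np.arange(0, len(samples) - 1, factor)
    return np.interp(new_positions, np.arange(len(samples)), samples)

The resulting audio remains intelligible to the user while its derived biometric measurements (the fourth plurality) differ from those of the original voice (the third plurality).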
[0071] The techniques discussed above aim to reduce the unintended and incidental production of information in extended reality settings. This information may also include other features, such as facial features, that may be perceived as biometric, or otherwise more sensitive, in nature. The present disclosure recognizes that the use of camera imagery in XR reflects a balance of competing desires and concerns. For example, users may wish to share their portraits, as is witnessed by the proliferation of social media platforms. Some users, however, may wish to share less. Expectations of how much information a device should programmatically reduce may thus vary with user sensitivities and social norms.
[0072]
[0073] The present disclosure also contemplates embodiments that help users understand how extensions of their XR selves can be perceived by others. For example, the present technology can inform and allow users to determine how much their XR avatars should reflect their real life motion and/or appearance in the physical world. Options to “opt in” or “opt out” of the use of, for example, camera imagery to affect the display of an avatar can be used to reduce information shared. The present technology can also inform and allow recipients to understand when they are viewing information that has been modified, for example, when a fingerprint is being obscured. It is an aim of the present disclosure to, in a transparent manner, reduce the likelihood that certain unintentional and incidental information is shared over XR.
[0074] To the extent that entities collect, analyze, disclose, transfer, store, or otherwise use any personal information data, such entities should comply with well-established privacy policies and/or privacy practices. In particular, such entities would be expected to implement and consistently apply privacy practices that are generally recognized as meeting or exceeding industry or governmental requirements for maintaining the privacy of users.
[0075] The above-described embodiments are illustrative, and it is noted that the embodiments may be embodied in various forms. One skilled in the art should appreciate that an aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways.

Claims

What is claimed is:
1. A method comprising: at a device including one or more processors, non-transitory memory, and an image sensor: capturing, using the image sensor, a captured image of a person; detecting, in the captured image, a physical attribute of the person, wherein the physical attribute is associated with a first plurality of biometric measurements; generating, from the captured image, a modified image by replacing the physical attribute with a substitute attribute associated with a second plurality of biometric measurements different than the first plurality of biometric measurements; and storing, in the non-transitory memory, the modified image.
2. The method of claim 1, wherein the method excludes storing, in the non-transitory memory, the captured image.
3. The method of claims 1 or 2, wherein the physical attribute is a biometric identifier comprising at least one of a fingerprint, an iris pattern, or a face.
4. The method of any of claims 1-3, wherein the first plurality of biometric measurements is sufficient to authenticate the person and the second plurality of biometric measurements is insufficient to authenticate the person.
5. The method of any of claims 1-4, wherein the modified image is imperceptibly different to a user of the device as compared to the captured image.
6. The method of any of claims 1-5, wherein the substitute attribute is a default substitute attribute.
7. The method of any of claims 1-5, wherein the substitute attribute is selected from a plurality of predetermined substitute attributes.
8. The method of any of claims 1-5, wherein the substitute attribute is generated by adding noise to the first plurality of biometric measurements to generate the second plurality of biometric measurements.
9. The method of any of claims 1-5, wherein the substitute attribute is generated by filtering the physical attribute.
10. The method of any of claims 1-9, wherein the substitute attribute includes a trace indicating that the modified image includes the substitute attribute.
11. The method of any of claims 1-10, further comprising: receiving audio data; detecting, in the audio data, a voice of the person, wherein the voice is associated with a third plurality of biometric measurements; generating, from the audio data, modified audio data by replacing the voice with a substitute voice associated with a fourth plurality of biometric measurements different than the third plurality of biometric measurements; and storing, in the non-transitory memory, the modified audio data.
12. The method of any of claims 1-11, wherein the modified image includes a substitution indicator indicating that the physical attribute has been replaced.
13. A device comprising: one or more processors; a non-transitory memory; an image sensor; and one or more programs stored in the non-transitory memory, which, when executed by the one or more processors, cause the device to perform any of the methods of claims 1-12.
14. A non-transitory memory storing one or more programs, which, when executed by one or more processors of a device with an image sensor, cause the device to perform any of the methods of claims 1-12.
15. A device comprising: one or more processors; a non-transitory memory; an image sensor; and means for causing the device to perform any of the methods of claims 1-12.
16. A device comprising: an image sensor; a non-transitory memory; and one or more processors to: capture, using the image sensor, a captured image of a person; detect, in the captured image, a physical attribute of the person, wherein the physical attribute is associated with a first plurality of biometric measurements; generate, from the captured image, a modified image by replacing the physical attribute with a substitute attribute associated with a second plurality of biometric measurements different than the first plurality of biometric measurements; and store, in the non-transitory memory, the modified image.
PCT/US2020/051746 2019-09-24 2020-09-21 Method and device for processing camera images WO2021061551A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962904971P 2019-09-24 2019-09-24
US62/904,971 2019-09-24

Publications (1)

Publication Number Publication Date
WO2021061551A1 true WO2021061551A1 (en) 2021-04-01

Family

ID=72744911

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/051746 WO2021061551A1 (en) 2019-09-24 2020-09-21 Method and device for processing camera images

Country Status (1)

Country Link
WO (1) WO2021061551A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024084879A1 (en) * 2022-10-18 2024-04-25 Sony Semiconductor Solutions Corporation Image processing device, image processing method, and recording medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6067399A (en) * 1998-09-02 2000-05-23 Sony Corporation Privacy mode for acquisition cameras and camcorders
US20090262987A1 (en) * 2008-03-31 2009-10-22 Google Inc. Automatic face detection and identity masking in images, and applications thereof
WO2019143959A1 (en) * 2018-01-22 2019-07-25 Dakiana Research Llc Method and device for presenting synthesized reality companion content
CN110188603A (en) * 2019-04-17 2019-08-30 特斯联(北京)科技有限公司 A kind of privacy divulgence prevention method and its system for intelligence community



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20786183

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20786183

Country of ref document: EP

Kind code of ref document: A1