CN117616361A - Camera control using system sensor data - Google Patents

Camera control using system sensor data

Info

Publication number
CN117616361A
CN117616361A (application CN202280048225.7A)
Authority
CN
China
Prior art keywords: user, camera, image acquisition, image, sensor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280048225.7A
Other languages
Chinese (zh)
Inventor
萨普纳·史洛夫
大卫·陶
碧林森
胡均
德米特里奥斯·罗勒·卡拉尼科斯
塞巴斯蒂安·斯图克
张兆年
邝江涛
刘丹尼
李一雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meta Platforms Technologies LLC
Original Assignee
Meta Platforms Technologies LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/856,760 external-priority patent/US20230012426A1/en
Application filed by Meta Platforms Technologies LLC filed Critical Meta Platforms Technologies LLC
Priority claimed from PCT/US2022/036343 external-priority patent/WO2023283323A1/en
Publication of CN117616361A publication Critical patent/CN117616361A/en
Pending legal-status Critical Current

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

A method of using a camera in an augmented reality headset is provided. The method includes receiving a signal from a sensor mounted on a head-mounted device worn by a user, the signal indicating a user intent to capture an image. The method further includes: identifying the user intent for capturing the image based on a model that classifies signals from the sensor according to user intent; selecting a first image acquisition device of the head-mounted device based on technical parameters of the first image acquisition device and the user intent; and acquiring the image using the first image acquisition device. An augmented reality headset, a memory storing instructions, and a processor that executes the instructions to cause the augmented reality headset to perform the method described above are also provided.

Description

Camera control using system sensor data
Technical Field
The present disclosure relates to user interfaces in smart eyewear devices that include one or more cameras for recording images and video. More particularly, the present disclosure relates to methods for automatically selecting, from the one or more cameras of a smart eyewear device and based on user input and gestures, the camera whose settings are best suited to capture an image.
Background
Today's wearable platforms include cameras, sensors, and actuators configured to perform a variety of specific functions. In some instances, more than one of these accessory devices may be running at the same time, even though only one of them provides the function best suited to the task at hand. However, these devices often lack an automatic mechanism for activation or deactivation, which results in cumbersome user interaction when selecting a device for a given task, or wastes scarce power resources on active devices that are not being used.
Disclosure of Invention
In a first embodiment, an augmented reality headset device includes: a first camera and a second camera mounted on a frame and having a first field of view and a second field of view, respectively; a sensor mounted on the frame; a memory configured to store a plurality of instructions; and one or more processors configured to execute the instructions to cause the augmented reality headset to perform a method. The method includes: receiving a signal from the sensor, the signal indicating a user intent to acquire an image; identifying the user intent for capturing the image based on a model that classifies signals from the sensor according to user intent; selecting one of the first camera and the second camera based on the first field of view, the second field of view, and the user intent; and capturing the image using the selected camera.
In some embodiments, the sensor may be an inertial motion unit, and receiving the signal from the sensor includes identifying an orientation of the augmented reality headset relative to a fixed coordinate system.
In some embodiments, to select one of the first camera or the second camera, the one or more processors may execute instructions to select the first camera when the field of view of the first camera includes a point of interest of the user within the field of view of the augmented reality headset.
In some embodiments, the one or more processors may execute instructions to deactivate the first camera when the second camera is selected based on a second user intent.
In some embodiments, the sensor may provide a signal indicative of a gesture of the user directed to the object of interest.
In some embodiments, the sensor is an eye tracking device mounted on the frame, the sensor configured to provide a signal indicative of a pupil position of the user.
In some embodiments, the sensor may be an inertial motion sensor configured to provide a signal indicative of an orientation of the augmented reality headset, and the one or more processors may execute instructions to select the first camera when the first field of view is aligned with the orientation of the augmented reality headset.
In some embodiments, the sensor may be one of a first camera or a second camera configured to acquire a user gesture indicative of the object of interest.
In some embodiments, the sensor may be a microphone configured to collect and identify voice commands indicative of a user's intent.
In some embodiments, the sensor may be a touch sensitive sensor configured to receive touch commands from a user.
In a second embodiment, a computer-implemented method includes receiving a signal from a sensor mounted on a head-mounted device worn by a user, the signal indicating a user intent to acquire an image. The computer-implemented method further includes: identifying the user intent for capturing the image based on a model that classifies signals from the sensor according to user intent; selecting a first image acquisition device in the head-mounted device based on technical parameters of the first image acquisition device and the user intent; and acquiring the image using the first image acquisition device.
In some embodiments, receiving the signal from the sensor may include one of: receiving an inertial signal from an inertial motion sensor; receiving a sound capture of the user's voice; receiving a gesture; or receiving an active button press.
In some embodiments, selecting the first image acquisition device based on the technical parameters of the first image acquisition device may include selecting the first image acquisition device when its field of view includes a point of interest within the user's field of view in the head-mounted device.
In some embodiments, the computer-implemented method may further include deactivating the image acquisition device when the user intent is incompatible with the technical parameters of the first image acquisition device in the head-mounted device.
In some embodiments, the computer-implemented method may further include selecting, from one or more image acquisition devices in the head-mounted device, the image acquisition device whose technical parameters most closely match the user intent.
In some embodiments, the computer-implemented method may further include selecting a second image acquisition device and deactivating the first image acquisition device based on a second user intent.
In some embodiments, the technical parameter of the first image acquisition device may be a field of view, and selecting the first image acquisition device may include verifying that the field of view includes an object of interest identified in the user intent.
In some embodiments, receiving a signal from the head-mounted sensor may include identifying a gesture of the user indicative of an object of interest.
In some embodiments, receiving the signal from the sensor may include receiving a pupil position of the user from an eye-tracking device mounted on the head-mounted device.
In some embodiments, receiving the signal from the sensor may include identifying an orientation of the head-mounted device, and selecting the first image acquisition device may include selecting a camera whose field of view is directed along the orientation of the head-mounted device.
In a third embodiment, a non-transitory computer-readable medium stores instructions that, when executed by one or more processors, cause a computer to perform a method. The method includes receiving a signal from a sensor mounted on a head-mounted device worn by a user, the signal indicating a user intent to acquire an image. The method further includes: identifying the user intent for capturing the image based on a model that classifies signals from the sensor according to user intent; selecting a first image acquisition device of the head-mounted device based on technical parameters of the first image acquisition device and the user intent; and acquiring the image using the first image acquisition device.
It should be understood that any feature described herein as being suitable for incorporation into one or more aspects or embodiments of the present disclosure is intended to be generic in any and all aspects and embodiments of the present disclosure. Other aspects of the disclosure will be appreciated by those skilled in the art from the description, claims and drawings of the disclosure. The foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the claims.
Drawings
Fig. 1 illustrates an architecture including one or more wearable devices coupled to each other, to a mobile device, to a remote server, and to a database, according to some embodiments.
Fig. 2A illustrates a user with smart glasses in a first configuration according to some embodiments.
Fig. 2B illustrates a user with smart glasses in a second configuration according to some embodiments.
Fig. 3 is a flow chart illustrating steps in a method of controlling one or more cameras in a smart eyewear device using multiple sensor data, in accordance with some embodiments.
Fig. 4 is a block diagram illustrating an exemplary computer system with which a headset and methods of using the same may be implemented, in accordance with some embodiments.
In the drawings, elements having the same or similar reference numerals share the same or similar features unless explicitly stated otherwise.
Detailed Description
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be apparent, however, to one of ordinary skill in the art that the embodiments of the present disclosure may be practiced without some of these specific details. In other instances, well-known structures and techniques have not been shown in detail in order not to obscure the disclosure. The embodiments disclosed herein should be considered within the scope of the features and other embodiments shown in appendix I filed concurrently with the present application.
A multi-camera or multi-imager smart eyewear system may include multiple cameras or imagers that serve different functions. For example, one system may have multiple cameras pointing in different directions such that each camera covers a different portion of the view sphere. In some embodiments, all, most, or at least some of the cameras may be kept on all or substantially all of the time, or at least for a period of time, to ensure complete coverage of the field of view. However, this consumes power, generates unnecessary data and subsequent data management, and is often inefficient. It is therefore desirable to control the system effectively by triggering acquisition only from those cameras that will view the relevant activity.
Without an automatic mechanism, camera control for efficient, unsupervised or minimally supervised operation falls back on manual control: each camera must be directly controlled and triggered or switched on and off, moment by moment, based on the current view and the relevant activity.
We propose a system-level method of detecting relevant activity for efficient, unsupervised or minimally supervised camera control. We consider an embodiment of a multi-view system in which multiple cameras acquire different portions of the field of view. In some embodiments, the system is built into or attached to a pair of eyeglasses.
The system disclosed herein uses multiple sensors on the device, such as camera image streams, gestures, inertial measurement unit (IMU) data, audio cues, or active button presses, which may be used alone or in combination to estimate where relevant activity may occur in the user's field of view. Eye-tracking sensors may be used to determine the user's gaze. IMU data may be used to determine the head angle, indicating whether the user is looking forward or downward. A gesture captured in the camera stream may indicate a region of current interest. An audio cue may signal, for example, the name of a person who needs to be captured. The camera stream may be used to detect, identify, or track relevant persons in the field of view. The relevance of an activity may be established using location and context sensing. These data may be used alone or in combination to estimate the relevant area. The system may apply a predefined use-case model, or may use machine learning/deep learning to learn an appropriate model from various use cases to derive a good estimate. Once the relevant area is identified, the system may trigger the events required from the appropriate camera, such as exposure and other controls, and complete the acquisition.
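As a non-limiting illustration of how such cues might be combined, the following minimal Python sketch casts each available signal as a vote for the region of relevant activity. The field names, thresholds, voice keywords, and majority-vote rule are assumptions made for illustration only and are not part of the disclosure; a learned model as described above could replace the hand-written rules.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SensorCues:
    head_pitch_deg: float                                 # from the IMU: 0 = level, negative = looking down
    gaze_elevation_deg: float                             # from the eye tracker: negative = gaze below the horizon
    gesture_region: Optional[Tuple[float, float]] = None  # normalized (x, y) of a pointing gesture, if detected
    audio_keyword: Optional[str] = None                   # a recognized voice command, if any

def estimate_relevant_region(cues: SensorCues) -> str:
    """Combine the available cues into a coarse estimate of where the
    relevant activity is ('forward' or 'downward')."""
    votes = []
    # IMU: a strongly pitched-down head suggests near-field, hands-level activity.
    votes.append("downward" if cues.head_pitch_deg < -30.0 else "forward")
    # Eye tracking: gaze well below the horizon reinforces the downward vote.
    votes.append("downward" if cues.gaze_elevation_deg < -20.0 else "forward")
    # A pointing gesture in the lower half of the frame hints at the hands region.
    if cues.gesture_region is not None:
        votes.append("downward" if cues.gesture_region[1] > 0.5 else "forward")
    # An explicit voice command overrides the weaker cues.
    if cues.audio_keyword == "capture ahead":
        return "forward"
    if cues.audio_keyword == "capture my hands":
        return "downward"
    return max(set(votes), key=votes.count)
```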
For example, a user may want to capture a view of a distant landscape with a forward-facing camera that has a large field of view. When the user looks down at their hands, say while frosting a biscuit or doing other detailed manual work, the system may need to switch to a different, downward-facing camera with a different field of view and possibly different image quality (e.g., a narrower field of view with higher resolution).
Fig. 1 illustrates an architecture 10, according to some embodiments, that includes one or more wearable devices 100-1 (e.g., smart glasses) and 100-2 (e.g., a smart watch) (hereinafter collectively referred to as "wearable devices 100") coupled to each other, to a mobile device 110, to a remote server 130, and to a database 152. The smart glasses 100-1 may be configured for AR/VR applications, and the mobile device 110 may be a smartphone; the smart glasses 100-1 and the mobile device 110 may communicate with each other and exchange a first data set 103-1 via wireless communication. The data set 103-1 may include recorded video, audio, or some other file or streaming media. The user 101 of the wearable device 100 is also the owner of, or is associated with, the mobile device 110. In some embodiments, the smart glasses may communicate directly with a remote server, database, or any other client device (e.g., a smartphone of a different user, etc.) via a network. The mobile device may be communicatively coupled with the remote server and the database via the network 150 and may communicate and share information, files, and the like with them, such as data set 103-2 and data set 103-3 (hereinafter collectively referred to as "data sets 103"). The network 150 may include, for example, any one or more of a local area network (LAN), a wide area network (WAN), the Internet, and the like. Further, the network may include, but is not limited to, any one or more of the following network topologies: a bus network, a star network, a ring network, a mesh network, a star-bus network, and a tree or hierarchical network, among others.
The smart glasses 100-1 may include a frame 105 with an eyepiece 107 for providing images to the user 101. A camera 115 (e.g., front-facing) is mounted on the frame 105 and has a field of view (FOV). A user-facing sensing device 128 is configured to track the pupil position of the user. The processor 112 is configured to identify a region of interest (ROI) within the image viewed by the user 101. The interface device 129 indicates to the user 101 when the FOV of camera 115 at least partially misses the user's ROI. In some embodiments, the smart glasses 100-1 may also include a haptic actuator 125 for reproducing haptic sensations for the user in VR/AR applications, and a speaker 127 for transmitting a voice or sound signal to the user 101 (e.g., derived from pupil-tracking information provided by the sensing device 128) indicating an adjustment of gaze direction to improve the FOV of the camera 115. For example, in some embodiments, haptic actuator 125 may include a vibrating component that instructs the user to nudge their head in a desired direction to align the FOV of the front-facing camera 115 with the ROI, or that confirms to the user that the FOV is properly centered on the ROI.
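As a non-limiting illustration, the check that the camera FOV contains the user's ROI, and the derivation of a direction for the haptic or audio nudge, might be sketched as follows. The circular-FOV approximation and all function and parameter names are assumptions made for illustration, not the device's actual geometry model.

```python
import math

def roi_inside_fov(roi_az_deg: float, roi_el_deg: float,
                   cam_az_deg: float, cam_el_deg: float,
                   cam_fov_deg: float) -> bool:
    """Return True when the user's region of interest (ROI), given as an
    azimuth/elevation direction, falls inside the camera's field of view,
    approximated here as a circular cone of full angle cam_fov_deg."""
    # Small-angle approximation: treat the angular offsets as planar.
    offset = math.hypot(roi_az_deg - cam_az_deg, roi_el_deg - cam_el_deg)
    return offset <= cam_fov_deg / 2.0

def nudge_cue(roi_az_deg: float, roi_el_deg: float,
              cam_az_deg: float, cam_el_deg: float) -> tuple:
    """Direction in which a haptic or audio cue could nudge the user's head
    so that the camera FOV re-centers on the ROI."""
    horizontal = "right" if roi_az_deg > cam_az_deg else "left"
    vertical = "up" if roi_el_deg > cam_el_deg else "down"
    return horizontal, vertical
```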
In some embodiments, the smart glasses 100-1 may include a plurality of sensors 121, such as IMUs, gyroscopes, microphones, and capacitive sensors configured as a touch interface for a user. Other touch sensors may include pressure sensors, thermometers, and the like.
Further, the wearable device 100 or the mobile device 110 may include a storage circuit 120 storing a plurality of instructions and a processor circuit 112 configured to execute the instructions to cause the smart glasses 100-1 to perform, at least partially, some of the steps in methods consistent with the present disclosure. The storage circuit 120 may also store data, such as calibration data for the position and orientation of the camera 115 relative to the user's FOV. In some embodiments, the smart glasses 100-1, mobile device 110, server 130, and/or database 152 may also include a communication module 118 that enables the device to communicate wirelessly with the remote server 130 via the network 150. Accordingly, the smart glasses 100-1 may download multimedia online content (e.g., the data set 103-1) from the remote server 130 to perform, at least partially, some of the operations in the methods disclosed herein. In some embodiments, memory 120 may include instructions that cause the processor 112 to receive and combine signals from the sensors 121 so as to avoid false positives and to better evaluate user intent and commands when receiving input signals from the user interface.
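As a non-limiting illustration of combining signals to avoid false positives, a simple temporal debouncer can require an intent estimate to persist over several consecutive readings before it is acted on. The window size and agreement threshold below are illustrative assumptions rather than values taken from the disclosure.

```python
from collections import Counter, deque

class IntentDebouncer:
    """Accumulate per-frame intent estimates and report an intent only once it
    has appeared in most of the recent frames, so a single noisy sensor
    reading does not trigger a camera switch."""

    def __init__(self, window: int = 5, required_agreement: int = 4):
        self.required_agreement = required_agreement
        self.history = deque(maxlen=window)

    def update(self, intent_estimate: str):
        """Feed one estimate; return the confirmed intent, or None if no
        intent is sufficiently consistent yet."""
        self.history.append(intent_estimate)
        intent, count = Counter(self.history).most_common(1)[0]
        return intent if count >= self.required_agreement else None
```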
Figs. 2A and 2B illustrate two configurations in which a user 201 wears smart glasses 200. The smart glasses 200 include two cameras 215A and 215B (hereinafter collectively referred to as "cameras 215"). The cameras 215 have an FOV 220A and an FOV 220B (hereinafter collectively referred to as "FOVs 220"), respectively. The FOVs 220 are typically different, and their characteristics depend on the technical parameters of the cameras 215. For example, FOV 220A points straight ahead of the face of user 201 and is wider than FOV 220B, which points straight down, near the user's body.
Fig. 2A illustrates a user 201 wearing smart glasses 200 in a first configuration, according to some embodiments. Here the user 201 may be focused on an object directly in front of her, so the first camera 215A (at the top of the user's right eyepiece) may be better suited to acquire images of objects of interest within the user's field of view. The system disclosed herein then activates camera 215A, which, as shown, points straight ahead of the user and has the wide field of view 220A.
Fig. 2B illustrates the user 201 wearing smart glasses 200 in a second configuration, according to some embodiments. In this second configuration, the user 201 may be focused on an object 230 located within the narrower field of view 220B, at the level of the user's hands and directly below the user's face. In this case, the system disclosed herein may activate the second camera 215B at the top of the user's left eyepiece, which may be configured with the narrower, downward-looking field of view 220B (compared to the first camera).
In some embodiments, the system is configured to switch automatically between the first camera and the second camera, without user input, as the user changes gestures and postures. Some degree of user input may still be desirable when there is ambiguity between different user gestures, or when two or more cameras have competing technical parameters relative to the user's object of interest.
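As a non-limiting illustration, automatic switching between the forward and downward cameras could hinge on head pitch with a hysteresis band, so that small head movements do not toggle the cameras, while the unused camera is deactivated to save power. The camera interface (activate()/deactivate()) and the pitch thresholds are illustrative assumptions.

```python
class CameraSwitcher:
    """Switch between a forward wide-FOV camera and a downward narrow-FOV
    camera based on head pitch, with hysteresis to prevent rapid toggling.
    The camera objects are assumed to expose activate() and deactivate()."""

    DOWN_THRESHOLD_DEG = -35.0   # switch to the downward camera below this pitch
    UP_THRESHOLD_DEG = -20.0     # switch back to the forward camera above this pitch

    def __init__(self, forward_cam, downward_cam):
        self.forward_cam = forward_cam
        self.downward_cam = downward_cam
        self.active = forward_cam
        forward_cam.activate()
        downward_cam.deactivate()   # keep the unused camera off to save power

    def on_pitch_update(self, head_pitch_deg: float) -> None:
        if self.active is self.forward_cam and head_pitch_deg < self.DOWN_THRESHOLD_DEG:
            self._switch_to(self.downward_cam)
        elif self.active is self.downward_cam and head_pitch_deg > self.UP_THRESHOLD_DEG:
            self._switch_to(self.forward_cam)

    def _switch_to(self, camera) -> None:
        self.active.deactivate()
        camera.activate()
        self.active = camera
```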
Fig. 3 is a flow chart illustrating steps in a method 300 of controlling one or more cameras in a smart eyewear device using data from multiple sensors (e.g., smart glasses 100 and 200, and sensors 121), according to some embodiments. The smart glasses may include one or more cameras, sensing devices, microphones, speakers, and haptic actuators (e.g., camera 115, sensing device 128, interface device 129, speaker 127, and haptic actuator 125) mounted on the frame. The smart glasses may also include a communication module for sending data sets to, and receiving data sets from, a mobile device or server over a network (e.g., communication module 118, client device 110, server 130, data sets 103, and network 150) when performing one or more steps in method 300. In embodiments consistent with the present disclosure, at least one step in method 300 may be performed by a processor executing instructions stored in a memory circuit (e.g., processor 112, memory 120). In some embodiments, methods consistent with the present disclosure may include one or more of the steps of method 300 performed in a different order, simultaneously, quasi-simultaneously, or overlapping in time.
Step 302 includes receiving a signal from a sensor mounted on smart glasses worn by a user, the signal indicating a user intent. In some embodiments, step 302 includes one of the following: receiving an inertial signal from an inertial motion sensor; receiving a sound capture of the user's voice; receiving a gesture; or receiving an active button press. In some embodiments, step 302 includes identifying a gesture of the user indicative of an object of interest. In some embodiments, step 302 includes receiving a pupil position of the user from an eye-tracking device mounted on the head-mounted device. In some embodiments, step 302 includes identifying an orientation of the head-mounted device, and selecting the first image acquisition device includes selecting a camera having a field of view aligned with the orientation of the head-mounted device.
Step 304 includes identifying user intent based on a model classifying signals from the sensors according to user intent.
Step 306 includes selecting a first image acquisition device in the smart glasses based on technical parameters of the first image acquisition device and the user intent. In some embodiments, step 306 includes selecting the first image acquisition device when its field of view includes a point of interest within the user's field of view in the smart glasses. In some embodiments, step 306 includes deactivating an image acquisition device when the user intent is incompatible with the technical parameters of at least one image acquisition device in the smart glasses. In some embodiments, step 306 includes selecting, from one or more image acquisition devices in the smart glasses, the image acquisition device whose technical parameters best match the user intent. In some embodiments, step 306 includes selecting a second image acquisition device and deactivating the first image acquisition device based on a second user intent. In some embodiments, the technical parameter of the first image acquisition device is the field of view, and step 306 includes verifying that the field of view includes an object of interest identified in the user intent.
Step 308 includes acquiring an image using a first image acquisition device.
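As a non-limiting illustration, the sketch below ties steps 302 through 308 together: sensor signals are read, classified into a user intent by a supplied model, matched against each device's technical parameters, and the best-matching device acquires the image while the others are deactivated. The device fields, intent labels, and scoring heuristic are illustrative assumptions, not the claimed method itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ImageAcquisitionDevice:
    name: str
    fov_deg: float                 # technical parameter: field of view
    pointing_elevation_deg: float  # 0 = forward, negative = pointing downward
    resolution_mp: float

    def acquire(self) -> None:
        print(f"acquiring image with {self.name}")

    def deactivate(self) -> None:
        print(f"deactivating {self.name}")

def run_capture_method(read_sensors: Callable[[], Dict],
                       classify_intent: Callable[[Dict], str],
                       devices: List[ImageAcquisitionDevice]) -> ImageAcquisitionDevice:
    signals = read_sensors()            # step 302: receive signals from the head-mounted sensors
    intent = classify_intent(signals)   # step 304: classify the signals into a user intent

    # Step 306: score each device's technical parameters against the intent
    # and pick the best match (purely illustrative heuristic).
    def score(dev: ImageAcquisitionDevice) -> float:
        if intent == "hand_work":
            return dev.resolution_mp - abs(dev.pointing_elevation_deg + 60.0)
        return dev.fov_deg / 10.0 - abs(dev.pointing_elevation_deg)

    selected = max(devices, key=score)
    for dev in devices:
        if dev is not selected:
            dev.deactivate()            # power down the devices that were not selected
    selected.acquire()                  # step 308: acquire the image
    return selected

# Example usage with two hypothetical devices and a trivial intent model:
front = ImageAcquisitionDevice("front_wide", fov_deg=110, pointing_elevation_deg=0, resolution_mp=5)
down = ImageAcquisitionDevice("down_narrow", fov_deg=60, pointing_elevation_deg=-60, resolution_mp=12)
run_capture_method(lambda: {"head_pitch_deg": -45.0},
                   lambda s: "hand_work" if s["head_pitch_deg"] < -30.0 else "forward_scene",
                   [front, down])
```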
For example, the subject technology is illustrated in accordance with various aspects described below. For convenience, various examples of aspects of the subject technology are described as numbered claims (claims 1, 2, etc.). These are provided as examples and are not limiting on the subject technology.
In one aspect, a method may be an operation, instruction, or function, and vice versa. In one aspect, a claim may be modified to include some or all of the words (e.g., instructions, operations, functions, or components) in one or more claims, one or more words, one or more sentences, one or more phrases, one or more paragraphs, and/or multiple words recited in one or more claims.
Hardware overview
Fig. 4 is a block diagram illustrating an exemplary computer system 400 with which the headset 100 of Fig. 1 and the method 300 of Fig. 3 may be implemented, in accordance with some embodiments. In certain aspects, computer system 400 may be implemented using hardware, or a combination of software and hardware, either in a dedicated server, integrated into another entity, or distributed across multiple entities. Computer system 400 may include a desktop computer, a laptop computer, a tablet, a phablet, a smartphone, a feature phone, a server computer, or the like. A server computer may be located remotely in a data center or be hosted locally.
Computer system 400 includes a bus 408 or other communication mechanism for communicating information, and a processor 402 (e.g., processor 112) coupled with bus 408 for processing information. By way of example, computer system 400 may be implemented using one or more processors 402. Processor 402 may be a general purpose microprocessor, microcontroller, digital signal processor (digital signal processor, DSP), application specific integrated circuit (application specific integrated circuit, ASIC), field programmable gate array (field programmable gate array, FPGA), programmable logic device (programmable logic device, PLD), controller, state machine, gate logic, discrete hardware components, or any other suitable entity that can perform calculations or other information operations.
In addition to hardware, the computer system 400 may include code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them stored in an included memory 404 (e.g., memory 120), such as random access memory (random access memory, RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable PROM (EPROM), registers, hard disk, a removable disk, compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), or any other suitable storage device coupled to the bus 408 for storing information and instructions to be executed by the processor 402. The processor 402 and the memory 404 may be supplemented by, or incorporated in, special purpose logic circuitry.
The instructions may be stored in the memory 404 and may be implemented in one or more computer program products, such as one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, the computer system 400, and according to any method well known to those skilled in the art, including, but not limited to, computer languages such as data-oriented languages (e.g., SQL, dBase), system languages (e.g., C, Objective-C, C++, Assembly), structural languages (e.g., Java, .NET), and application languages (e.g., PHP, Ruby, Perl, Python). The instructions may also be implemented in computer languages such as array languages, aspect-oriented languages, assembly languages, authoring languages, command line interface languages, compiled languages, concurrent languages, curly-bracket languages, dataflow languages, data-structured languages, declarative languages, esoteric languages, extension languages, fourth-generation languages, functional languages, interactive mode languages, interpreted languages, iterative languages, list-based languages, little languages, logic-based languages, machine languages, macro languages, metaprogramming languages, multiparadigm languages, numerical analysis, non-English-based languages, object-oriented class-based languages, object-oriented prototype-based languages, off-side rule languages, procedural languages, reflective languages, rule-based languages, scripting languages, stack-based languages, synchronous languages, syntax handling languages, visual languages, and XML-based languages. Memory 404 may also be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 402.
Computer programs as discussed herein do not necessarily correspond to files in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network. The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
Computer system 400 also includes a data storage device 406, such as a magnetic disk or optical disk, coupled to bus 408 for storing information and instructions. Computer system 400 may be coupled to a variety of devices via input/output module 410. The input/output module 410 may be any input/output module. The exemplary input/output module 410 includes a data port such as a USB port. The input/output module 410 is configured to be connected to the communication module 412. Exemplary communications module 412 includes network interface cards such as an ethernet card and a modem. In certain aspects, the input/output module 410 is configured to connect to a plurality of devices, e.g., the input device 414 and/or the output device 416. Exemplary input devices 414 include a keyboard and a pointing device (e.g., a mouse or trackball) by which a user can provide input to computer system 400. Other kinds of input devices 414, such as tactile input devices, visual input devices, audio input devices, or brain-computer interface devices, may also be used to provide for interaction with a user. For example, feedback provided to the user may be any form of sensory feedback, such as visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form including acoustic input, speech input, tactile input, or brain wave input. Exemplary output devices 416 include a display device, such as a liquid crystal display (liquid crystal display, LCD) monitor, for displaying information to a user.
According to an aspect of the disclosure, the wearable device 100 may be implemented, at least in part, using the computer system 400 in response to the processor 402 executing one or more sequences of one or more instructions contained in the memory 404. Such instructions may be read into memory 404 from another machine-readable medium, such as data storage device 406. Execution of the sequences of instructions contained in main memory 404 causes processor 402 to perform the process steps described herein. One or more processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in memory 404. In alternative aspects, hard-wired circuitry may be used in place of or in combination with software instructions to implement various aspects of the present disclosure. Thus, aspects of the disclosure are not limited to any specific combination of hardware circuitry and software.
Aspects of the subject matter described in this specification can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a client computer having a graphical user interface or a web browser through which a user can interact with an implementation of the subject matter described in this specification); or aspects of the subject matter described in this specification can be implemented in any combination of one or more such back-end components, one or more such middleware components, or one or more such front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). The communication network (e.g., network 150) may include, for example, any one or more of a LAN, a WAN, the Internet, and the like. Further, the communication network may include, but is not limited to, any one or more of, for example, a bus network, a star network, a ring network, a mesh network, a star bus network, a tree network, or a hierarchical network, among others. The communication module may be, for example, a modem or an ethernet card.
Computing system 400 may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. For example, computer system 400 may be, but is not limited to, a desktop computer, a laptop computer, or a tablet computer. Computer system 400 may also be embedded in another device such as, but not limited to, a mobile phone, a Personal Digital Assistant (PDA), a mobile audio player, a Global Positioning System (GPS) receiver, a video game console, and/or a television set-top box.
The term "machine-readable storage medium" or "computer-readable medium" as used herein refers to any medium or media that participates in providing instructions to processor 402 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as data storage device 406. Volatile media includes dynamic memory, such as memory 404. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 408. Common forms of machine-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH EPROM, any other memory chip or cartridge, or any other medium from which a computer can read. The machine-readable storage medium may be a machine-readable storage device, a machine-readable storage matrix (machine-readable storage substrate), a memory device, a combination of substances affecting a machine-readable propagated signal, or a combination of one or more of them.
To illustrate the interchangeability of hardware and software, various illustrative blocks, modules, components, methods, operations, instructions, and algorithms have been described generally in terms of their functionality. Whether such functionality is implemented as hardware, software, or a combination of hardware and software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application.
As used herein, the phrase "at least one of" preceding a series of items, with the terms "and" or "or" separating any of the items, modifies the list as a whole rather than each member of the list (i.e., each item). The phrase "at least one of" does not require selection of at least one item; rather, the phrase means at least one of any one of the items, and/or at least one of any combination of the items, and/or at least one of each of the items. By way of example, the phrases "at least one of A, B, and C" or "at least one of A, B, or C" each refer to: only A, only B, or only C; any combination of A, B, and C; and/or at least one of each of A, B, and C.
The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. Phrases such as "an aspect," "the aspect," "another aspect," "some aspects," "one or more aspects," "an implementation," "the implementation," "another implementation," "some implementations," "one or more implementations," "an embodiment," "the embodiment," "another embodiment," "some embodiments," "one or more embodiments," "a configuration," "the configuration," "another configuration," "some configurations," "one or more configurations," "the subject technology," "the disclosure," "the present disclosure," and other variations thereof are used for convenience and do not imply that a disclosure relating to such phrase(s) is essential to the subject technology or that such disclosure applies to all configurations of the subject technology. A disclosure relating to such phrase(s) may apply to all configurations, or to one or more configurations. A disclosure relating to such phrase(s) may provide one or more examples. A phrase such as "an aspect" or "some aspects" may refer to one or more aspects and vice versa, and the same applies to the other phrases described above.
Reference to an element in the singular is not intended to mean "one and only one" unless specifically so stated, but rather "one or more." Masculine pronouns (e.g., his) include the feminine and neuter genders (e.g., her and its), and vice versa. The term "some" refers to one or more. Underlined and/or italicized headings and subheadings are used for convenience only, do not limit the subject technology, and are not referred to in connection with the interpretation of the description of the subject technology. Relational terms such as first and second and the like may be used to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. All structural and functional equivalents to the elements of the various configurations described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the subject technology. Moreover, nothing disclosed herein is intended to be dedicated to the public, regardless of whether such disclosure is explicitly recited in the above description. No element of any claim is to be construed under the provisions of 35 U.S.C. §112, sixth paragraph, unless the element is expressly recited using the phrase "means for" or, in the case of a method claim, the phrase "step for."
While this specification contains many specifics, these should not be construed as limitations on the scope of what may be claimed, but rather as descriptions of particular embodiments of the subject matter. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
The subject matter of this specification has been described in terms of particular aspects, but other aspects can be implemented and are within the scope of the following claims. For example, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. The actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the aspects described above should not be understood as requiring such separation in all aspects, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
The title, background, figures, and description are incorporated herein by reference and are provided as illustrative examples of the present disclosure, not as restrictive descriptions. They should not be construed as limiting the scope or meaning of the claims. In addition, it can be seen in the detailed description that this specification provides illustrative examples, and various features are grouped together in various embodiments for the purpose of streamlining the disclosure. This method of disclosure should not be construed as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed configuration or operation. The claims are hereby incorporated into the detailed description, with each claim standing on its own as separately claimed subject matter.
The claims are not intended to be limited to the aspects described herein, but are to be accorded the full scope consistent with the language of the claims and encompassing all legal equivalents. Notwithstanding, none of the claims is intended to embrace subject matter that fails to satisfy the requirements of applicable patent law, nor should the claims be interpreted in such a way.

Claims (15)

1. A computer-implemented method, comprising:
receiving a signal from a sensor mounted on a head-mounted device worn by a user, the signal being indicative of a user intent for capturing an image;
identifying the user intent for capturing the image based on a model classifying the signals from the sensors according to the user intent;
selecting a first image acquisition device of the head-mounted device based on technical parameters of the first image acquisition device and the user intent for acquiring the image; and
acquiring the image using the first image acquisition device.
2. The computer-implemented method of claim 1, wherein receiving a signal from a sensor comprises one of: receiving an inertial signal from an inertial motion sensor; receiving a sound capture of the user's voice; receiving a gesture; or receiving an active button press.
3. The computer-implemented method of claim 1 or 2, wherein selecting the first image acquisition device based on technical parameters of the first image acquisition device comprises: selecting the first image acquisition device when the field of view of the first image acquisition device includes a point of interest of the user within the field of view in the head-mounted device.
4. The computer-implemented method of claim 1, 2, or 3, further comprising: deactivating the image acquisition device when the user intent is incompatible with the technical parameters of the first image acquisition device in the head-mounted device.
5. The computer-implemented method of any of the preceding claims, further comprising: selecting, from one or more image acquisition devices in the head-mounted device, the image acquisition device whose technical parameters best match the user intent.
6. The computer-implemented method of any of the preceding claims, further comprising: selecting a second image acquisition device and deactivating the first image acquisition device based on a second user intent.
7. The computer-implemented method of any of the preceding claims, wherein the technical parameter of the first image acquisition device is a field of view, and wherein selecting the first image acquisition device comprises: verifying that the field of view includes an object of interest identified in the user intent.
8. The computer-implemented method of any of the preceding claims, wherein receiving a signal from a sensor mounted on a headset comprises one or more of:
i. identifying a gesture of the user indicative of an object of interest;
ii. receiving a pupil position of the user from an eye-tracking device mounted on the head-mounted device;
iii. identifying an orientation of the head-mounted device, wherein selecting the first image acquisition device comprises selecting a camera whose field of view is directed along the orientation of the head-mounted device.
9. An augmented reality headset, comprising:
a first camera and a second camera mounted on a frame, the first camera and the second camera having a first field of view and a second field of view, respectively;
a sensor mounted on the frame;
a memory configured to store a plurality of instructions; and
one or more processors configured to execute the plurality of instructions to cause the augmented reality headset to:
receiving a signal from the mounted sensor, the signal being indicative of a user intent to acquire an image;
identifying the user intent for capturing the image based on a model classifying the signals from the sensors according to the user intent;
selecting one of the first camera and the second camera based on the first field of view, the second field of view, and the user intent for capturing the image; and
acquiring the image using the selected camera.
10. The augmented reality headset of claim 9, wherein the sensor is an inertial motion unit, and receiving a signal from the sensor comprises identifying an orientation of the augmented reality headset relative to a fixed coordinate system.
11. The augmented reality headset of claim 9 or 10, wherein, to select one of the first camera or the second camera, the one or more processors execute instructions to select the first camera when the field of view of the first camera includes a point of interest of the user within the field of view in the augmented reality headset.
12. The augmented reality headset of claim 9, 10, or 11, wherein the one or more processors execute instructions to deactivate the first camera when the second camera is selected based on a second user intent.
13. The augmented reality headset of any one of claims 9 to 12, wherein the sensor provides a signal indicative of a gesture of the user pointing at an object of interest.
14. The augmented reality headset of any one of claims 9 to 13, wherein the sensor is an eye tracking device mounted on the frame, the sensor being configured to provide a signal indicative of a pupil position of the user.
15. The augmented reality headset of any one of claims 9 to 14, wherein the sensor is one or more of:
i. an inertial motion sensor configured to provide a signal indicative of an orientation of the augmented reality headset, wherein the one or more processors execute instructions to select the first camera when the first field of view is aligned with the orientation of the augmented reality headset;
ii. one of the first camera or the second camera, configured to acquire a gesture of the user indicative of an object of interest;
iii. a microphone configured to collect and identify voice commands indicative of the user intent; and
iv. a touch-sensitive sensor configured to receive touch commands from the user.
CN202280048225.7A 2021-07-07 2022-07-07 Camera control using system sensor data Pending CN117616361A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US63/219,266 2021-07-07
US63/227,228 2021-07-29
US17/856,760 US20230012426A1 (en) 2021-07-07 2022-07-01 Camera control using system sensor data
US17/856,760 2022-07-01
PCT/US2022/036343 WO2023283323A1 (en) 2021-07-07 2022-07-07 Camera control using system sensor data

Publications (1)

Publication Number Publication Date
CN117616361A true CN117616361A (en) 2024-02-27

Family

ID=89953957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280048225.7A Pending CN117616361A (en) 2021-07-07 2022-07-07 Camera control using system sensor data

Country Status (1)

Country Link
CN (1) CN117616361A (en)

Similar Documents

Publication Publication Date Title
US20210081650A1 (en) Command Processing Using Multimodal Signal Analysis
US9563272B2 (en) Gaze assisted object recognition
KR102184272B1 (en) Glass type terminal and control method thereof
CN103620620B (en) Spatial information is used in equipment interaction
CN110326300B (en) Information processing apparatus, information processing method, and computer-readable storage medium
US10514755B2 (en) Glasses-type terminal and control method therefor
US9088668B1 (en) Configuring notification intensity level using device sensors
US20220236801A1 (en) Method, computer program and head-mounted device for triggering an action, method and computer program for a computing device and computing device
KR102094953B1 (en) Method for eye-tracking and terminal for executing the same
US10979632B2 (en) Imaging apparatus, method for controlling same, and storage medium
CN113497912A (en) Automatic framing through voice and video positioning
US9350918B1 (en) Gesture control for managing an image view display
US20230031871A1 (en) User interface to select field of view of a camera in a smart glass
US20230152886A1 (en) Gaze-based user interface with assistant features for smart glasses in immersive reality applications
CN117616361A (en) Camera control using system sensor data
US20230012426A1 (en) Camera control using system sensor data
WO2023283323A1 (en) Camera control using system sensor data
US11159731B2 (en) System and method for AI enhanced shutter button user interface
JP7199808B2 (en) Imaging device and its control method
US20210392427A1 (en) Systems and Methods for Live Conversation Using Hearing Devices
US20230324984A1 (en) Adaptive sensors to assess user status for wearable devices
WO2023009806A1 (en) User interface to select field of view of a camera in smart glasses
WO2022253053A1 (en) Method and apparatus for video playback
US20220350997A1 (en) Pointer-based content recognition using a head-mounted device
US20240134492A1 (en) Digital assistant interactions in extended reality

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination