WO2019015261A1 - Devices and methods for determining scene - Google Patents

Devices and methods for determining scene

Info

Publication number
WO2019015261A1
Authority
WO
WIPO (PCT)
Prior art keywords
scene
determination device
user terminal
image
viewpoint
Prior art date
Application number
PCT/CN2017/119739
Other languages
French (fr)
Inventor
Xuejun Long
Meiwen Chen
Jian Zhou
Yidan XU
Original Assignee
Chengdu Topplusvision Technology Co., Ltd.
Priority date
Filing date
Publication date
Priority claimed from CN201720862875.1U external-priority patent/CN207124680U/en
Priority claimed from CN201720862874.7U external-priority patent/CN207200835U/en
Application filed by Chengdu Topplusvision Technology Co., Ltd. filed Critical Chengdu Topplusvision Technology Co., Ltd.
Publication of WO2019015261A1 publication Critical patent/WO2019015261A1/en

Classifications

    • H04N 17/002: Diagnosis, testing or measuring for television systems or their details, for television cameras
    • G06F 3/011: Input arrangements for interaction between user and computer; arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06V 20/10: Scenes; scene-specific elements; terrestrial scenes
    • H04M 1/72409: User interfaces specially adapted for cordless or mobile telephones, with means for local support of applications that increase the functionality by interfacing with external accessories
    • H04N 13/117: Transformation of image signals corresponding to virtual viewpoints, the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
    • H04N 13/239: Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
    • H04N 13/271: Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • H04N 23/00: Cameras or camera modules comprising electronic image sensors; control thereof
    • G06F 18/251: Pattern recognition; fusion techniques of input or preprocessed data
    • H04M 1/21: Constructional features of telephone sets; combinations with auxiliary equipment, e.g. with clocks or memoranda pads
    • H04M 2250/52: Details of telephonic subscriber devices including functional features of a camera

Definitions

  • the present disclosure generally relates to devices and methods for determining a representation of a scene, and in particular, to devices and methods for determining a three-dimensional representation of a scene.
  • a scene determination device may include a camera set configured to obtain image data relating to a scene in which the scene determination device is located for a plurality of viewpoints.
  • the camera set may include a first camera and a second camera.
  • the scene determination device may also include a storage medium storing a set of instructions.
  • the scene determination device may also include a processor in communication with the storage medium. The processor is configured to perform one or more steps including: for each of a plurality of viewpoints, obtaining image data relating to the scene for the viewpoint from the camera set.
  • the image data may include a first image captured by the first camera of the camera set and a second image captured by the second camera of the camera set; determining depth information relating to the image data for the viewpoint based on the first image and the second image; obtaining inertial measurement data relating to the scene determination device for the viewpoint; determining a scene image for the viewpoint based on the depth information and the image data; and determining a pose of the scene determination device for the viewpoint based on the inertial measurement data and the image data.
  • the processor may further determine a representation of the scene based on the scene image for each of the plurality of viewpoints and the pose of the scene determination device for each of the plurality of viewpoints.
  • the first camera and the second camera are placed apart from each other by a predetermined distance.
  • the predetermined distance may be an average distance between two eyes of human beings.
  • the scene determination device may further include a communication interface through which the scene determination device may communicate with a mobile device.
  • the communication interface may include a USB plug or a USB receptacle.
  • the pose of the scene determination device for the viewpoint may include a position of the scene determination device for the viewpoint and an orientation of the scene determination device for the viewpoint.
  • the determining the pose of the scene determination device for the viewpoint may include determining, by the processor, preceding rotation information of the scene determination device and preceding translation information of the scene determination device for each of one or more preceding viewpoints before the viewpoint on a trajectory of the scene determination device based on at least one of the image data corresponding to each of the one or more preceding viewpoints or inertial measurement data corresponding to each of the one or more preceding viewpoints; determining, by the processor, current rotation information of the scene determination device and translation information of the scene determination device for the viewpoint; determining, by the processor, the position of the scene determination device for the viewpoint based on the preceding translation information of the scene determination device and the current translation information of the scene determination device; and determining, by the processor, the orientation of the scene determination device for the viewpoint based on the preceding rotation information of the scene determination device and the current rotation information of the scene determination device.
  • the determining the scene image for the viewpoint based on the depth information and the image data may further include determining a gray image of an image selected from the first image, the second image, or a third image obtained from a mobile device; and determining the scene image for the viewpoint based on the depth information and the gray image.
  • the processor may also perform a step of removing drift information and noise from the inertial measurement data.
  • the scene determination device may further include at least one of an accelerometer or a gyroscope sensor configured to obtain the inertial measurement data.
  • the camera set may be pre-calibrated.
  • a user terminal may include a scene determination device including a first processor configured to determine a representation of a scene.
  • the first processor may perform one or more steps. The steps may include: for each of a plurality of viewpoints, obtaining image data relating to a scene for the viewpoint from a camera set of the scene determination device by the first processor.
  • the image data includes a first image captured by a first camera of the camera set and a second image captured by a second camera of the camera set.
  • the steps may include: for each of a plurality of viewpoints, determining depth information relating to the image data for the viewpoint based on the first image and the second image by the first processor; obtaining inertial measurement data relating to the user terminal for the viewpoint by the first processor; determining a scene image for the viewpoint based on the depth information and the image data by the first processor; and determining a pose of the user terminal for the viewpoint based on the inertial measurement data and the image data by the first processor.
  • the steps may also include determining a representation of the scene based on the scene image for each of the plurality of viewpoints and the pose of the user terminal for each of the plurality of viewpoints by the first processor.
  • the user terminal may further include a mobile device that communicates with the scene determination device.
  • the mobile device may include a second processor configured to receive the representation of the scene and the poses of the user terminal for the plurality of viewpoints to render a virtual object in the scene.
  • the first processor and the second processor are different.
  • the pose of the user terminal for the viewpoint includes a position of the user terminal for the viewpoint and an orientation of the user terminal for the viewpoint.
  • the determining the pose of the user terminal for the viewpoint may include determining, by the first processor, preceding rotation information of the user terminal and preceding translation information of the user terminal for each of one or more preceding viewpoints before the viewpoint on a trajectory of the user terminal based on at least one of the image data corresponding to each of the one or more preceding viewpoints or inertial measurement data corresponding to each of the one or more preceding viewpoints; determining, by the first processor, current rotation information of the user terminal and translation information of the user terminal for the viewpoint; determining, by the first processor, the position of the user terminal for the viewpoint based on the preceding translation information of the user terminal and the current translation information of the user terminal; and determining, by the first processor, the orientation of the user terminal for the viewpoint based on the preceding rotation information of the user terminal and the current rotation information of the user terminal.
  • the determining the scene image for the viewpoint based on the depth information and the image data may include determining a gray image of an image from the first image and the second image; and determining the scene image for the viewpoint based on the depth information and the gray image.
  • a method for determining a representation of a scene may include, for each of a plurality of viewpoints, obtaining image data relating to a scene in which a user terminal is located for the viewpoint from a camera set of a scene determination device.
  • the image data may include a first image captured by a first camera of the camera set and a second image captured by a second camera of the camera set.
  • the method may include determining depth information relating to the image data for each of the plurality of viewpoints based on the first image and the second image.
  • the method may include obtaining inertial measurement data relating to the user terminal for each of the plurality of viewpoints.
  • the method may include determining a scene image for each of the plurality of viewpoints based on the depth information and the image data.
  • the method may also include determining a pose of the user terminal for each of the plurality of viewpoints based on the inertial measurement data and the image data.
  • the method may further include determining a representation of the scene based on the scene image for each of the plurality of viewpoints and the pose of the user terminal for each of the plurality of viewpoints.
  • FIG. 1 is a schematic diagram illustrating an exemplary scene according to some embodiments of the present disclosure
  • FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of a user terminal according to some embodiments of the present disclosure
  • FIG. 3 is a block diagram illustrating a scene determination device and a mobile device according to some embodiments of the present disclosure
  • FIG. 4 is a block diagram illustrating an exemplary first processor according to some embodiments of the present disclosure
  • FIG. 5 is a flowchart illustrating an exemplary process for determining a representation of a scene according to some embodiments of the present disclosure
  • FIG. 6 is a block diagram illustrating an exemplary pose determination module according to some embodiments of the present disclosure.
  • FIG. 7 is a flowchart illustrating an exemplary process for determining a pose of the user terminal for a viewpoint according to some embodiments of the present disclosure
  • FIG. 8 is a block diagram illustrating an exemplary connection between a scene determination device and a mobile device according to some embodiments of the present disclosure
  • FIGS. 9A and 9B illustrate an exemplary scene determination device according to some embodiments of the present disclosure
  • FIGS. 10A and 10B are schematic diagrams illustrating a mobile device and a scene determination device that is attached to the mobile device according to some embodiments of the present disclosure.
  • FIGS. 11A and 11B illustrate an integrated user terminal according to some embodiments of the present disclosure.
  • the flowcharts used in the present disclosure illustrate operations that systems implement according to some embodiments of the present disclosure. It is to be expressly understood that the operations of the flowcharts may be implemented out of the order shown. Conversely, the operations may be implemented in inverted order, or simultaneously. Moreover, one or more other operations may be added to the flowcharts, and one or more operations may be removed from the flowcharts.
  • the positioning technology used in the present disclosure may be based on a global positioning system (GPS) , a global navigation satellite system (GLONASS) , a compass navigation system (COMPASS) , a Galileo positioning system, a quasi-zenith satellite system (QZSS) , a wireless fidelity (Wi-Fi) positioning technology, or the like, or any combination thereof.
  • An aspect of the present disclosure relates to systems and methods for determining a representation of a scene.
  • the systems may obtain image data relating to a scene in which a user terminal is located for each of a plurality of viewpoints from a camera set.
  • the image data may include two images captured by two cameras of the camera set, respectively.
  • the systems may determine depth information relating to the image data for each of the plurality of viewpoints.
  • the systems may further determine a scene image (e.g., a three-dimensional image relating to the scene for each of the plurality of viewpoints) based on the depth information and inertial measurement data relating to the user terminal.
  • the systems may determine poses of the user terminal (including positions of the user terminal and orientations of the user terminal) for the plurality of viewpoints.
  • the systems may determine a representation of the scene based on the poses and the scene images. Accordingly, the systems and methods improve the accuracy of a representation of a scene.
  • FIG. 1 is a schematic diagram illustrating a scene 100 according to some embodiments of the present disclosure.
  • the scene may be a virtual reality.
  • the scene may be an augmented reality.
  • the scene may be a mixed reality.
  • the present disclosure may take an augmented reality scene as an example.
  • the scene 100 may include real objects, for example, a table 120-1, a cup 120-2, walls in the space.
  • a user terminal 110 may obtain images of the scene 100 for a plurality of viewpoints when the user terminal 110 is moving.
  • the data relating to the real objects in the images may be used to determine the poses of the user terminal 110 (e.g., one or more cameras of the user terminal 110) with respect to the scene. The poses may then be used to adjust the appearance of virtual objects to correspond with any changes occurring in the poses of the user terminal 110.
  • the user terminal 110 may be a device which integrates a scene determination device (e.g., a scene determination device 300 illustrated in FIG. 3) used to determine a representation of the scene.
  • the user terminal 110 may be a mobile device with the scene determination device attached thereto.
  • the user terminal 110 may include a mobile device, a tablet computer, a laptop computer, a built-in device in a motor vehicle, or the like, or any combination thereof.
  • the mobile device may include a smart home device, a wearable device, a communication device, a virtual reality device, an augmented reality device, or the like, or any combination thereof.
  • the smart home device may include a smart lighting device, a control device of an intelligent electrical apparatus, a smart monitoring device, a smart television, a smart video camera, an interphone, or the like, or any combination thereof.
  • the wearable device may include a bracelet, footgear, glasses, a helmet, a watch, clothing, a backpack, a smart accessory, or the like, or any combination thereof.
  • the communication device may include a mobile phone, a personal digital assistant (PDA), a gaming device, a navigation device, a point of sale (POS) device, a laptop, a desktop, or the like, or any combination thereof.
  • the virtual reality device and/or the augmented reality device may include any device that can be used as a virtual reality device or an augmented reality device, for example, a virtual reality helmet, virtual reality glasses, a virtual reality patch, an augmented reality helmet, augmented reality glasses, an augmented reality patch, or the like, or any combination thereof.
  • the mobile device may be used as a virtual reality device or an augmented reality device.
  • the mobile device and a scene determination device may be fully integrated as a whole to be used as a user terminal (e.g., an AR device) .
  • the scene determination device and the mobile device may be affixed to each other, connected to each other, or attached to each other through some interfaces.
  • the built-in device in the motor vehicle may include an onboard computer, an onboard television, etc.
  • the user terminal 110 may be a device with positioning technology for locating the position of the user terminal 110.
  • FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary user terminal 200 according to some embodiments of the present disclosure.
  • the user terminal may be used as an augmented reality device (also referred to herein as an AR device) .
  • the user terminal 200 may include a communication platform 210, a display 220, a graphics processing unit (GPU) 230, a central processing unit (CPU) 240, an input/output (I/O) 250, a memory 260, and a storage 290.
  • the user terminal 200 may further include one or more cameras 295. The one or more cameras 295 of the user terminal 200 may be used to obtain one or more images of a scene in which the user terminal 200 is located.
  • the user terminal may further include a specialized processor for generating a representation of the scene.
  • any other suitable component, including but not limited to a system bus or a controller (not shown), may also be included in the user terminal 200.
  • in some embodiments, a mobile operating system 270 (e.g., iOS™, Android™, or Windows Phone™) and one or more applications 280 may be loaded into the memory 260 in order to be executed by the CPU 240.
  • the applications 280 may include a browser or any other suitable mobile apps for receiving and rendering information relating to image processing or other information from the processing engine. User interactions with the information stream may be achieved via the I/O 250 and provided to the components of the system via the network.
  • computer hardware platforms may be used as the hardware platform (s) for one or more of the elements described herein.
  • a computer with user interface elements may be used to implement a personal computer (PC) or any other type of work station or terminal device.
  • a computer may also act as a server if appropriately programmed.
  • FIG. 3 is a block diagram illustrating a scene determination device 300 and a mobile device 311 according to some embodiments of the present disclosure.
  • the scene determination device 300 may determine a representation of a scene where the scene determination device 300 is located and the poses of the scene determination device 300 for a plurality of viewpoints.
  • the mobile device 311 may obtain the representation of the scene and the poses of the scene determination device 300 to render objects (e.g., a virtual object) in the scene.
  • the mobile device 311 and the scene determination device 300 may be fully integrated as a whole to be used as a user terminal (e.g., used as an AR device) .
  • the scene determination device 300 may serve as a component, a module, or a unit of the mobile device 311.
  • the scene determination device 300 and the mobile device 311 may be affixed to each other, connected to each other, or attached to each other through some interfaces to be used as an AR device.
  • the scene determination device 300 may be plugged into the mobile device 311 through some interfaces to communicate with the mobile device 311.
  • the scene determination device 300 may include a camera set 310, a first inertial measurement unit 320, a first processor 330, a storage 340, and an interface 350.
  • the camera set 310 may be configured to obtain one or more images of the scene 100.
  • the camera set 310 may include one or more cameras.
  • the one or more cameras may include a digital camera or an analog camera.
  • the one or more cameras may include a monochromatic camera (e.g., a gray camera) or a color camera (e.g., an RGB camera).
  • the camera of the camera set 310 may be in any size and/or any shape.
  • the camera set 310 may include two cameras, for example, a first camera 310-1 and a second camera 310-2.
  • the first camera 310-1 and the second camera 310-2 may be stereo cameras.
  • One or both of the first camera 310-1 and the second camera 310-2 may have a rolling shutter or a global shutter.
  • the two cameras may be displaced apart from each other by a certain distance, for example, an average distance between two eyes of human beings (e.g., 6.0 centimeters to 6.5 centimeters, 6.5 centimeters to 7.0 centimeters) .
  • both of the first camera 310-1 and the second camera 310-2 are gray cameras.
  • one of the first camera 310-1 and the second camera 310-2 is an RGB camera and the other of the first camera 310-1 and the second camera 310-2 is a gray camera.
  • both of the first camera 310-1 and the second camera 310-2 are RGB cameras.
  • the first camera 310-1 and the second camera 310-2 may capture two images of the scene in which the scene determination device 300 is located.
  • the first processor 330 may determine depth information of the scene 100 for the viewpoint based on the two images.
  • the camera set 310 may be pre-calibrated.
  • the calibration of the camera set 310 may include calibrating one or more intrinsic parameters or one or more extrinsic parameters.
  • the intrinsic parameter may include a focal length of the camera set 310, an optical center, axis skew, or the like, or any combination thereof.
  • the extrinsic parameter may include the rotation and/or translation information in three dimensions between the first camera 310-1 and the second camera 310-2. In some embodiments, the extrinsic parameter may also include the relative position and/or orientation of the camera set 310 to the first inertial measurement unit 320.
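  • To make the intrinsic and extrinsic parameters above concrete, the following is a minimal sketch of how they are commonly organized; the numeric values (focal lengths, optical center, baseline) are illustrative assumptions and are not taken from the present disclosure.

```python
import numpy as np

# Intrinsic parameters of one camera: focal lengths (fx, fy), optical center (cx, cy),
# and axis skew s, all in pixels (illustrative values only).
fx, fy, cx, cy, s = 700.0, 700.0, 320.0, 240.0, 0.0
K = np.array([[fx, s,  cx],
              [0., fy, cy],
              [0., 0., 1.]])

# Extrinsic parameters between the first and second cameras: a rotation matrix R and a
# translation vector T. For two parallel cameras separated by roughly the interocular
# distance, R is close to identity and T is essentially the baseline (sign convention assumed).
R = np.eye(3)
T = np.array([0.065, 0.0, 0.0])  # assumed baseline in meters

# A 3-D point X in the first camera's frame projects to pixel coordinates through K;
# the same point expressed in the second camera's frame is R @ X + T before projection.
X = np.array([0.1, 0.05, 2.0])
u, v, w = K @ X
print(f"pixel in the first camera: ({u / w:.1f}, {v / w:.1f})")
```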
  • the first inertial measurement unit 320 may be configured to determine inertial measurement data relating to the scene determination device 300.
  • the first inertial measurement unit 320 may include one or more sensors.
  • the sensors may include an accelerometer configured to measure one or more accelerations of the scene determination device 300 along one or more directions (e.g., along any of three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system) .
  • the sensors may also include a gyroscopic sensor configured to measure one or more angular accelerations of the scene determination device 300 around one or more axes of the three coordinate axes in the three-dimensional coordinate system, e.g., the Cartesian coordinate system.
  • the storage 340 may be configured to store information.
  • the information may include images captured by the camera set 310, the depth information relating to the scene 100, the poses of the scene determination device 300, inertial measurement data (including, e.g., angular accelerations, accelerations) , or the like, or any combination thereof.
  • the storage 340 may also include a set of instructions for determining the representation of the scene.
  • the interface 350 may be configured to connect the scene determination device 300 and the mobile device 311 for the communication therebetween.
  • the interface 350 may be a communication interface.
  • the communication interface may be a USB plug or a USB receptacle.
  • the scene determination device 300 may further include a power management module (not shown in FIG. 3) to provide power for the first processor 330.
  • the scene determination device 300 may further include a battery, a battery charging module, and a boost module (not shown in FIG. 3) .
  • the battery may supply power to the power management module through the boost module.
  • the battery may also be charged through the battery charging module and a power interface.
  • the mobile device 311 may include a third camera 360, a second inertial measurement unit 370, a second processor 380, a storage 390, and an interface 395.
  • the mobile device 311 may obtain image data including one or more images captured by the third camera 360 for a plurality of viewpoints.
  • the third camera 360 may be a stereo camera.
  • the third camera 360 may have a rolling shutter or a global shutter.
  • the third camera 360 may be a monochromatic camera (e.g., a gray camera) or a color camera (e.g., an RGB camera).
  • the third camera 360 may be placed at any suitable location of the mobile device 311.
  • the third camera 360 may be a front camera or a rear camera of the mobile device 311.
  • the third camera 360 may be pre-calibrated.
  • the second inertial measurement unit 370 may be configured to determine inertial measurement data relating to the mobile device 311.
  • the second inertial measurement unit 370 may include one or more sensors.
  • the sensors may include an accelerometer configured to measure one or more accelerations of the mobile device 311 along one or more directions (e.g., along any of three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system) .
  • the sensors may also include a gyroscopic sensor configured to measure one or more angular accelerations of the mobile device 311 around one or more axes of the three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system.
  • the second processor 380 may obtain the representation of the scene 100 and the poses of the scene determination device 300 to render a virtual object in the scene 100.
  • the second processor 380 may also process other information obtained by the mobile device 311.
  • for the configuration of the second processor, reference may be made to the description of the GPU 230 and/or the CPU 240 in FIG. 2.
  • the storage 390 may be configured to store information.
  • the information may include images captured by the third camera 360, the poses of the mobile device 311, inertial measurement data (including angular accelerations, accelerations) relating to the mobile device 311, or the like, or any combination thereof.
  • for the configuration of the storage 390, reference may be made to the description of the storage 290 in FIG. 2.
  • the interface 395 may be configured to connect the scene determination device 300 and the mobile device 311 for the communication therebetween.
  • the interface 395 may be a communication interface.
  • the communication interface may be a USB plug or a USB receptacle.
  • the storage 340 and/or the storage 390 may also store applications and/or software for virtual reality, augmented reality, gesture recognition, face recognition, somatosensory games, three-dimensional reconstruction, or the like, or any combination thereof.
  • the gyroscopic sensor of the first inertial measurement unit 320 and/or the gyroscopic sensor of the second inertial measurement unit 370 may suppress high-frequency noise in the system and may keep the angular acceleration data stable. However, drift may still exist in the system.
  • the accelerometer of the first inertial measurement unit 320 and/or the accelerometer of the second inertial measurement unit 370 may suppress low-frequency noise and may remove the drift from the acceleration data, which contains high-frequency noise.
  • the gyroscopic sensors and accelerometers may benefit from complementary filtering or Kalman filtering algorithms, so that the drift of the gyroscopic sensor and the noise of the accelerometer can be eliminated.
  • the inertial measurement data can be further stabilized.
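  • As a concrete illustration of how the two sensors complement each other, the following is a minimal complementary-filter sketch (an assumption for illustration, not the disclosed algorithm): the gyroscope-integrated angle is smooth but drifts, the accelerometer-derived angle is noisy but drift-free, and blending the two stabilizes the estimate. The sample data and the blending coefficient are assumed values.

```python
import math
import random

def complementary_filter(gyro_rates, accel_samples, dt=0.01, alpha=0.98):
    """Estimate a tilt angle (radians) about one axis from fused sensor data.

    gyro_rates:    angular rate samples from the gyroscope (rad/s), drift-prone.
    accel_samples: (ay, az) accelerometer samples (m/s^2), noisy but drift-free.
    alpha:         blending coefficient; values near 1 trust the gyroscope more.
    """
    angle = 0.0
    history = []
    for rate, (ay, az) in zip(gyro_rates, accel_samples):
        gyro_angle = angle + rate * dt      # integrate the gyroscope (drifts over time)
        accel_angle = math.atan2(ay, az)    # tilt from the gravity direction (noisy)
        angle = alpha * gyro_angle + (1.0 - alpha) * accel_angle
        history.append(angle)
    return history

# Toy usage: a stationary device with a small gyroscope bias and accelerometer noise.
random.seed(0)
gyro = [0.002 + random.gauss(0, 0.001) for _ in range(500)]                # rad/s
accel = [(random.gauss(0, 0.2), 9.81 + random.gauss(0, 0.2)) for _ in range(500)]
print(f"final angle estimate: {complementary_filter(gyro, accel)[-1]:.4f} rad")
```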
  • the scene determination device 300 and/or the mobile device 311 may include other sensors including, for example, a touch sensor, a gesture sensor, a proximity sensor, a vicinity sensor, an electromagnetic sensor, or the like, or any combination thereof.
  • the above description of the user terminal is merely with respect to the condition that the scene determination device 300 is attached to the mobile device 311.
  • when the scene determination device 300 is integrated with the mobile device 311 (e.g., as a module of the mobile device 311), one or more components of the scene determination device 300 may be integrated with the corresponding components of the mobile device 311.
  • the third camera 360 may be removed from the user terminal, and the user terminal may use only the first camera 310-1 and the second camera 310-2 to determine a representation of the scene.
  • the first inertial measurement unit 320 and the second inertial measurement unit 370 may be integrated with each other.
  • the integrated inertial measurement unit may include an accelerometer sensor configured to measure one or more accelerations of the user terminal along one or more directions (e.g., along any of three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system) .
  • the integrated inertial measurement unit may also include a gyroscopic sensor configured to measure one or more angular accelerations of the user terminal around one or more axes of the three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system.
  • the first processor 330 when the scene determination device 300 is integrated with the mobile device 311 (e.g., as a module of the mobile device 311) , the first processor 330 may be integrated with the second processor 380 as a whole. In some embodiments, when the scene determination device 300 is integrated with the mobile device 311 (e.g., as a module of the mobile device 311) , the first processor 330 may still be independent from (e.g., different from) the second processor 380.
  • the user terminal 110 may use the first processor 330 as a specialized processor to determine a representation of the scene and the poses of the user terminal. Then the user terminal may use the second processor to render a virtual object in the scene based on the representation of the scene and the poses of the user terminal.
  • FIG. 4 is a block diagram illustrating an exemplary first processor 330 according to some embodiments of the present disclosure.
  • the first processor 330 may include an image data obtaining module 410, a depth information determination module 420, an inertial measurement data obtaining module 430, a scene image determination module 440, a pose determination module 450, and a model determination module 460.
  • the image data obtaining module 410 may obtain image data relating to a scene for each of a plurality of viewpoints. For each of the plurality of viewpoints, the image data may include a first image and a second image.
  • the user terminal may record the scene for each point of a continuous moving route of the user terminal (in this situation, it is assumed that the scene determination device 300 and the mobile device 311 are integrated as the user terminal) .
  • the “moving route of the user terminal” used herein may refer to a moving route of the scene determination device 300.
  • the first image represents the image relating to the scene that is captured by the first camera 310-1 of the camera set 310.
  • the first camera can be a monochromatic camera or a color camera. In some embodiments, the monochromatic camera has a higher frame rate and a lower data processing load than the color camera.
  • the second image represents the image relating to the scene that is captured by the second camera 310-2 of the camera set 310. In some embodiments, the second camera may be a color camera that takes a color image (e.g., an RGB image) .
  • the depth information determination module 420 may determine depth information relating to the image data based on the first image and second image.
  • the image data relating to the scene for each of the plurality of viewpoints may include image information and depth information.
  • the image information may include content (e.g., a pixel value, a voxel value) corresponding to each pixel or voxel in an image.
  • the depth information may include a plurality of depth values corresponding to a plurality of points in the scene. Each depth value may indicate a distance from the camera of the user terminal to one of the plurality of points in the scene.
  • the inertial measurement data obtaining module 430 may obtain inertial measurement data.
  • the inertial measurement data obtaining module 430 may obtain inertial measurement data from the first inertial measurement unit 320 and/or the second inertial measurement unit 370.
  • the scene image determination module 440 may determine a scene image of the scene for each of the plurality of viewpoints.
  • the scene image may be a three-dimensional image.
  • the scene image determination module 440 may construct a scene image relating to the specific viewpoint according to the depth information relating to the scene corresponding to the specific viewpoint and a monochromatic image (e.g., a gray image) of the scene corresponding to the specific viewpoint.
  • the color image may also be captured by the color camera of the mobile device 311, and the scene image determination module 440 may first determine a gray image based on the color image, then the scene image determination module 440 may determine the scene image of the scene for the viewpoint based on the depth information relating to the scene for the corresponding viewpoint and the gray image for the viewpoint.
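  • One common way to build such a per-viewpoint scene image from a gray image and a depth map is to back-project every pixel into three dimensions using the camera intrinsics. The sketch below illustrates this idea; the intrinsic values are assumed for illustration and are not taken from the disclosure.

```python
import numpy as np

def backproject(gray, depth, fx=700.0, fy=700.0, cx=320.0, cy=240.0):
    """Combine a gray image and a per-pixel depth map into 3-D points with intensities.

    gray:  (H, W) grayscale image.
    depth: (H, W) depth in meters along the optical axis.
    Returns an (N, 4) array of [X, Y, Z, intensity] rows in the camera frame.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx          # pinhole back-projection
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth, gray.astype(np.float32)], axis=-1)
    return points.reshape(-1, 4)

# Toy usage: a flat scene 2 m in front of the camera.
gray = np.full((480, 640), 128, dtype=np.uint8)
depth = np.full((480, 640), 2.0, dtype=np.float32)
print(backproject(gray, depth).shape)  # (307200, 4)
```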
  • the pose determination module 450 may determine poses of the scene determination device 300 for the plurality of viewpoints based on the inertial measurement data and the scene images. The pose determination module 450 may determine the pose of the scene determination device 300 for each of the plurality of viewpoints based on the inertial measurement data and gray images. The pose of the scene determination device 300 for a specific viewpoint may include a position of the scene determination device 300 for the specific viewpoint and an orientation of the scene determination device 300 for the specific viewpoint. As described elsewhere in the present disclosure, when the scene determination device 300 is integrated with the mobile device 311 as the user terminal 110, the poses of the scene determination device 300 may be the same as the poses of the user terminal 110.
  • the model determination module 460 may determine a representation of the scene based on the scene images of the scene for the plurality of viewpoints and the corresponding poses of the scene determination device 300 for the plurality of viewpoints.
  • FIG. 5 is a flowchart illustrating an exemplary process 500 for determining a representation of a scene according to some embodiments of the present disclosure.
  • the process 500 may be implemented as a set of instructions (e.g., an application) .
  • the first processor 330 and/or the modules illustrated in FIG. 3 may execute the set of instructions, and when executing the instructions, the first processor 330 and/or the modules may be configured to perform the process 500.
  • the operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 500 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of the process as illustrated in FIG. 5 and described below is not intended to be limiting.
  • FIG. 5 and the description thereof take the scene determination device 300 attached to the mobile device 311 as an example. It should be noted that the scene determination device 300 and the mobile device 311 may instead be integrated as an integrated device (e.g., the user terminal); in this situation, the information described below (e.g., the image data, the depth information, the inertial measurement data, the poses) is associated with the integrated device (e.g., the user terminal) .
  • the first processor 330 may obtain image data relating to the scene for each of a plurality of viewpoints.
  • the image data may include a first image and a second image.
  • the user terminal may record the scene for each point of a continuous moving route of the user terminal (especially the route of the camera of the user terminal) . However, recording the scene at every point of the continuous movement of the user terminal would require an excessive amount of data. Therefore, a plurality of viewpoints may be selected to act as key viewpoints along the continuous moving route of the user terminal.
  • the first image is captured by the first camera 310-1 of the camera set 310
  • the second image is captured by the second camera 310-2 of the camera set 310.
  • the two cameras may be displaced apart from each other by a certain distance, for example, an average distance between two eyes of human beings (e.g., 6.0 centimeters to 6.5 centimeters, 6.5 centimeters to 7.0 centimeters) .
  • both of the first camera 310-1 and the second camera 310-2 are gray cameras.
  • one of the first camera 310-1 and the second camera 310-2 is an RGB camera and the other of the first camera 310-1 and the second camera 310-2 is a gray camera.
  • both of the first camera 310-1 and the second camera 310-2 are RGB cameras.
  • the first processor 330 may determine depth information relating to the image data based on the first image and second image.
  • the depth information may include a plurality of depth values corresponding to a plurality of points in the scene. Each depth value may indicate a distance from the camera of the user terminal (e.g., the camera set 310 of the scene determination device 300) to one of the plurality of points in the scene.
  • the depth information determination module 420 may determine the depth information based on a triangulation technique.
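  • For a rectified stereo pair, the triangulation reduces to depth = focal length × baseline / disparity. The following sketch computes a disparity map with OpenCV's block matcher on a synthetic pair and converts it to depth; the focal length, baseline, and synthetic images are assumptions used only for illustration.

```python
import numpy as np
import cv2

# Synthetic rectified stereo pair: the right image is the left image shifted by a
# constant disparity of 16 pixels (i.e., a fronto-parallel plane).
rng = np.random.default_rng(0)
left = (rng.random((240, 320)) * 255).astype(np.uint8)
right = np.roll(left, -16, axis=1)

# Block matching produces a disparity map (OpenCV returns fixed-point values scaled by 16).
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0

# Triangulation for a rectified pair: depth = f * B / disparity.
focal_px = 700.0     # assumed focal length in pixels
baseline_m = 0.065   # assumed distance between the two cameras in meters
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = focal_px * baseline_m / disparity[valid]
print(f"median recovered disparity: {np.median(disparity[valid]):.1f} px")
```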
  • the first processor 330 may obtain inertial measurement data.
  • the inertial measurement data obtaining module 430 may obtain the inertial measurement data from the first inertial measurement unit 320 and/or the second inertial measurement unit 370.
  • the inertial measurement units may include one or more gyroscopic sensors and one or more accelerometers.
  • the inertial measurement data obtaining module 430 may obtain accelerations of the user terminal along one or more directions (e.g., along any of three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system) .
  • the inertial measurement data obtaining module 430 may also obtain one or more angular accelerations of the user terminal around one or more axes of the three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system. Particularly, when the scene determination device 300 and the mobile device 311 are attached together, the inertial measurement data obtaining module 430 may obtain accelerations of the scene determination device 300 along one or more directions and one or more angular accelerations of the scene determination device 300 around one or more axes of the three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system.
  • the gyroscopic sensor may suppress high-frequency noise and may keep the angular acceleration data stable.
  • the accelerometer may suppress low-frequency noise and may remove the drift from the acceleration data, which contains high-frequency noise.
  • the gyroscopic sensors and accelerometers may benefit from complementary filtering or Kalman filtering algorithms, so that the drift of the gyroscopic sensor and the noise of the accelerometer can be eliminated.
  • the inertial measurement data can be further stabilized.
  • the first processor 330 may determine a scene image of the scene for each of the plurality of viewpoints.
  • the scene image may be a three-dimensional image.
  • the scene image determination module 440 may construct the scene image relating to the specific viewpoint according to the depth information relating to the scene corresponding to the specific viewpoint and a monochromatic image (e.g., a gray image) of the scene corresponding to the specific viewpoint.
  • the first camera 310-1 is a monochromatic camera
  • the monochromatic image may be captured by the first camera 310-1 for the viewpoint.
  • the monochromatic image may be captured by the second camera 310-2 for the viewpoint.
  • the monochromatic image may be selected from any one of images captured by the first camera 310-1 and the second camera 310-2 for the viewpoint.
  • the scene image determination module 440 may first determine a gray image from either of two color images captured by the first camera 310-1 and the second camera 310-2 for the viewpoint using a gray image determining technique, then the scene image determination module 440 may determine the scene image of the scene for the viewpoint based on the depth information relating to the scene for the corresponding viewpoint and the gray image for the viewpoint.
  • the gray image determining technique may include a floating-point algorithm, an integer method, a shift method, an averaging method, taking only the green channel, or the like, or any combination thereof.
  • the color image may also be captured by the color camera of the mobile device 311, and the scene image determination module 440 may first determine a gray image based on the color image using the gray image determining technique, then the scene image determination module 440 may determine the scene image of the scene for the viewpoint based on the depth information relating to the scene for the corresponding viewpoint and the gray image for the viewpoint.
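  • The listed gray image determining techniques correspond to standard RGB-to-gray conversions. The sketch below illustrates several of them; the weights shown are the common ITU-R BT.601 luma coefficients, used here as an example rather than as values specified by the disclosure.

```python
import numpy as np

def to_gray(rgb, method="floating_point"):
    """Convert an (H, W, 3) RGB image to a gray image using the chosen technique."""
    r = rgb[..., 0].astype(np.float32)
    g = rgb[..., 1].astype(np.float32)
    b = rgb[..., 2].astype(np.float32)
    if method == "floating_point":   # weighted sum with BT.601 luma coefficients
        gray = 0.299 * r + 0.587 * g + 0.114 * b
    elif method == "integer":        # the same weights scaled to integers, then divided
        gray = (299 * r + 587 * g + 114 * b) / 1000.0
    elif method == "shift":          # integer weights summing to 256, then a right shift
        gray = (77 * r.astype(np.int32) + 151 * g.astype(np.int32)
                + 28 * b.astype(np.int32)) >> 8
    elif method == "average":        # plain mean of the three channels
        gray = (r + g + b) / 3.0
    elif method == "green_only":     # take only the green channel
        gray = g
    else:
        raise ValueError(f"unknown method: {method}")
    return np.clip(gray, 0, 255).astype(np.uint8)

rgb = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
print(to_gray(rgb, "shift").shape)
```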
  • the first processor 330 may determine poses of the scene determination device for each of the plurality of viewpoints based on the inertial measurement data and gray images.
  • the pose of the scene determination device 300 for a specific viewpoint may include a position of the scene determination device 300 for the specific viewpoint and an orientation of the scene determination device 300 for the specific viewpoint.
  • the determination of the position and the orientation of the scene determination device 300 may be obtained by performing one or more operations described in connection with FIG. 7.
  • the pose of the scene determination device 300 may be the same as the pose of the user terminal. In some embodiments, this method can be achieved by visual-inertial odometry (VIO) .
  • based on the poses, a position relationship of the scene determination device 300 relative to an origin can be obtained.
  • in some embodiments, the origin is a predetermined origin in a three-dimensional space. In some embodiments, the origin is the position at which the scene determination device 300 starts to move. In some embodiments, the operator can set a certain point in space as the origin.
  • based on the position relationship of the scene determination device 300 relative to the origin and position information relating to an object (e.g., a virtual object to be rendered), a position relationship of the object relative to the scene determination device 300 can be determined. The position relationship of the object relative to the scene determination device 300 may be used to render the object.
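  • As an illustration of how such a relative position can be computed, the sketch below assumes the pose of the scene determination device 300 relative to the origin is given as a rotation matrix and a position vector, and that the object's position is known in the origin (world) frame; all numeric values are illustrative assumptions.

```python
import numpy as np

# Assumed pose of the device relative to the origin: R_wd holds the device axes expressed
# in the world frame, and t_wd is the device position in the world frame.
theta = np.deg2rad(30.0)
R_wd = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta),  np.cos(theta), 0.0],
                 [0.0,            0.0,           1.0]])
t_wd = np.array([1.0, 0.5, 0.0])

# A virtual object placed at a fixed point in the world (origin) frame.
p_world = np.array([2.0, 1.0, 0.0])

# Position of the object relative to the device (in device coordinates), which is what
# a renderer needs in order to draw the object from the device's viewpoint.
p_device = R_wd.T @ (p_world - t_wd)
print(p_device)
```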
  • the first processor 330 may determine a representation of the scene based on the scene images of the scene for the plurality of viewpoints and the corresponding poses of the scene determination device 300 for the plurality of viewpoints.
  • the augmented reality effect can be presented by rendering the virtual object based on the pose information of the scene determination device 300. Since the three-dimensional image contains depth information, the virtual object may not be displayed when the distance between the virtual object and the scene determination device 300 is greater than the distance between the real object in the scene and the scene determination device 300. Thus, an obstruction effect is achieved, making the augmented reality effect more realistic.
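  • A simple way to obtain the described obstruction effect is a per-pixel depth test: a pixel of the virtual object is drawn only where the object is closer to the device than the real scene is at that pixel. The following is a minimal sketch of this idea; the image sizes and distances are illustrative assumptions.

```python
import numpy as np

def composite_with_occlusion(scene_rgb, scene_depth, obj_rgb, obj_depth):
    """Overlay a rendered virtual object onto the scene, hiding it wherever real
    geometry is closer to the device than the object (per-pixel depth test).

    scene_depth, obj_depth: (H, W) distances in meters; np.inf where the object is absent.
    """
    visible = obj_depth < scene_depth          # object lies in front of the real surface
    out = scene_rgb.copy()
    out[visible] = obj_rgb[visible]
    return out

# Toy example: a real wall 2 m away and a virtual cube placed 3 m away -> fully occluded.
scene_rgb = np.zeros((4, 4, 3), dtype=np.uint8)
scene_depth = np.full((4, 4), 2.0)
obj_rgb = np.full((4, 4, 3), 255, dtype=np.uint8)
obj_depth = np.full((4, 4), 3.0)
print(composite_with_occlusion(scene_rgb, scene_depth, obj_rgb, obj_depth).max())  # 0 -> hidden
```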
  • step 510 and step 530 may be combined into a single step in which the first processor 330 may obtain both the image data relating to the scene and the inertial measurement data associated with the user terminal simultaneously.
  • step 530 may be performed before step 510 and step 520.
  • one or more other optional operations (e.g., an execution operation) may be added elsewhere in the exemplary process 500.
  • FIG. 6 is a block diagram illustrating an exemplary pose determination module according to some embodiments of the present disclosure.
  • the pose determination module 450 may include a rotation information determination unit 610, a translation information determination unit 620, a position determination unit 630, and an orientation determination unit 640.
  • the rotation information determination unit 610 may determine preceding rotation information of the scene determination device 300 for each of one or more preceding viewpoints with respect to the specific viewpoint.
  • the rotation information determination unit 610 may also determine current rotation information of the scene determination device 300 for the current viewpoint.
  • the rotation information of the scene determination device 300 may be represented by a matrix (also referred to as a rotation matrix R) .
  • the translation information determination unit 620 may determine preceding translation information of the scene determination device 300 for each of one or more preceding viewpoints with respect to the specific viewpoint.
  • the translation information determination unit 620 may also determine current translation information of the scene determination device 300 for the current viewpoint.
  • the translation information of the scene determination device 300 may be represented by a vector (also referred to as a translation vector T) .
  • the position determination unit 630 may determine a position of the scene determination device 300 based on the preceding translation information and the current translation information. For example, the position determination unit 630 may first integrate the current translation information associated with the current viewpoint and the preceding translation information associated with the viewpoint before the current viewpoint. Then the position determination unit 630 may determine the position of the scene determination device 300 for the current viewpoint based on an original position of the scene determination device 300 and the integrated translation information.
  • the original position of the scene determination device 300 may refer to a starting point on the moving route of the scene determination device 300.
  • the orientation determination unit 640 may determine an orientation of the scene determination device 300 associated with the current viewpoint based on the preceding rotation information and the current rotation information. For example, the orientation determination unit 640 may first integrate the current rotation information associated with the current viewpoint and the preceding rotation information associated with the viewpoint before the current viewpoint. Then the orientation determination unit 640 may determine the orientation of the scene determination device 300 for the current viewpoint based on an original orientation of the scene determination device 300 and the integrated rotation information.
  • the original orientation of the scene determination device 300 may refer to an orientation of the scene determination device 300 on the original position.
  • FIG. 7 is a flowchart illustrating an exemplary process 700 for determining a pose of the user terminal for a specific viewpoint according to some embodiments of the present disclosure.
  • the process 700 may be implemented as a set of instructions (e.g., an application) .
  • the first processor 330 and/or the modules illustrated in FIG. 3 may execute the set of instructions, and when executing the instructions, the first processor 330 and/or the modules may be configured to perform the process 700.
  • the operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 700 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of the process as illustrated in FIG. 7 and described below is not intended to be limiting.
  • the first processor 330 may determine preceding rotation information of the scene determination device 300 and preceding translation information of the scene determination device 300 for each of one or more preceding viewpoints with respect to the specific viewpoint.
  • the rotation information of the scene determination device 300 may be represented by a matrix (also referred to as a rotation matrix R) and the translation information of the scene determination device 300 may be represented by a vector (also referred to as a translation vector T) .
  • a viewpoint for which the rotation information and the translation information is to be determined may be referred to herein as a current viewpoint, and a viewpoint before the current viewpoint on the moving route of the scene determination device 300 may be referred to herein as a preceding viewpoint.
  • if v1, v2, v3, ..., vn are viewpoints on the moving route of the scene determination device 300 in accordance with the moving direction of the scene determination device 300, and vn is the current viewpoint, then the viewpoints v1 to vn-1 are referred to as preceding viewpoints.
  • the rotation information determination unit 610 may first determine rotation information associated with the preceding viewpoints v1 to vn-1 (also referred to herein as preceding rotation information) and translation information associated with the preceding viewpoints v1 to vn-1 (also referred to herein as preceding translation information) .
  • the first processor 330 may determine rotation information of the scene determination device 300 associated with the current viewpoint (also referred to herein as current rotation information) and translation information of the scene determination device 300 associated with the current viewpoint (also referred to herein as current translation information) .
  • the translation information (e.g., the preceding translation information, the current translation information) of the scene determination device 300 may be the same as the translation information (e.g., the preceding translation information, the current translation information) of the user terminal and the rotation information (e.g., the preceding rotation information, the current rotation information) of the scene determination device 300 may be the same as the rotation information (e.g., the preceding rotation information, the current rotation information) of the user terminal.
  • the first processor 330 may determine rotation information and translation information of the scene determination device 300 associated with a specific viewpoint based on the image relating to the specific viewpoint and the image relating to the viewpoint nearest before the specific viewpoint. For example, the rotation information determination unit 610 may extract features of objects in the two images and compare the features to determine the rotation information and translation information of the scene determination device 300. In some embodiments, the first processor 330 may determine rotation information and translation information of the scene determination device 300 associated with a specific viewpoint based on the inertial measurement data relating to the specific viewpoint and the inertial measurement data relating to the viewpoint nearest before the specific viewpoint.
  • the first processor 330 may obtain the angular accelerations of the scene determination device 300 for the specific viewpoint with respect to a viewpoint nearest before the specific viewpoint around one or more axes of the three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system, to determine the rotation information of the user terminal associated with the specific viewpoint.
  • the first processor 330 may also obtain the accelerations of the scene determination device 300 for the specific viewpoint with respect to a viewpoint nearest before the specific viewpoint along one or more axes of the three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system, to determine the translation information of the user terminal associated with the specific viewpoint. (An illustrative sketch of such inertial integration is provided after this list.)
  • the first processor 330 may determine rotation information and translation information of the scene determination device 300 associated with a specific viewpoint based on the image and the inertial measurement data relating to the specific viewpoint and the image and the inertial measurement data relating to a viewpoint nearest before the specific viewpoint.
  • the first processor 330 may determine a position of the user terminal associated with the current viewpoint based on the preceding translation information and the current translation information. For example, the position determination unit 630 may first accumulate the current translation information associated with the current viewpoint and the preceding translation information associated with preceding viewpoint (s) before the current viewpoint. Then the position determination unit 630 may determine the position of the user terminal associated with the current viewpoint based on an original position of the scene determination device 300 and the accumulated translation information. (An illustrative sketch of such accumulation is provided after this list.)
  • the original position of the scene determination device 300 may refer to a starting point on the moving route of the scene determination device 300.
  • the first processor 330 may determine an orientation of the user terminal associated with the current viewpoint based on the preceding rotation information and the current rotation information. For example, the orientation determination unit 640 may first integrate the current rotation information associated with the current viewpoint and the preceding rotation information associated with preceding viewpoint (s) before the current viewpoint. Then the orientation determination unit 640 may determine the orientation of the user terminal associated with the current viewpoint based on an original orientation of the scene determination device 300 and the integrated rotation information.
  • the original orientation of the user terminal may refer to an orientation of the scene determination device 300 on the original position.
  • the orientation of the user terminal may be represented by Euler angles of the user terminal along the axes of the three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system.
  • in some embodiments, the order of step 730 and step 740 may be changed.
  • in some embodiments, one or more other optional operations (e.g., a storing operation) may be added to the process 700.
  • FIG. 8 is a schematic diagram illustrating an exemplary connection between a scene determination device 300 and a mobile device 311 according to some embodiments of the present disclosure.
  • the mobile device 311 may include a smart home device, a wearable device, a smart communication device, a virtual reality device, an augmented reality device, or the like, or any combination thereof.
  • the smart home device may include a smart lighting device, a control device of an intelligent electrical apparatus, a smart monitoring device, a smart television, a smart video camera, an interphone, or the like, or any combination thereof.
  • the wearable device may include a smart bracelet, a smart footgear, smart glasses, a smart helmet, a smartwatch, smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof.
  • the smart communication device may include a smartphone, a personal digital assistant (PDA) , a gaming device, a navigation device, a point of sale (POS) device, or the like, or any combination thereof.
  • the virtual reality device and/or the augmented reality device may include a virtual reality helmet, a virtual reality glass, a virtual reality patch, an augmented reality helmet, augmented reality glasses, an augmented reality patch, or the like, or any combination thereof.
  • the mobile device 311 and the scene determination device 300 may be fully integrated as a whole to be used as a user terminal (e.g., an AR device) .
  • the scene determination device 300 and the mobile device 311 may be affixed to each other, connected to each other, or attached to each other through some interfaces.
  • FIGS. 9A and 9B illustrate an exemplary scene determination device 300 which can be attached to the mobile device according to some embodiments of the present disclosure.
  • the first camera 310-1 and the second camera 310-2 are displaced apart from each other by a certain distance, for example, an average distance between two eyes of human beings (e.g., 6.0 centimeters to 6.5 centimeters, 6.5 centimeters to 7.0 centimeters) .
  • the first camera 310-1 and the second camera 310-2 are placed on two sides of the scene determination device 300 respectively.
  • the scene determination device 300 may also include a communication interface (not shown in FIGS. 9A or 9B) to connect with the mobile device 311.
  • the communication interface connecting the scene determination device 300 and the mobile device may be any type of wired or wireless connection.
  • the wireless connection may include a Local Area Network (LAN) , a Wide Area Network (WAN) , a Bluetooth™, a ZigBee™, a Near Field Communication (NFC) , or the like, or any combination thereof.
  • the wired connection may include a micro USB interface, a Mini USB interface, an 8-Pin USB interface, a 10-Pin USB interface, a 30-Pin USB interface, a Type-C USB interface, or a USB interface of another specification.
  • the USB interface may use a USB plug 910 or a USB receptacle 930 based on different requirements.
  • the scene determination device 300 may connect directly to the USB receptacle of the external mobile device by using the USB plug 910, which may also be used to fix the scene determination device 300 on the mobile device.
  • the mobile device may receive data processing results of mobile software and/or applications from the scene determination device 300 through the USB plug 910.
  • the scene determination device 300 may connect directly to a USB plug of the external mobile device by using the USB receptacle 930.
  • for a user terminal without a USB plug, a USB data cable may be used to make the connection.
  • in some embodiments, the scene determination device 300 may be employed in an Unmanned Aerial Vehicle (UAV) to enhance the visual capability of the UAV.
  • the scene determination device 300 may further include a power interface 920 (shown in FIG. 9B) through which the built-in battery in the scene determination device 300 may be charged.
  • the distribution of the power interface 920 and the communication interface may be arbitrary.
  • for example, the power interface 920 and the communication interface (e.g., the USB plug 910 or the USB receptacle 930) may be placed on different surfaces of the scene determination device 300.
  • for instance, the power interface 920 may be placed on the top surface of the scene determination device 300 and the communication interface (e.g., the USB plug 910 or the USB receptacle 930) may be placed on the right surface, left surface, or bottom surface of the scene determination device 300.
  • the terms “bottom, ” “top, ” “left, ” and “right” are provided for describing the distribution of the power interface and the communication interface, and are not intended to be limiting.
  • FIGS. 10A and 10B are schematic diagrams illustrating a mobile device and a scene determination device that is attached to the mobile device according to some embodiments of the present disclosure.
  • the scene determination device 300 and the mobile device 311 may be two separate devices.
  • the scene determination device 300 may be attached to the mobile device 311 through a communication interface 1020 of the scene determination device 300 and a communication interface 1010 of the mobile device 311.
  • the camera set 310 of the scene determination device 300 may work as a front camera, as shown in FIG. 10A.
  • the camera set 310 of the scene determination device 300 may work as a rear camera, as shown in FIG. 10B.
  • the communication interface 1020 of the scene determination device 300 is a USB plug and the communication interface 1010 of the mobile device 311 is a USB receptacle. It should be noted that this is merely an example, and is not intended to be limiting. In some embodiments, the communication interface 1020 of the scene determination device 300 is a USB receptacle and the communication interface 1010 of the mobile device 311 is a USB plug.
  • FIGS. 11A and 11B are schematic diagrams illustrating a user terminal that is an integration of a scene determination device and a mobile device according to some embodiments of the present disclosure.
  • the scene determination device 300 and the mobile device 311 may be integrated as a single device (i.e., the user terminal 110) , as illustrated in FIGS. 11A and 11B.
  • the first camera 310-1 and the second camera 310-2 are both integrated in the user terminal 110, and the first camera 310-1 and the second camera 310-2 are displaced apart from each other by a certain distance, for example, an average distance between two eyes of human beings (e.g., 6.0 centimeters to 6.5 centimeters, 6.5 centimeters to 7.0 centimeters) .
  • the camera set may work as a front camera, as shown in FIG. 11A. In some embodiments, the camera set may work as a rear camera, as shown in FIG. 11B.
  • aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or context including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc. ) , or in a combination of software and hardware that may all generally be referred to herein as a “unit, ” “module, ” or “system. ” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer-readable media having computer readable program code embodied thereon.
  • a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electromagnetic, optical, or the like, or any suitable combination thereof.
  • a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including wireless, wireline, optical fiber cable, RF, or the like, or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python or the like, conventional procedural programming languages, such as the "C" programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN) , or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a Software as a Service (SaaS) .
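The three Python sketches below are offered only as hedged illustrations of the pose determination described above with reference to FIG. 7; they are not the claimed implementation, and the function names, parameters, and library calls (OpenCV, NumPy, SciPy) are assumptions introduced here. This first sketch shows one way the rotation matrix R and translation vector T between the viewpoint nearest before a specific viewpoint and the specific viewpoint itself might be estimated by extracting and comparing image features, assuming the intrinsic matrix K of the camera is known from pre-calibration; a translation recovered this way is determined only up to scale.

```python
import cv2
import numpy as np

def relative_pose(image_prev, image_curr, K):
    """Estimate the rotation matrix R (3x3) and translation vector T (3x1,
    up to scale) of the current viewpoint with respect to the preceding
    viewpoint from two gray images and the intrinsic matrix K."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(image_prev, None)
    kp2, des2 = orb.detectAndCompute(image_curr, None)

    # Match the extracted features of objects in the two images.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:500]
    pts_prev = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts_curr = np.float32([kp2[m.trainIdx].pt for m in matches])

    # The essential matrix encodes the relative rotation and translation.
    E, mask = cv2.findEssentialMat(pts_prev, pts_curr, K,
                                   method=cv2.RANSAC, threshold=1.0)
    _, R, T, _ = cv2.recoverPose(E, pts_prev, pts_curr, K, mask=mask)
    return R, T
```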
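The second sketch illustrates how inertial measurement data collected between two consecutive viewpoints could be integrated into a relative rotation and translation, under the common assumptions that the gyroscope samples are angular rates in rad/s, the accelerometer samples are gravity-compensated linear accelerations in m/s², and dt is the sampling interval; these conventions are assumptions, not statements from the disclosure.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def imu_relative_pose(gyro_samples, accel_samples, dt):
    """Integrate IMU samples collected between the preceding viewpoint and the
    current viewpoint into a relative rotation R and translation T expressed
    in the frame of the preceding viewpoint."""
    R = np.eye(3)        # accumulated relative rotation
    v = np.zeros(3)      # velocity in the preceding-viewpoint frame
    T = np.zeros(3)      # accumulated relative translation
    for w, a in zip(gyro_samples, accel_samples):
        # Rotate the body-frame acceleration into the preceding-viewpoint frame.
        a_frame = R @ np.asarray(a, dtype=float)
        # Double integration of acceleration: update position, then velocity.
        T = T + v * dt + 0.5 * a_frame * dt ** 2
        v = v + a_frame * dt
        # Compose the incremental rotation for this sampling step.
        R = R @ Rotation.from_rotvec(np.asarray(w, dtype=float) * dt).as_matrix()
    return R, T.reshape(3, 1)
```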
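The third sketch accumulates the preceding and current relative motions onto an original position and original orientation to obtain the position, the orientation, and the Euler angles of the user terminal for the current viewpoint; the variable names are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def accumulate_pose(original_position, original_orientation, relative_poses):
    """original_position: (3,) starting point of the moving route.
    original_orientation: (3, 3) rotation matrix at the starting point.
    relative_poses: (R, T) pairs for the preceding viewpoints and the current
    viewpoint, each expressed with respect to the viewpoint nearest before it."""
    orientation = np.asarray(original_orientation, dtype=float)
    position = np.asarray(original_position, dtype=float)
    for R, T in relative_poses:
        # Each relative translation is expressed in the frame of the previous
        # viewpoint, so it is rotated before being accumulated onto the position.
        position = position + orientation @ np.asarray(T, dtype=float).reshape(3)
        # Rotations are integrated by composing (multiplying) the matrices.
        orientation = orientation @ np.asarray(R, dtype=float)
    # Euler angles of the current orientation about the three Cartesian axes.
    euler_angles = Rotation.from_matrix(orientation).as_euler("xyz", degrees=True)
    return position, orientation, euler_angles
```

In practice the image-based and inertial estimates would likely be fused (for example, by the filtering mentioned elsewhere in the disclosure) rather than used in isolation.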


Abstract

Devices and methods for determining a representation of a scene implemented on a user terminal are provided. The methods include obtaining image data relating to a scene in which a user terminal is located for each of a plurality of viewpoints. The methods also include determining depth information relating to the image data. The methods also include determining a scene image for the viewpoint based on the depth information and the image data. The methods further include determining poses of the user terminal based on inertial measurement data relating to the user terminal and the image data. The methods further include determining a representation of the scene based on the scene images and the poses of the user terminal.

Description

DEVICES AND METHODS FOR DETERMINING SCENE
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority of Chinese Application No. 201720862874.7 filed on July 17, 2017 and Chinese Application No. 201720862875.1 filed on July 17, 2017, the entire contents of each of which are hereby incorporated by reference.
TECHNICAL FIELD
The present disclosure generally relates to devices and methods for determining a representation of a scene, and in particular, to devices and methods for determining a three-dimensional representation of a scene.
BACKGROUND
Nowadays, augmented reality (AR) technology attracts more and more attention for its huge market potential. However, it is difficult for a user terminal using existing technology to generate precise details about a scene, and using a general processor in the user terminal to process information relating to the scene may slow the user terminal down. Therefore, it is desirable to provide systems, methods, and devices for determining a representation of a scene precisely.
SUMMARY
In an aspect of the present disclosure, a scene determination device is provided. The scene determination device may include a camera set configured to obtain image data relating to a scene in which the scene determination device is located for a plurality of viewpoints. The camera set may include a first camera and a second camera. The scene determination device may also include a storage medium storing a set of instructions. The scene determination device may also include a processor in communication with the storage medium. The processor is configured to perform one or more steps including: for each of a plurality of  viewpoints, obtain image data relating to the scene for the viewpoint from the camera set. The image data may include a first image captured by the first camera of the camera set and a second image captured by the second camera of the camera set; determining depth information relating to the image data for the viewpoint based on the first image and the second image; obtaining inertial measurement data relating to the scene determination device for the viewpoint; determining a scene image for the viewpoint based on the depth information and the image data; determine a pose of the scene determination device for the viewpoint based on the inertial measurement data and the image data. The processor may further determine a representation of the scene based on the scene image for each of the plurality of viewpoints and the pose of the scene determination device for each of the plurality of viewpoints.
In some embodiments, the first camera and the second camera are placed apart from each other by a predetermined distance. The predetermined distance may be an average distance between two eyes of human beings.
In some embodiments, the scene determination device may further include a communication interface through which the scene determination device may communicate with a mobile device.
In some embodiments, the communication interface may include a USB plug or a USB receptacle.
In some embodiments, the pose of the scene determination device for the viewpoint may include a position of the scene determination device for the viewpoint and an orientation of the scene determination device for the viewpoint.
In some embodiments, the determining the pose of the scene determination device for the viewpoint may include determining, by the processor, preceding rotation information of the scene determination device and preceding translation information of the scene determination device for each of one or more preceding viewpoints before the viewpoint on a trajectory of the scene determination device based on at least one of the image data corresponding to each of the one or  more preceding viewpoints or inertial measurement data corresponding to each of the one or more preceding viewpoints; determining, by the processor, current rotation information of the scene determination device and translation information of the scene determination device for the viewpoint; determining, by the processor, the position of the scene determination device for the viewpoint based on the preceding translation information of the scene determination device and the current translation information of the scene determination device; and determining, by the processor, the orientation of the scene determination device for the viewpoint based on the preceding rotation information of the scene determination device and the current rotation information of the scene determination device.
In some embodiments, the determining the scene image for the viewpoint based on the depth information and the image data may further include determining a gray image of an image selected from the first image, the second image, or a third image obtained from a mobile device; and determining the scene image for the viewpoint based on the depth information and the gray image.
In some embodiments, the processor may also perform a step of removing drift information and noise from the inertial measurement data.
In some embodiments, the scene determination device may further include at least one of an accelerometer or a gyroscope sensor configured to obtain the inertial measurement data.
In some embodiments, the camera set may be pre-calibrated.
In another aspect of the present disclosure, a user terminal is provided. The user terminal may include a scene determination device including a first processor configured to determine a representation of a scene. The first processor may perform one or more steps. The steps may include: for each of a plurality of viewpoints, obtaining image data relating to a scene for the viewpoint from a camera set of the scene determination device by the first processor. The image data includes a first image captured by a first camera of the camera set and a second  image captured by a second camera of the camera set. The steps may include: for each of a plurality of viewpoints, determining depth information relating to the image data for the viewpoint based on the first image and the second image by the first processor; obtaining inertial measurement data relating to the user terminal for the viewpoint by the first processor; determining a scene image for the viewpoint based on the depth information and the image data by the first processor; determining a pose of the user terminal for the viewpoint based on the inertial measurement data and the image data by the first processor. The steps may also include determining a representation of the scene based on the scene image for each of the plurality of viewpoints and the pose of the user terminal for each of the plurality of viewpoints by the first processor. The user terminal may further include a mobile device communicate with the scene determination device. The mobile device may include a second processor configured to receive the representation of the scene and the poses of the user terminal for the plurality of viewpoints to render a virtual object in the scene.
In some embodiments, the first processor and the second processor are different.
In some embodiments, the pose of the user terminal for the viewpoint includes a position of the user terminal for the viewpoint and an orientation of the user terminal for the viewpoint.
In some embodiments, the determining the pose of the user terminal for the viewpoint may include determining, by the first processor, preceding rotation information of the user terminal and preceding translation information of the user terminal for each of one or more preceding viewpoints before the viewpoint on a trajectory of the user terminal based on at least one of the image data corresponding to each of the one or more preceding viewpoints or inertial measurement data corresponding to each of the one or more preceding viewpoints; determining, by the first processor, current rotation information of the user terminal and translation  information of the user terminal for the viewpoint; determining, by the first processor, the position of the user terminal for the viewpoint based on the preceding translation information of the user terminal and the current translation information of the user terminal; and determining, by the first processor, the orientation of the user terminal for the viewpoint based on the preceding rotation information of the user terminal and the current rotation information of the user terminal.
In some embodiments, the determining the scene image for the viewpoint based on the depth information and the image data may include determining a gray image of an image from the first image and the second image; and determining the scene image for the viewpoint based on the depth information and the gray image.
In yet another aspect of the present disclosure, a method for determining a representation of a scene is provided. The method may include obtaining, for each of a plurality of viewpoints, image data relating to a scene in which a user terminal is located for the viewpoint from a camera set of a scene determination device. The image data may include a first image captured by a first camera of the camera set and a second image captured by a second camera of the camera set. The method may include determining depth information relating to the image data for each of the plurality of viewpoints based on the first image and the second image. The method may include obtaining inertial measurement data relating to the user terminal for each of the plurality of viewpoints. The method may include determining a scene image for each of the plurality of viewpoints based on the depth information and the image data. The method may also include determining a pose of the user terminal for each of the plurality of viewpoints based on the inertial measurement data and the image data. The method may further include determining a representation of the scene based on the scene image for each of the plurality of viewpoints and the pose of the user terminal for each of the plurality of viewpoints.
Additional features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the  following and the accompanying drawings or may be learned by production or operation of the examples. The features of the present disclosure may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities, and combinations set forth in the detailed examples discussed below.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure is further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. The drawings are not to scale. These embodiments are non-limiting schematic embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:
FIG. 1 is a schematic diagram illustrating an exemplary scene according to some embodiments of the present disclosure;
FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of a user terminal according to some embodiments of the present disclosure;
FIG. 3 is a block diagram illustrating a scene determination device and a mobile device according to some embodiments of the present disclosure;
FIG. 4 is a block diagram illustrating an exemplary first processor according to some embodiments of the present disclosure;
FIG. 5 is a flowchart illustrating an exemplary process for determining a representation of a scene according to some embodiments of the present disclosure;
FIG. 6 is a block diagram illustrating an exemplary pose determination module according to some embodiments of the present disclosure;
FIG. 7 is a flowchart illustrating an exemplary process for determining a pose of the user terminal for a viewpoint according to some embodiments of the present disclosure;
FIG. 8 is a block diagram illustrating an exemplary connection between a scene determination device and a mobile device according to some embodiments of the present disclosure;
FIGS. 9A and 9B illustrate an exemplary scene determination device according to some embodiments of the present disclosure;
FIGS. 10A and 10B are schematic diagrams illustrating a mobile device and a scene determination device that is attached to the mobile device according to some embodiments of the present disclosure; and
FIGS. 11A and 11B illustrate an integrated user terminal according to some embodiments of the present disclosure.
DETAILED DESCRIPTION
The following description is presented to enable any person skilled in the art to make and use the present disclosure and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments shown but is to be accorded the widest scope consistent with the claims.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms “a, ” “an, ” and “the” may be intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprise, ” “comprises, ” and/or “comprising, ” “include, ” “includes, ” and/or “including, ” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations,  elements, components, and/or groups thereof.
These and other features, and characteristics of the present disclosure, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, may become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form a part of this disclosure. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended to limit the scope of the present disclosure. It is understood that the drawings are not to scale.
The flowcharts used in the present disclosure illustrate operations that systems implement according to some embodiments of the present disclosure. It is to be expressly understood that the operations of the flowcharts need not be implemented in the order shown. Conversely, the operations may be implemented in inverted order, or simultaneously. Moreover, one or more other operations may be added to the flowcharts. One or more operations may be removed from the flowcharts.
The positioning technology used in the present disclosure may be based on a global positioning system (GPS) , a global navigation satellite system (GLONASS) , a compass navigation system (COMPASS) , a Galileo positioning system, a quasi-zenith satellite system (QZSS) , a wireless fidelity (Wi-Fi) positioning technology, or the like, or any combination thereof. One or more of the above positioning systems may be used interchangeably in the present disclosure.
An aspect of the present disclosure relates to systems and methods for determining a representation of a scene. According to the present disclosure, for each of a plurality of viewpoints, the systems may obtain image data relating to a scene in which a user terminal is located for the viewpoint from a camera set. The image data may include two images captured by two cameras of the camera set, respectively. According to the two images, the systems may determine depth information relating to the image data for each of the plurality of viewpoints. The  systems may further determine a scene image (e.g., a three-dimensional image relating to the scene for each of the plurality of viewpoint) based on the depth information and inertial measurement data relating to the user terminal. Then the systems may determine poses of the user terminal (including positions of the user terminal and orientations of the user terminal) for the plurality of viewpoints. At last, the systems may determine a representation of the scene based on the poses and the scene images. Accordingly, the systems and methods improve the accuracy of a representation of a scene.
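As a reading aid only, the following Python sketch outlines the per-viewpoint loop described above; the callables passed in (capture, compute_depth, build_scene_image, estimate_pose, fuse) are hypothetical placeholders for the operations detailed with reference to the figures below, not names used in the disclosure.

```python
def determine_scene_representation(viewpoints, capture, compute_depth,
                                   build_scene_image, estimate_pose, fuse):
    """Hypothetical per-viewpoint loop: each callable stands in for one of the
    operations described in the disclosure (image capture, depth determination,
    scene-image construction, pose determination, and fusion)."""
    scene_images, poses = [], []
    for viewpoint in viewpoints:
        first_image, second_image, imu_data = capture(viewpoint)     # image data + inertial data
        depth = compute_depth(first_image, second_image)             # depth information
        scene_images.append(build_scene_image(depth, first_image))   # scene image for the viewpoint
        poses.append(estimate_pose(imu_data, first_image))           # position and orientation
    return fuse(scene_images, poses)                                 # representation of the scene
```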
FIG. 1 is a schematic diagram illustrating a scene 100 according to some embodiments of the present disclosure. In some embodiments, the scene may be a virtual reality. In some embodiments, the scene may be an augmented reality. In some embodiments, the scene may be a mixed reality. For illustration purposes, the present disclosure may take an augmented reality as an example. The scene 100 may include real objects, for example, a table 120-1, a cup 120-2, walls in the space. In some embodiments, a user terminal 110 may obtain images relating to image of the scene 100 for a plurality of viewpoints when the user terminal 110 is moving. The data relating to the real objects in the images may be used to determine the poses of the user terminal 110 (e.g., one or more cameras of the user terminal 110) with respect to the scene. The poses may then be used to adjust the appearance of virtual objects to correspond with any changes occurring in the poses of the user terminal 110. In some embodiments, the user terminal 110 may be a device which integrates a scene determination device (e.g., a scene determination device 300 illustrated in FIG. 3) used to determine a representation of the scene. In some embodiments, the user terminal 110 may be a mobile device with the scene determination device attached to.
In some embodiments, the user terminal 110 may include a mobile device, a tablet computer, a laptop computer, a built-in device in a motor vehicle, or the like, or any combination thereof. In some embodiments, the mobile device may include  a smart home device, a wearable device, a communication device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home device may include a smart lighting device, a control device of an intelligent electrical apparatus, a smart monitoring device, a smart television, a smart video camera, an interphone, or the like, or any combination thereof. In some embodiments, the wearable device may include a bracelet, footgear, glasses, a helmet, a watch, clothing, a backpack, a smart accessory, or the like, or any combination thereof. In some embodiments, the communication device may include a mobile phone, a personal digital assistance (PDA) , a gaming device, a navigation device, a point of sale (POS) device, a laptop, a desktop, or the like, or any combination thereof. In some embodiments, the virtual reality device and/or the augmented reality device may include any device that can be used as a virtual reality device or an augmented reality device, for example, a virtual reality helmet, virtual reality glasses, a virtual reality patch, an augmented reality helmet, augmented reality glasses, an augmented reality patch, or the like, or any combination thereof. In some embodiments, the mobile device may be used as a virtual reality device or an augmented reality device. In some embodiments, the mobile device and a scene determination device may be fully integrated as a whole to be used as a user terminal (e.g., an AR device) . In some embodiments, the scene determination device and the mobile device may be affixed to each other, connected to each other, or attached to each other through some interfaces. In some embodiments, the built-in device in the motor vehicle may include an onboard computer, an onboard television, etc. In some embodiments, the user terminal 110 may be a device with positioning technology for locating the position of the user terminal 110.
FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary user terminal 200 according to some embodiments of the present disclosure. In some embodiments, the user terminal  may be used as an augmented reality device (also referred to herein as an AR device) . As illustrated in FIG. 2, the user terminal 200 may include a communication platform 210, a display 220, a graphics processing unit (GPU) 230, a central processing unit (CPU) 240, an input/output (I/O) 250, a memory 260, and a storage 290. In some embodiments, the user terminal 200 may further include one or more cameras 295. The one or more cameras 295 of the user terminal 200 may be used to obtain one or more images of a scene in which the user terminal 200 is located. In some embodiments, the user terminal may further include a specialized processor for generating a representation of the scene. In some embodiments, any other suitable component, including but not limited to a system bus or a controller (not shown) , may also be included in the user terminal 200. In some embodiments, a mobile operating system 270 (e.g., iOS TM, Android TM, Windows Phone TM) and one or more applications 280 may be loaded into the memory 260 from the storage 290 in order to be executed by the CPU 240. The applications 280 may include a browser or any other suitable mobile apps for receiving and rendering information relating to image processing or other information from the processing engine. User interactions with the information stream may be achieved via the I/O 250 and provided to the components of the system via the network.
To implement various modules, units, and their functionalities described in the present disclosure, computer hardware platforms may be used as the hardware platform (s) for one or more of the elements described herein. A computer with user interface elements may be used to implement a personal computer (PC) or any other type of work station or terminal device. A computer may also act as a server if appropriately programmed.
FIG. 3 is a block diagram illustrating a scene determination device 300 and a mobile device 311 according to some embodiments of the present disclosure. The scene determination device 300 may determine a representation of a scene where the scene determination device 300 is located and the poses of the scene  determination device 300 for a plurality of viewpoints. The mobile device 311 may obtain the representation of the scene and the poses of the scene determination device 300 to render objects (e.g., a virtual object) in the scene.
In some embodiments, the mobile device 311 and the scene determination device 300 may be fully integrated as a whole to be used as a user terminal (e.g., used as an AR device) . For example, the scene determination device 300 may be as a component, a module, or a unit of the mobile device 311. In some embodiments, the scene determination device 300 and the mobile device 311 may be affixed to each other, connected to each other, or attached to each other through some interfaces to be used as an AR device. For example, the scene determination device 300 may be plugged into the mobile device 311 through some interfaces to communicate with the mobile device 311.
In some embodiments, the scene determination device 300 may include a camera set 310, a first inertial measurement unit 320, a first processor 330, a storage 340, and an interface 350. The camera set 310 may be configured to obtain one or more images of the scene 100. The camera set 310 may include one or more cameras. The one or more cameras may include a digital camera or an analogy camera. The one or more cameras may include a monochromatic camera (e.g., a gray camera) or a color camera (e.g., a RGB camera) . The camera of the camera set 310 may be in any size and/or any shape. In some embodiments, the camera set 310 may include two cameras, for example, a first camera 310-1 and a second camera 310-2. The first camera 310-1 and the second camera 310-2 may be stereo cameras. One or both of the first camera 310-1 and the second camera 310-2 may have a rolling shutter or a global shutter. The two cameras may be displaced apart from each other by a certain distance, for example, an average distance between two eyes of human beings (e.g., 6.0 centimeters to 6.5 centimeters, 6.5 centimeters to 7.0 centimeters) . In some embodiments, both of the first camera 310-1 and the second camera 310-2 are gray cameras. In some  embodiments, one of the first camera 310-1 and the second camera 310-2 is a RGB camera and the other of the first camera 310-1 and the second camera 310-2 is a gray camera. In some embodiments, both of the first camera 310-1 and the second camera 310-2 are RGB cameras. For a viewpoint of the camera set 310, the first camera 310-1 and the second camera 310-2 may capture two images of the scene in which the scene determination device 300 is located. The first processor 330 may determine depth information of the scene 100 for the viewpoint based on the two images. In some embodiments, the camera set 310 may be pre-calibrated. The calibration of the camera set 310 may include calibrating one or more intrinsic parameters or one or more extrinsic parameters. The intrinsic parameter may include a focal length of the camera set 310, an optical center, axis skew, or the like, or any combination thereof. The extrinsic parameter may include the rotation and/or translation information in three dimensions between the first camera 310-1 and the second camera 310-2. In some embodiments, the extrinsic parameter may also include the relative position and/or orientation of the camera set 310 to the first inertial measurement unit 320.
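The disclosure states only that the camera set 310 may be pre-calibrated; as one hedged example, a standard checkerboard-based stereo calibration with OpenCV could recover the intrinsic parameters (focal length, optical center, skew) of each camera and the extrinsic rotation R and translation T between the first camera 310-1 and the second camera 310-2. The function and parameter names below are illustrative assumptions.

```python
import cv2

def calibrate_camera_set(object_points, image_points_1, image_points_2, image_size):
    """object_points: list of (N, 3) arrays of checkerboard corner coordinates;
    image_points_1 / image_points_2: the corresponding (N, 2) corners detected
    in the first-camera and second-camera images; image_size: (width, height)."""
    # Intrinsic parameters (focal length, optical center, skew) of each camera.
    _, K1, dist1, _, _ = cv2.calibrateCamera(
        object_points, image_points_1, image_size, None, None)
    _, K2, dist2, _, _ = cv2.calibrateCamera(
        object_points, image_points_2, image_size, None, None)
    # Extrinsic parameters: rotation R and translation T between the two cameras.
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 1e-6)
    _, K1, dist1, K2, dist2, R, T, _, _ = cv2.stereoCalibrate(
        object_points, image_points_1, image_points_2,
        K1, dist1, K2, dist2, image_size,
        criteria=criteria, flags=cv2.CALIB_FIX_INTRINSIC)
    return K1, dist1, K2, dist2, R, T
```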
The first inertial measurement unit 320 may be configured to determine inertial measurement data relating to the scene determination device 300. The first inertial measurement unit 320 may include one or more sensors. The sensors may include an accelerometer configured to measure one or more accelerations of the scene determination device 300 along one or more directions (e.g., along any of three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system) . The sensors may also include a gyroscopic sensor configured to measure one or more angular accelerations of the scene determination device 300 around one or more axes of the three coordinate axes in the three-dimensional coordinate system, e.g., the Cartesian coordinate system.
The storage 340 may be configured to store information. The information may include images captured by the camera set 310, the depth information relating  to the scene 100, the poses of the scene determination device 300, inertial measurement data (including, e.g., angular accelerations, accelerations) , or the like, or any combination thereof. The storage 340 may also include a set of instructions for determining the representation of the scene.
The interface 350 may be configured to connect the scene determination device 300 and the mobile device 311 for the communication therebetween. In some embodiments, the interface 350 may be a communication interface. The communication interface may be a USB plug or a USB receptacle.
In some embodiments, the scene determination device 300 may further include a power management module (not shown in FIG. 3) to provide power for the first processor 330.
In some embodiments, the scene determination device 300 may further include a battery, a battery charging module, and a boost module (not shown in FIG. 3) . The battery may supply power to the power management module through the boost module. The battery may also be charged through the battery charging module and a power interface.
In some embodiments, the mobile device 311 may include a third camera 360, a second inertial measurement unit 370, a second processor 380, a storage 390, and an interface 395. The mobile device 311 may obtain image data including one or more images captured by the third camera 360 for a plurality of viewpoints. The third camera 360 may be a stereo camera. The third camera 360 may have a rolling shutter or a global shutter. The third camera 360 may be a monochromatic camera (e.g., a gray camera) or a color camera (e.g., a RGB camera) . The third camera 360 may be placed at any suitable location of the mobile device 311. For example, the third camera 360 may be a front camera or a rear camera of the mobile device 311. In some embodiments, the third camera 360 may be pre-calibrated.
The second inertial measurement unit 370 may be configured to determine inertial measurement data relating to the mobile device 311. The second inertial  measurement unit 370 may include one or more sensors. The sensors may include an accelerometer configured to measure one or more acceleration of the mobile device 311 along one or more directions (e.g., along any of three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system) . The sensors may also include a gyroscopic sensor configured to measure one or more angular accelerations of the mobile device 311 around one or more axes of the three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system.
The second processor 380 may obtain the representation of the scene 100 and the poses of the scene determination device 300 to render a virtual object in the scene 100. The second processor 380 may also process other information obtained by the mobile device 311. The configuration of the second processor may be with reference to the description of the GPU 230 and/or CPU 240 in FIG. 2.
The storage 390 may be configured to store information. The information may include images captured by the third camera 360, the poses of the mobile device 311, inertial measurement data (including angular accelerations, accelerations) relating to the mobile device 311, or the like, or any combination thereof. The configuration of the storage 390 may be with reference to the description of the storage 290 in FIG. 2.
The interface 395 may be configured to connect the scene determination device 300 and the mobile device 311 for the communication therebetween. In some embodiments, the interface 395 may be a communication interface. The communication interface may be a USB plug or a USB receptacle.
In some embodiments, the storage 340 and/or the storage 390 may also store a virtual reality, augmented reality, gesture recognition, face recognition, somatosensory game, a three-dimensional reconstruction application and/or software, or the like, or any combination thereof.
The gyroscopic sensor of the first inertial measurement unit 320 and/or the  gyroscopic sensor of the second inertial measurement unit 370 may suppress high-frequency noise in the system and may keep the angular acceleration data stable. However, drifts may also exist in the system. The accelerometer of the first inertial measurement unit 320 and/or the accelerometer of the second inertial measurement unit 370 may suppress low-frequency noise and may remove the drifts from the acceleration data with high-frequency noises. The gyroscopic sensors and accelerometers may benefit from complementary filtering or Kalman filtering algorithms so that drifts of gyroscopic sensor and the noise of accelerometer can be eliminated. The inertial measurement data can be further stabilized.
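As a minimal illustration of the complementary-filtering idea mentioned above (not the specific filter, if any, used in the device), the sketch below fuses a drift-prone gyroscope integration with a noisy accelerometer-derived angle; the blending factor alpha is an assumed tuning parameter, and the gyroscope sample is treated as an angular rate in rad/s.

```python
def complementary_filter(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """Fuse one gyroscope sample (angular rate, rad/s) with one
    accelerometer-derived estimate of the same angle."""
    gyro_angle = angle + gyro_rate * dt                       # short-term, drift-prone estimate
    return alpha * gyro_angle + (1.0 - alpha) * accel_angle   # drift- and noise-corrected fusion

def fuse_samples(gyro_rates, accel_angles, dt):
    """Run the filter over a stream of samples to keep the angle stable."""
    angle = accel_angles[0]
    for rate, accel_angle in zip(gyro_rates, accel_angles):
        angle = complementary_filter(angle, rate, accel_angle, dt)
    return angle
```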
In some embodiments, the scene determination device 300 and/or the mobile device 311 may include other sensors including, for example, a touch sensor, a gesture sensor, a proximity sensor, a vicinity sensor, an electromagnetic sensor, or the like, or any combination thereof.
It should be noted that the above description about the user terminal is merely with respect to the condition that the scene determination device 300 is attached to the mobile device 311. When the scene determination device 300 is integrated with the mobile device 311 (e.g., as a module of the mobile device 311) , one or more components of the scene determination device 300 may be integrated with the corresponding component of the mobile device 311. For example, the third camera 360 may be removed from the user terminal and the user terminal may only use the first camera 310-1 and the second camera 310-2 to determine a representation of the scene. As another example, the first inertial measurement unit 320 and the second inertial measurement unit 370 may be integrated with each other. The integrated inertial measurement unit may include an accelerometer sensor configured to measure one or more accelerations of the user terminal along one or more directions (e.g., along any of three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system) . The integrated inertial measurement unit may also include a gyroscopic sensor  configured to measure one or more angular accelerations of the user terminal around one or more axes of the three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system.
In some embodiments, when the scene determination device 300 is integrated with the mobile device 311 (e.g., as a module of the mobile device 311) , the first processor 330 may be integrated with the second processor 380 as a whole. In some embodiments, when the scene determination device 300 is integrated with the mobile device 311 (e.g., as a module of the mobile device 311) , the first processor 330 may still be independent from (e.g., different from) the second processor 380. The user terminal 110 may use the first processor 330 as a specialized processor to determine a representation of the scene and the poses of the user terminal. Then the user terminal may use the second processor to render a virtual object in the scene based on the representation of the scene and the poses of the user terminal.
FIG. 4 is a block diagram illustrating an exemplary first processor 330 according to some embodiments of the present disclosure. The first processor 330 may include an image data obtaining module 410, a depth information determination module 420, an inertial measurement data obtaining module 430, a scene image determination module 440, a pose determination module 450, and a model determination module 460.
The image data obtaining module 410 may obtain image data relating to a scene for each of a plurality of viewpoints. For each of the plurality of viewpoints, the image data may include a first image and a second image. In some embodiments, the user terminal may record the scene for each point of a continuous moving route of the user terminal (in this situation, it is assumed that the scene determination device 300 and the mobile device 311 are integrated as the user terminal) . When it is assumed that the scene determination device 300 is attached to the mobile device 311 (i.e., the scene determination device 300 and the mobile  device 311 are separate devices) , the “moving route of the user terminal” used herein may refer to a moving route of the scene determination device 300. However, it is too massive to document the scene for every moving point for the continuous movement of the user terminal. A plurality of viewpoints may be picked up to act as key viewpoints along the continuous moving route of the user terminal. The first image represents the image relating to the scene that is captured by the first camera 310-1 of the camera set 310. The first camera can be a monochromatic camera or a color camera. In some embodiments, the monochromatic camera has higher frame rate and lower data process load than the color camera. The second image represents the image relating to the scene that is captured by the second camera 310-2 of the camera set 310. In some embodiments, the second camera may be a color camera that takes a color image (e.g., an RGB image) .
The depth information determination module 420 may determine depth information relating to the image data based on the first image and second image. The image data relating to the scene for each of the plurality of viewpoints may include image information and depth information. The image information may include content (e.g., a pixel value, a voxel value) corresponding to each pixel or voxel in an image. The depth information may include a plurality of depth values corresponding to a plurality of points in the scene. Each depth value may indicate a distance from the camera of the user terminal to one of the plurality of points in the scene.
The inertial measurement data obtaining module 430 may obtain inertial measurement data. The inertial measurement data obtaining module 430 may obtain inertial measurement data from the first inertial measurement unit 320 and/or the second inertial measurement unit 370.
The scene image determination module 440 may determine a scene image of the scene for each of the plurality of viewpoints. In some embodiments, the scene image may be a three-dimensional image. Take a specific viewpoint as  an example, the scene image determination module 440 may construct a scene image relating to the specific viewpoint according to the depth information relating to the scene corresponding to the specific viewpoint and a monochromatic image (e.g., a gray image) of the scene corresponding to the specific viewpoint. In some embodiments, when the scene determination device 300 is integrated with the mobile device 311 and the mobile device 311 has a color camera, the color image may also be captured by the color camera of the mobile device 311, and the scene image determination module 440 may first determine a gray image based on the color image, then the scene image determination module 440 may determine the scene image of the scene for the viewpoint based on the depth information relating to the scene for the corresponding viewpoint and the gray image for the viewpoint.
The pose determination module 450 may determine poses of the scene determination device 300 for the plurality of viewpoints based on the inertial measurement data and the scene image. As described elsewhere in the present disclosure, when the scene determination device 300 is integrated with the mobile device 311 as the user terminal, the poses of the scene determination device 300 may be the same as the poses of the user terminal 110. The pose determination module 450 may determine poses of the scene determination device 300 for each of the plurality of viewpoints based on the inertial measurement data and gray images. The pose of the scene determination device for a specific viewpoint may include a position of the scene determination device 300 for the specific viewpoint and an orientation of the scene determination device for the specific viewpoint. As described elsewhere in the present disclosure, when the scene determination device 300 is integrated with the mobile device 311 as the user terminal 110, the poses of the scene determination device 300 may be the same as the poses of the user terminal 110.
The model determination module 460 may determine a representation of the scene based on the scene images of the scene for the plurality of viewpoints and  the corresponding poses of the scene determination device 300 for the plurality of viewpoints.
FIG. 5 is a flowchart illustrating an exemplary process 500 for determining a representation of a scene according to some embodiments of the present disclosure. The process 500 may be implemented as a set of instructions (e.g., an application) . The first processor 330 and/or the modules illustrated in FIG. 3 may execute the set of instructions, and when executing the instructions, the first processor 330 and/or the modules may be configured to perform the process 500. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 500 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order in which the operations of the process as illustrated in FIG. 5 and described below is not intended to be limiting.
For illustration purposes, FIG. 5 and the description thereof take the scene determination device 300 attached to the mobile device 311 as an example. It should be noted that the scene determination device 300 and the mobile device 311 can also be integrated as a single device (e.g., the user terminal) ; in that situation, the information (e.g., the image data, the depth information, the inertial measurement data, the poses) described below is associated with the integrated device (e.g., the user terminal) .
In 510, the first processor 330 (e.g., the image data obtaining module 410) may obtain image data relating to the scene for each of a plurality of viewpoints. In some embodiments, the image data may include a first image and a second image. In some embodiments, the user terminal may record the scene for each point of a continuous moving route of the user terminal (especially the route of the camera of the user terminal) . However, it is too massive to document the scene for every moving point for the continuous movement of the user terminal. A plurality of viewpoints may be picked up to act as key viewpoints along a continuous moving  route of the user terminal. In some embodiments, the first image is captured by the first camera 310-1 of the camera set 310, and the second image is captured by the second camera 310-2 of the camera set 310. In some embodiments, the two cameras may be displaced apart from each other by a certain distance, for example, an average distance between two eyes of human beings (e.g., 6.0 centimeters to 6.5 centimeters, 6.5 centimeters to 7.0 centimeters) . In some embodiments, both of the first camera 310-1 and the second camera 310-2 are gray cameras. In some embodiments, one of the first camera 310-1 and the second camera 310-2 is a RGB camera and the other of the first camera 310-1 and the second camera 310-2 is a gray camera. In some embodiments, both of the first camera 310-1 and the second camera 310-2 are RGB cameras.
In 520, the first processor 330 (e.g., the depth information determination module 420) may determine depth information relating to the image data based on the first image and second image. The depth information may include a plurality of depth values corresponding to a plurality of points in the scene. Each depth value may indicate a distance from the camera of the user terminal (e.g., the camera set 310 of the scene determination device 300) to one of the plurality of points in the scene. In some embodiments, the depth information determination module 420 may determine the depth information based on a triangulation technique.
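One common triangulation-style way to obtain the depth values described in operation 520, assuming the stereo pair has been rectified using the pre-calibration of the camera set, is block matching followed by the relation depth = focal length × baseline / disparity. The sketch below uses OpenCV's semi-global matcher and is an assumption for illustration, not necessarily the technique used by the depth information determination module 420; its matcher settings are assumed values.

```python
import cv2
import numpy as np

def stereo_depth(first_gray, second_gray, focal_length_px, baseline_m):
    """Return a depth map (in metres) for a rectified gray stereo pair."""
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    # OpenCV returns fixed-point disparities scaled by 16.
    disparity = matcher.compute(first_gray, second_gray).astype(np.float32) / 16.0
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    # A larger disparity means the point is closer to the camera set.
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth
```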
In 530, the first processor 330 (e.g., the inertial measurement data obtaining module 430) may obtain inertial measurement data. The inertial measurement data obtaining module 430 may obtain the inertial measurement data from the first inertial measurement unit 320 and/or the second inertial measurement unit 370. The inertial measurement units may include one or more gyroscopic sensors and one or more accelerometers. The inertial measurement data obtaining module 430 may obtain accelerations of the user terminal along one or more directions (e.g., along any of the three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system). The inertial measurement data obtaining module 430 may also obtain one or more angular accelerations of the user terminal around one or more axes of the three coordinate axes in the three-dimensional coordinate system. Particularly, when the scene determination device 300 and the mobile device 311 are attached together, the inertial measurement data obtaining module 430 may obtain accelerations of the scene determination device 300 along one or more directions and one or more angular accelerations of the scene determination device 300 around one or more axes of the three coordinate axes. The gyroscopic sensor suppresses high-frequency noise and keeps the angular data stable, but its output is subject to drift over time. The accelerometer is free of long-term drift, but its gravity-referenced output contains high-frequency noise. By fusing the two sensors with a complementary filtering or Kalman filtering algorithm, the drift of the gyroscopic sensor and the noise of the accelerometer can be eliminated, and the inertial measurement data can be further stabilized.
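As a minimal illustration of complementary filtering, and only as a sketch under assumed single-axis conditions rather than the specific filter of the present disclosure, the integrated gyroscope estimate may be trusted at high frequency and the accelerometer-derived tilt at low frequency:

    import math

    def complementary_filter(prev_angle, gyro_rate, accel_x, accel_z, dt, alpha=0.98):
        """Fuse one gyroscope axis with an accelerometer-derived tilt angle.

        prev_angle : previously estimated tilt angle (rad)
        gyro_rate  : angular rate about the same axis (rad/s)
        accel_x/z  : accelerometer readings used to infer the tilt from gravity
        alpha      : weight given to the (drift-prone) integrated gyroscope estimate
        """
        gyro_angle = prev_angle + gyro_rate * dt        # smooth but drifts slowly
        accel_angle = math.atan2(accel_x, accel_z)      # noisy but drift-free
        return alpha * gyro_angle + (1.0 - alpha) * accel_angle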
In 540, the first processor 330 (e.g., the scene image determination module 440) may determine a scene image of the scene for each of the plurality of viewpoints. In some embodiments, the scene image may be a three-dimensional image. Taking a specific viewpoint as an example, the scene image determination module 440 may construct the scene image relating to the specific viewpoint according to the depth information relating to the scene corresponding to the specific viewpoint and a monochromatic image (e.g., a gray image) of the scene corresponding to the specific viewpoint. When the first camera 310-1 is a monochromatic camera, the monochromatic image may be captured by the first camera 310-1 for the viewpoint. When the second camera 310-2 is a monochromatic camera, the monochromatic image may be captured by the second camera 310-2 for the viewpoint. When the first camera 310-1 and the second camera 310-2 are both monochromatic cameras, the monochromatic image may be selected from any one of the images captured by the first camera 310-1 and the second camera 310-2 for the viewpoint. When the first camera 310-1 and the second camera 310-2 are both color cameras, the scene image determination module 440 may first determine a gray image from either of the two color images captured by the first camera 310-1 and the second camera 310-2 for the viewpoint using a gray image determining technique; the scene image determination module 440 may then determine the scene image of the scene for the viewpoint based on the depth information relating to the scene for the corresponding viewpoint and the gray image for the viewpoint. The gray image determining technique may include a floating-point algorithm, an integer method, a shift method, an averaging method, taking only the green channel, or the like, or any combination thereof. In some embodiments, when the scene determination device 300 is integrated with the mobile device 311 and the mobile device 311 has a color camera, the color image may also be captured by the color camera of the mobile device 311; the scene image determination module 440 may first determine a gray image based on the color image using the gray image determining technique, and may then determine the scene image of the scene for the viewpoint based on the depth information relating to the scene for the corresponding viewpoint and the gray image for the viewpoint.
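The gray image determining techniques listed above differ only in how the color channels are combined. The Python sketch below illustrates three of them; the floating-point weights shown are the common Rec. 601 luma coefficients and are an assumption rather than values specified in the present disclosure.

    import numpy as np

    def to_gray(rgb, method="floating_point"):
        """Convert an H x W x 3 RGB image (uint8) to a gray image."""
        rgb = rgb.astype(np.float32)
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        if method == "floating_point":        # weighted (Rec. 601-style) combination
            gray = 0.299 * r + 0.587 * g + 0.114 * b
        elif method == "averaging":           # plain average of the three channels
            gray = (r + g + b) / 3.0
        elif method == "green_only":          # take only the green channel
            gray = g
        else:
            raise ValueError("unsupported method")
        return np.clip(gray, 0, 255).astype(np.uint8)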
In 550, the first processor 330 (e.g., the pose determination module 450) may determine a pose of the scene determination device for each of the plurality of viewpoints based on the inertial measurement data and the gray images. The pose of the scene determination device 300 for a specific viewpoint may include a position of the scene determination device 300 for the specific viewpoint and an orientation of the scene determination device 300 for the specific viewpoint. The position and the orientation of the scene determination device 300 may be determined by performing one or more operations described in connection with FIG. 7. In some embodiments, when the scene determination device 300 is integrated with the mobile device 311, the pose of the scene determination device 300 may be the same as the pose of the user terminal. In some embodiments, this determination may be achieved by visual-inertial odometry (VIO); those skilled in the art are familiar with specific algorithms, for example, algorithms based on filtering or optimization. According to the movement of the scene determination device 300, a position relationship of the scene determination device 300 relative to the origin can be obtained. Furthermore, position information between the scene determination device 300 and an object (e.g., a virtual object to be rendered) can be acquired according to this position relationship and a position relationship between the object and the origin, wherein the origin is a predetermined origin in a three-dimensional space. In some embodiments, the origin is the position at which the scene determination device 300 starts to move. In some embodiments, the operator can set a certain point in space as the origin. Since the position relationship of the object relative to the origin is predetermined and the position relationship of the scene determination device 300 relative to the origin has been determined, a position relationship of the object relative to the scene determination device 300 can be determined. The position relationship of the object relative to the scene determination device 300 may be used to render the object.
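Because both the virtual object and the scene determination device 300 are referenced to the same predetermined origin, the object's position relative to the device follows from a simple composition of the two origin-relative poses. A minimal sketch is given below; the coordinate conventions and names are assumptions for illustration only.

    import numpy as np

    def object_in_device_frame(device_R, device_t, object_t_world):
        """Express an object's origin-relative position in the device frame.

        device_R, device_t : rotation matrix and position of the scene
                             determination device relative to the origin
        object_t_world     : position of the (virtual) object relative to the
                             same origin
        The returned vector can be handed to a renderer to place the object.
        """
        return device_R.T @ (object_t_world - device_t)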
In 560, the first processor 330 (e.g., the model determination module 460) may determine a representation of the scene based on the scene images of the scene for the plurality of viewpoints and the corresponding poses of the scene determination device 300 for the plurality of viewpoints. The augmented reality effect can be presented by rendering the virtual object based on the pose information of the scene determination device 300. Since the three-dimensional image contains depth information, the virtual object may not be displayed when the distance between the virtual object and the scene determination device 300 is greater than the distance between the real object in the scene and the scene determination device 300. Thus, an obstruction (occlusion) effect is achieved, which makes the augmented reality effect more realistic.
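The obstruction effect amounts to a per-pixel depth comparison: a pixel of the virtual object is drawn only where the object is closer to the scene determination device 300 than the real scene point at that pixel. The following sketch illustrates such a depth test; the array names and the compositing convention are assumptions.

    import numpy as np

    def composite_with_occlusion(scene_rgb, scene_depth, object_rgb, object_depth):
        """Overlay a rendered virtual object on the scene, respecting real depth.

        A virtual-object pixel is shown only if its depth is smaller than the
        depth of the real scene at the same pixel; otherwise the real scene
        occludes the object.
        """
        out = scene_rgb.copy()
        visible = (object_depth < scene_depth) & np.isfinite(object_depth)
        out[visible] = object_rgb[visible]
        return out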
It should be noted that the above description is merely provided for the purposes of illustration, and is not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. For example, step 510 and step 530 may be combined into a single step in which the first processor 330 obtains the image data relating to the scene and the inertial measurement data associated with the user terminal simultaneously. As another example, step 530 may be performed before step 510 and step 520. As still another example, one or more other optional operations (e.g., an execution operation) may be added elsewhere in the exemplary process 500.
FIG. 6 is a block diagram illustrating an exemplary pose determination module according to some embodiments of the present disclosure. The pose determination module 450 may include a rotation information determination unit 610, a translation information determination unit 620, a position determination unit 630, and an orientation determination unit 640.
The rotation information determination unit 610 may determine preceding rotation information of the scene determination device 300 for each of one or more preceding viewpoints with respect to the specific viewpoint. The rotation information determination unit 610 may also determine current rotation information of the scene determination device 300 for the current viewpoint. In some embodiments, the rotation information of the scene determination device 300 may be represented by a matrix (also referred to as a rotation matrix R).
The translation information determination unit 620 may determine preceding translation information of the scene determination device 300 for each of one or more preceding viewpoints with respect to the specific viewpoint. The translation information determination unit 620 may also determine current translation information of the scene determination device 300 for the current viewpoint. In some embodiments, the translation information of the scene determination device 300 may be represented by a vector (also referred to as a translation vector T).
The position determination unit 630 may determine a position of the scene determination device 300 based on the preceding translation information and the current translation information. For example, the position determination unit 630 may first integrate the current translation information associated with the current viewpoint and the preceding translation information associated with the viewpoint before the current viewpoint. Then the position determination unit 630 may determine the position of the scene determination device 300 for the current viewpoint based on an original position of the scene determination device 300 and the integrated translation information. The original position of the scene determination device 300 may refer to a starting point on the moving route of the scene determination device 300.
The orientation determination unit 640 may determine an orientation of the scene determination device 300 associated with the current viewpoint based on the preceding rotation information and the current rotation information. For example, the orientation determination unit 640 may first integrate the current rotation information associated with the current viewpoint and the preceding rotation information associated with the viewpoint before the current viewpoint. Then the orientation determination unit 640 may determine the orientation of the scene determination device 300 for the current viewpoint based on an original orientation of the scene determination device 300 and the integrated rotation information. The original orientation of the scene determination device 300 may refer to an orientation of the scene determination device 300 on the original position.
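The position determination unit 630 and the orientation determination unit 640 therefore accumulate the per-viewpoint increments on top of the original pose. A minimal sketch of such an accumulation is given below; the chaining convention (each translation applied in the frame reached so far) is an assumption for illustration, not a prescription of the present disclosure.

    import numpy as np

    def accumulate_pose(original_position, original_orientation, translations, rotations):
        """Chain per-viewpoint increments into the pose at the current viewpoint.

        translations : list of translation vectors T_1 ... T_n (preceding + current)
        rotations    : list of rotation matrices  R_1 ... R_n (preceding + current)
        """
        position = np.asarray(original_position, dtype=np.float64).copy()
        orientation = np.asarray(original_orientation, dtype=np.float64).copy()
        for R, T in zip(rotations, translations):
            # Each translation increment is applied in the frame reached so far.
            position = position + orientation @ T
            orientation = orientation @ R
        return position, orientation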
FIG. 7 is a flowchart illustrating an exemplary process 700 for determining a pose of the user terminal for a specific viewpoint according to some embodiments of the present disclosure. The process 700 may be implemented as a set of instructions (e.g., an application). The first processor 330 and/or the modules illustrated in FIG. 3 may execute the set of instructions, and when executing the instructions, the first processor 330 and/or the modules may be configured to perform the process 700. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 700 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of the process 700 as illustrated in FIG. 7 and described below is not intended to be limiting.
In 710, the first processor 330 (e.g., the rotation information determination unit 610 of the pose determination module 450) may determine preceding rotation information of the scene determination device 300 and preceding translation information of the scene determination device 300 for each of one or more preceding viewpoints with respect to the specific viewpoint. In some embodiments, the rotation information of the scene determination device 300 may be represented by a matrix (also referred to as a rotation matrix R) and the translation information of the scene determination device 300 may be represented by a vector (also referred to as a translation vector T). For illustration purposes, a viewpoint for which the rotation information and the translation information are to be determined may be referred to herein as a current viewpoint, and a viewpoint before the current viewpoint on the moving route of the scene determination device 300 may be referred to herein as a preceding viewpoint. For example, if v1, v2, v3, …, vn are viewpoints on the moving route of the scene determination device 300 in accordance with the moving direction of the scene determination device 300, and vn is the current viewpoint, then the viewpoints v1 to vn-1 are referred to as preceding viewpoints. To determine a pose associated with the viewpoint vn, the rotation information determination unit 610 may first determine rotation information associated with the preceding viewpoints v1 to vn-1 (also referred to herein as preceding rotation information) and translation information associated with the preceding viewpoints v1 to vn-1 (also referred to herein as preceding translation information).
In 720, the first processor 330 (e.g., the translation information determination unit 620 of the pose determination module 450) may determine rotation information of the scene determination device 300 associated with the current viewpoint (also referred to herein as current rotation information) and translation information of the scene determination device 300 associated with the current viewpoint (also referred to herein as current translation information) . In some embodiments, when the scene determination device 300 is integrated with the mobile device 311, the translation information (e.g., the preceding translation information, the current translation information) of the scene determination device 300 may be the same as the translation information (e.g., the preceding translation information, the current translation information) of the user terminal and the rotation information (e.g., the preceding rotation information, the current rotation information) of the scene determination device 300 may be the same as the rotation information (e.g., the preceding rotation information, the current rotation information) of the user terminal.
In some embodiments, the first processor 330 may determine rotation information and translation information of the scene determination device 300 associated with a specific viewpoint based on the image relating to the specific viewpoint and the image relating to the viewpoint nearest before the specific viewpoint. For example, the rotation information determination unit 610 may extract features of objects in the two images and compare the features to determine the rotation information and translation information of the scene determination device 300. In some embodiments, the first processor 330 may determine rotation information and translation information of the scene determination device 300 associated with a specific viewpoint based on the inertial measurement data relating  to the specific viewpoint and the inertial measurement data relating to the viewpoint nearest before the specific viewpoint. For example, the first processor 330 may obtain the angular accelerations of the scene determination device 300 for the specific viewpoint with respect to a viewpoint nearest before the specific viewpoint around one or more axes of the three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system, to determine the rotation information of the user terminal associated with the specific viewpoint. The first processor 330 may also obtain the accelerations of the scene determination device 300 for the specific viewpoint with respect to a viewpoint nearest before the specific viewpoint along one or more axes of the three coordinate axes in a three-dimensional coordinate system, e.g., the Cartesian coordinate system, to determine the translation information of the user terminal associated with the specific viewpoint. In some embodiments, the first processor 330 may determine rotation information and translation information of the scene determination device 300 associated with a specific viewpoint based on the image and the inertial measurement data relating to the specific viewpoint and the image and the inertial measurement data relating to a viewpoint nearest before the specific viewpoint.
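When the relative motion between consecutive viewpoints is estimated from the two images, one standard realization, given here only as an assumed example and not necessarily the method of the present disclosure, is to match image features, estimate the essential matrix, and decompose it into a rotation matrix R and a translation vector T, as sketched below with OpenCV. The translation recovered this way is known only up to scale; in practice the scale may be fixed using the depth information or the inertial measurement data.

    import cv2
    import numpy as np

    def relative_pose_from_images(prev_gray, curr_gray, camera_matrix):
        """Estimate rotation R and (up-to-scale) translation T between two views."""
        orb = cv2.ORB_create(2000)
        kp1, des1 = orb.detectAndCompute(prev_gray, None)
        kp2, des2 = orb.detectAndCompute(curr_gray, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des1, des2)
        pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
        pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
        E, mask = cv2.findEssentialMat(pts1, pts2, camera_matrix,
                                       method=cv2.RANSAC, prob=0.999, threshold=1.0)
        _, R, T, _ = cv2.recoverPose(E, pts1, pts2, camera_matrix, mask=mask)
        return R, T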
In 730, the first processor 330 (e.g., the position determination unit 630 of the pose determination module 450) may determine a position of the user terminal associated with the current viewpoint based on the preceding translation information and the current translation information. For example, the position determination unit 630 may first accumulate the current translation information associated with the current viewpoint and the preceding translation information associated with preceding viewpoint (s) before the current viewpoint. Then the position determination unit 630 may determine the position of the user terminal associated with the current viewpoint based on an original position of the scene determination device 300 and the accumulated translation information. The original position of the scene determination device 300 may refer to a starting point on the moving route of  the scene determination device 300.
In 740, the first processor 330 (e.g., the orientation determination unit 640 of the pose determination module 450) may determine an orientation of the user terminal associated with the current viewpoint based on the preceding rotation information and the current rotation information. For example, the orientation determination unit 640 may first integrate the current rotation information associated with the current viewpoint and the preceding rotation information associated with the preceding viewpoint(s) before the current viewpoint. Then the orientation determination unit 640 may determine the orientation of the user terminal associated with the current viewpoint based on an original orientation of the scene determination device 300 and the integrated rotation information. The original orientation of the user terminal may refer to an orientation of the scene determination device 300 at the original position. The orientation of the user terminal may be represented by Euler angles of the user terminal about the three coordinate axes of a three-dimensional coordinate system, e.g., the Cartesian coordinate system.
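When the integrated orientation is held as a rotation matrix, it may be converted to such Euler angles about the three coordinate axes. A minimal conversion in the common Z-Y-X (yaw-pitch-roll) convention is sketched below; the axis convention is an assumption for illustration.

    import math

    def rotation_matrix_to_euler(R):
        """Convert a 3x3 rotation matrix to (roll, pitch, yaw) in radians (Z-Y-X)."""
        sy = math.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)
        if sy > 1e-6:
            roll = math.atan2(R[2, 1], R[2, 2])
            pitch = math.atan2(-R[2, 0], sy)
            yaw = math.atan2(R[1, 0], R[0, 0])
        else:                                   # gimbal-lock fallback
            roll = math.atan2(-R[1, 2], R[1, 1])
            pitch = math.atan2(-R[2, 0], sy)
            yaw = 0.0
        return roll, pitch, yaw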
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. For example, the order of step 730 and step 740 may be changed. As another example, one or more other optional operations (e.g., a storing operation) may be added elsewhere in the exemplary process 700.
FIG. 8 is a schematic diagram illustrating an exemplary connection between a scene determination device 300 and a mobile device 311 according to some embodiments of the present disclosure. In some embodiments, the mobile device 311 may include a smart home device, a wearable device, a smart communication device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home device may include a smart lighting device, a control device of an intelligent electrical apparatus, a smart monitoring device, a smart television, a smart video camera, an interphone, or the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, smart footgear, smart glasses, a smart helmet, a smartwatch, smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof. In some embodiments, the smart communication device may include a smartphone, a personal digital assistant (PDA), a gaming device, a navigation device, a point of sale (POS) device, or the like, or any combination thereof. In some embodiments, the virtual reality device and/or the augmented reality device may include a virtual reality helmet, virtual reality glasses, a virtual reality patch, an augmented reality helmet, augmented reality glasses, an augmented reality patch, or the like, or any combination thereof. In some embodiments, the mobile device 311 and the scene determination device 300 may be fully integrated as a whole to be used as a user terminal (e.g., an AR device). In some embodiments, the scene determination device 300 and the mobile device 311 may be affixed to each other, connected to each other, or attached to each other through certain interfaces.
FIGS. 9A and 9B illustrate an exemplary scene determination device 300 which can be attached to the mobile device according to some embodiments of the present disclosure. As described in connection with FIG. 3, the first camera 310-1 and the second camera 310-2 are displaced apart from each other by a certain distance, for example, an average distance between two eyes of human beings (e.g., 6.0 centimeters to 6.5 centimeters, 6.5 centimeters to 7.0 centimeters) . As illustrated in FIGS. 9A and 9B, the first camera 310-1 and the second camera 310-2 are placed on two sides of the scene determination device 300 respectively.
The scene determination device 300 may also include a communication interface (not shown in FIGS. 9A or 9B) to connect with the mobile device 311. The communication interface connecting the scene determination device 300 and the mobile device may be any type of wired or wireless connection. The wireless connection may include a Local Area Network (LAN), a Wide Area Network (WAN), Bluetooth™, ZigBee™, Near Field Communication (NFC), or the like, or any combination thereof. The wired connection may include a micro USB interface, a Mini USB interface, an 8-Pin USB interface, a 10-Pin USB interface, a 30-Pin USB interface, a Type-C USB interface, or another specification of the USB interface. Furthermore, the USB interface may use a USB plug 910 or a USB receptacle 930 based on different requirements. As illustrated in FIG. 9A, the scene determination device 300 may connect directly to the USB receptacle of the external mobile device by using the USB plug 910, which may also be used to fix the scene determination device 300 on the mobile device. The mobile device may receive data processing results of mobile software and/or applications from the scene determination device 300 through the USB plug 910.
In some embodiments, as illustrated in FIG. 9B, the scene determination device 300 may connect directly to a USB plug of the external mobile device by using the USB receptacle 930. For a user terminal without a USB plug, a USB data cable may be used. In some embodiments, the scene determination device 300 may be employed in an Unmanned Aerial Vehicle (UAV) to enhance the visual capability of the UAV.
In some embodiments, the scene determination device 300 may further include a power interface 920 (shown in FIG. 9B) through which the built-in battery in the scene determination device 300 may be charged. The distribution of the power interface 920 and the communication interface may be arbitrary. For example, the power interface 920 and the communication interface (e.g., the USB plug 910 or the USB receptacle 930) may be on a same surface of the scene determination device 300 or on different surfaces of the scene determination device 300. For instance, the power interface 920 may be on the top surface of the scene determination device 300 and the communication interface (e.g., the USB plug 910 or the USB receptacle 930) may be on the right surface, left surface, or bottom surface of the scene determination device 300. It should be noted that the terms “bottom,” “top,” “left,” and “right” are provided for describing the distribution of the power interface and the communication interface, and are not intended to be limiting.
FIGS. 10A and 10B are schematic diagrams illustrating a mobile device and a scene determination device that is attached to the mobile device according to some embodiments of the present disclosure. As described in connection with FIG. 3, the scene determination device 300 and the mobile device 311 may be two separate devices. As illustrated in FIGS. 10A and 10B, the scene determination device 300 may be attached to the mobile device 311 through a communication interface 1020 of the scene determination device 300 and a communication interface 1010 of the mobile device 311. In some embodiments, the camera set 310 of the scene determination device 300 may work as a front camera as shown in FIG. 10A. In some embodiments, the camera set 310 of the scene determination device 300 may work as a rear camera as shown in FIG. 10B. In FIGS. 10A and 10B, the communication interface 1020 of the scene determination device 300 is a USB plug and the communication interface 1010 of the mobile device 311 is a USB receptacle. It should be noted that this is merely an example and is not intended to be limiting. In some embodiments, the communication interface 1020 of the scene determination device 300 is a USB receptacle and the communication interface 1010 of the mobile device 311 is a USB plug.
FIGS. 11A and 11B are schematic diagrams illustrating a user terminal that is an integration of a scene determination device and a mobile device according to some embodiments of the present disclosure. As described in connection with FIG. 3, the scene determination device 300 and the mobile device 311 may be integrated as a single device (i.e., the user terminal 110). Accordingly, as illustrated in FIGS. 11A and 11B, the first camera 310-1 and the second camera 310-2 are both integrated in the user terminal 110, and the first camera 310-1 and the second camera 310-2 are displaced apart from each other by a certain distance, for example, an average distance between two eyes of human beings (e.g., 6.0 centimeters to 6.5 centimeters, 6.5 centimeters to 7.0 centimeters).
In some embodiments, the camera set may work as a front camera as shown in FIG. 11A. In some embodiments, the camera set may work as a rear camera as shown in FIG. 11B.
Having thus described the basic concepts, it may be rather apparent to those skilled in the art after reading this detailed disclosure that the foregoing detailed disclosure is intended to be presented by way of example only and is not limiting. Various alterations, improvements, and modifications may occur and are intended to those skilled in the art, though not expressly stated herein. These alterations, improvements, and modifications are intended to be suggested by this disclosure and are within the spirit and scope of the exemplary embodiments of this disclosure.
Moreover, certain terminology has been used to describe embodiments of the present disclosure. For example, the terms “one embodiment, ” “an embodiment, ” and/or “some embodiments” mean that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Therefore, it is emphasized and should be appreciated that two or more references to “an embodiment” or “one embodiment” or “an alternative embodiment” in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the present disclosure.
Further, it will be appreciated by one skilled in the art that aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts, including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of software and hardware implementations that may all generally be referred to herein as a “unit,” “module,” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer-readable media having computer readable program code embodied thereon.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electromagnetic, optical, or the like, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including wireless, wireline, optical fiber cable, RF, or the like, or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, or Python, conventional procedural programming languages such as the “C” programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider), in a cloud computing environment, or offered as a service such as a Software as a Service (SaaS).
Furthermore, the recited order of processing elements or sequences, or the use of numbers, letters, or other designations therefor, is not intended to limit the claimed processes and methods to any order except as may be specified in the claims. Although the above disclosure discusses, through various examples, what is currently considered to be a variety of useful embodiments of the disclosure, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover modifications and equivalent arrangements that are within the spirit and scope of the disclosed embodiments. For example, although the implementation of various components described above may be embodied in a hardware device, it may also be implemented as a software-only solution, e.g., an installation on an existing server or user terminal.
Similarly, it should be appreciated that in the foregoing description of embodiments of the present disclosure, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, claimed subject matter may lie in less than all features of a single foregoing disclosed embodiment.

Claims (20)

  1. A scene determination device, comprising:
    a camera set configured to obtain image data relating to a scene in which the scene determination device is located for a plurality of viewpoints, the camera set including a first camera and a second camera;
    a storage medium storing a set of instructions; and
    a processor in communication with the storage medium, wherein when executing the set of instructions, the processor is configured to perform one or more steps including:
    for each of a plurality of viewpoints,
    obtaining, from the camera set, image data relating to the scene for the viewpoint, wherein the image data include a first image captured by the first camera of the camera set and a second image captured by the second camera of the camera set;
    determining depth information relating to the image data for the viewpoint based on the first image and the second image;
    obtaining inertial measurement data relating to the scene determination device for the viewpoint;
    determining a scene image for the viewpoint based on the depth information and the image data;
    determining a pose of the scene determination device for the viewpoint based on the inertial measurement data and the image data; and
    determining a representation of the scene based on the scene image for each of the plurality of viewpoints and the pose of the scene determination device for each of the plurality of viewpoints.
  2. The scene determination device of claim 1, wherein the first camera and the second camera are placed apart from each other by a predetermined distance, the predetermined distance being an average distance between two eyes of human beings.
  3. The scene determination device of claim 1, wherein the scene determination device further includes a communication interface through which the scene determination device communicates with a mobile device.
  4. The scene determination device of claim 3, wherein the communication interface includes a USB plug or a USB receptacle.
  5. The scene determination device of claim 1, wherein the pose of the scene determination device for the viewpoint includes a position of the scene determination device for the viewpoint and an orientation of the scene determination device for the viewpoint.
  6. The scene determination device of claim 5, wherein the determining the pose of the scene determination device for the viewpoint includes:
    determining, by the processor, preceding rotation information of the scene determination device and preceding translation information of the scene determination device for each of one or more preceding viewpoints before the viewpoint on a trajectory of the scene determination device based on at least one of the image data corresponding to each of the one or more preceding viewpoints or inertial measurement data corresponding to each of the one or more preceding viewpoints;
    determining, by the processor, current rotation information of the scene determination device and translation information of the scene determination device for the viewpoint;
    determining, by the processor, the position of the scene determination device  for the viewpoint based on the preceding translation information of the scene determination device and the current translation information of the scene determination device; and
    determining, by the processor, the orientation of the scene determination device for the viewpoint based on the preceding rotation information of the scene determination device and the current rotation information of the scene determination device.
  7. The scene determination device of claim 1, wherein the determining the scene image for the viewpoint based on the depth information and the image data further includes
    determining, by the processor, a gray image of an image selected from the first image, the second image, or a third image obtained from a mobile device; and
    determining, by the processor, the scene image for the viewpoint based on the depth information and the gray image.
  8. The scene determination device of claim 1, wherein the processor also performs a step of:
    removing drift information and noise from the inertial measurement data.
  9. The scene determination device of claim 1, wherein the scene determination device further includes at least one of an accelerometer or a gyroscope sensor configured to obtain the inertial measurement data.
  10. The scene determination device of claim 1, wherein the camera set is pre-calibrated.
  11. A user terminal, comprising:
    a scene determination device including a first processor configured to determine a representation of a scene, by performing one or more steps including:
    for each of a plurality of viewpoints,
    obtaining, by the first processor, from a camera set of the scene determination device, image data relating to a scene for the viewpoint, wherein the image data includes a first image captured by a first camera of the camera set and a second image captured by a second camera of the camera set;
    determining, by the first processor, depth information relating to the image data for the viewpoint based on the first image and the second image;
    obtaining, by the first processor, inertial measurement data relating to the user terminal for the viewpoint from an inertial measurement unit of the scene determination device;
    determining, by the first processor, a scene image for the viewpoint based on the depth information and the image data;
    determining, by the first processor, a pose of the user terminal for the viewpoint based on the inertial measurement data and the image data; and
    determining, by the first processor, a representation of the scene based on the scene image for each of the plurality of viewpoints and the pose of the user terminal for each of the plurality of viewpoints; and
    a mobile device configured to communicate with the scene determination device, the mobile device including a second processor configured to receive the representation of the scene and the poses of the user terminal for the plurality of viewpoints to render a virtual object in the scene.
  12. The user terminal of claim 11, wherein the user terminal further includes at least one of an accelerometer or a gyroscope sensor configured to obtain the inertial measurement data.
  13. The user terminal of claim 11, wherein the first camera and the second camera are placed apart from each other by a predetermined distance, the predetermined distance being an average distance between two eyes of human beings.
  14. The user terminal of claim 11, wherein the first processor and the second processor are different.
  15. The user terminal of claim 11, wherein the pose of the user terminal for the viewpoint includes a position of the user terminal for the viewpoint and an orientation of the user terminal for the viewpoint.
  16. The user terminal of claim 15, wherein the determining the pose of the user terminal for the viewpoint further includes:
    determining, by the first processor, preceding rotation information of the user terminal and preceding translation information of the user terminal for each of one or more preceding viewpoints before the viewpoint on a trajectory of the user terminal based on at least one of the image data corresponding to each of the one or more preceding viewpoints or inertial measurement data corresponding to each of the one or more preceding viewpoints;
    determining, by the first processor, current rotation information of the user terminal and translation information of the user terminal for the viewpoint;
    determining, by the first processor, the position of the user terminal for the viewpoint based on the preceding translation information of the user terminal and the current translation information of the user terminal; and
    determining, by the first processor, the orientation of the user terminal for the  viewpoint based on the preceding rotation information of the user terminal and the current rotation information of the user terminal.
  17. The user terminal of claim 11, wherein the determining the scene image for the viewpoint based on the depth information and the image data includes:
    determining, by the first processor, a gray image of the third image; and
    determining, by the first processor, the scene image for the viewpoint based on the depth information and the gray image.
  18. The user terminal of claim 11, wherein the first processor also performs a step of:
    removing drift information and noise from the inertial measurement data.
  19. The user terminal of claim 11, wherein the camera set is pre-calibrated.
  20. A method for determining a representation of a scene, comprising:
    for each of a plurality of viewpoints,
    obtaining, from a camera set of a scene determination device, image data relating to a scene in which a user terminal is located for the viewpoint, wherein the image data includes a first image captured by a first camera of the camera set and a second image captured by a second camera of the camera set;
    determining depth information relating to the image data for the viewpoint based on the first image and the second image;
    obtaining inertial measurement data relating to the user terminal for the viewpoint;
    determining a scene image for the viewpoint based on the depth information and the image data;
    determining a pose of the user terminal for the viewpoint based on the inertial measurement data and the image data; and
    determining a representation of the scene based on the scene image for each of the plurality of viewpoints and the pose of the user terminal for each of the plurality of viewpoints.
PCT/CN2017/119739 2017-07-17 2017-12-29 Devices and methods for determining scene WO2019015261A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201720862875.1 2017-07-17
CN201720862874.7 2017-07-17
CN201720862875.1U CN207124680U (en) 2017-07-17 2017-07-17 A kind of external camera assembly
CN201720862874.7U CN207200835U (en) 2017-07-17 2017-07-17 Mobile terminal based on circumscribed binocular camera shooting plug-in unit

Publications (1)

Publication Number Publication Date
WO2019015261A1 true WO2019015261A1 (en) 2019-01-24

Family

ID=65014970

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/119739 WO2019015261A1 (en) 2017-07-17 2017-12-29 Devices and methods for determining scene

Country Status (1)

Country Link
WO (1) WO2019015261A1 (en)



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17918463

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17918463

Country of ref document: EP

Kind code of ref document: A1
