CN116342838A - Headset and map creation initialization method in headset

Info

Publication number: CN116342838A
Application number: CN202111591765.3A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 曾杰
Applicant and current assignee: Hisense Electronic Technology Shenzhen Co., Ltd.
Priority application: CN202111591765.3A
Legal status: Pending
Prior art keywords: points, image, current, frame image, matching

Classifications

    • G06T 19/006: Mixed reality (under G06T 19/00, Manipulating 3D models or images for computer graphics)
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality (under G06F 3/01, Input arrangements or combined input and output arrangements for interaction between user and computer)
    • G06T 17/05: Geographic models (under G06T 17/00, Three-dimensional [3D] modelling)
    • G06T 19/003: Navigation within 3D models or images
    • Y02T 10/40: Engine management systems (under Y02T 10/10, Internal combustion engine [ICE] based vehicles)


Abstract

The application provides a head-mounted device and a mapping initialization method in the head-mounted device. During mapping initialization in the head-mounted device, pose estimation for the current frame image is completed through tracking matching between the current frame image and the previous frame image, and a point cloud for the current frame image is generated based on binocular stereo matching and front-back frame tracking matching. When the number of accumulated frame images meets the initialization quantity requirement, an initial map is established by jointly optimizing the poses and point clouds of all the frame images, thereby completing the mapping initialization process. The method takes both mapping initialization speed and accuracy into account, ensuring the stability of subsequent mapping and the user's experience with the head-mounted device.

Description

Headset and map creation initialization method in headset
Technical Field
The application relates to the technical field of virtual reality and augmented reality, and in particular to a head-mounted device and a mapping initialization method in the head-mounted device.
Background
Virtual Reality (VR) technology is a display technology that uses a computer to simulate a virtual environment, giving the user a sense of immersion in that environment. Augmented Reality (AR) technology is a display technology that uses computers and related technologies to simulate physical information such as visual, auditory, gustatory, and tactile information that is difficult to experience within a certain time and space range of the real world, and overlays the simulated information onto the real world. A VR device employs virtual reality technology to present virtual pictures to the user; an AR device employs augmented reality technology to present pictures combining the virtual and the real to the user. VR devices and AR devices may therefore be collectively referred to as headsets.
SLAM (Simultaneous Localization and Mapping) is a fundamental algorithm in VR and AR technology. It provides self-localization and environmental mapping for head-mounted devices and serves as the bridge connecting the real world and the virtual world. The startup speed and accuracy of SLAM directly determine the user experience of the headset. At present there are two approaches to SLAM mapping initialization. The first pursues startup speed and emphasizes engineering requirements: under normal conditions it can complete startup extremely quickly, but under abnormal conditions such as illumination changes or low texture its initialization accuracy is low, which easily affects the stability of subsequent mapping. The second strictly selects features during initialization and repeats the initialization operation when necessary, but its initialization time is long, which affects the user experience.
Therefore, in the mapping initialization process of the head-mounted device, ensuring initialization speed reduces initialization accuracy, while ensuring initialization accuracy reduces initialization speed. Speed and accuracy cannot both be achieved at the same time, and it is difficult to guarantee the stability of subsequent mapping and the user's experience with the head-mounted device.
Disclosure of Invention
The present application provides a head-mounted device and a mapping initialization method in the head-mounted device, which can take both the mapping initialization speed and the mapping initialization accuracy of the head-mounted device into account, ensuring the stability of subsequent mapping and the user's experience with the head-mounted device.
In a first aspect, the present application provides a headset comprising: a camera configured to capture images of the real world; an inertial measurement unit configured to acquire an inertial initial pose for each frame of image of the real world; and a controller configured to: track 2D feature points of the previous frame image in the current frame image to obtain front-back tracking 2D points; perform PNP calculation using those front-back tracking 2D points that were successfully triangulated in the previous frame image, so as to optimize the inertial initial pose of the current frame image and obtain the inertial optimized pose of the current frame image; extract 2D feature points from the current frame image according to a preset extraction requirement, and perform binocular stereo matching and triangulation processing to obtain binocular matching 3D points after successful triangulation; perform triangulation processing on the front-back tracking 2D points to obtain front-back tracking 3D points after successful triangulation; when the number of binocular matching 3D points is greater than a preset threshold, determine whether the number of all frame images is greater than or equal to a first preset number; and when the number of all frame images is greater than or equal to the first preset number, perform BA optimization jointly using the inertial optimized poses, binocular matching 3D points, and front-back tracking 3D points of all frame images to obtain an initial map, thereby completing the mapping initialization process.
The head-mounted device is worn on the user's head; scene images of the real world are continuously acquired as the user's head turns, and a map is built on the head-mounted device, thereby completing the association between the real world and the virtual world. Within the mapping process, the mapping initialization stage is particularly important. During mapping initialization in the head-mounted device, pose estimation for the current frame image is completed through tracking matching between the current frame image and the previous frame image, and a point cloud for the current frame image is generated based on binocular stereo matching and front-back frame tracking matching. When the number of accumulated frame images meets the initialization quantity requirement, an initial map is established by jointly optimizing the poses and point clouds of all the frame images, thereby completing the mapping initialization process. This approach takes both mapping initialization speed and accuracy into account, ensuring the stability of subsequent mapping and the user's experience with the head-mounted device.
In some implementations, the current frame image includes a current left image and a current right image of a current time taken by a left-and-right eye camera, respectively; the controller is further configured to: respectively extracting 2D characteristic points of a preset number from the current left image and the current right image of the current frame image according to preset extraction requirements; binocular stereo matching is carried out on the 2D characteristic points in the current left image and the current right image, and binocular matching 2D points are obtained; and carrying out triangulation processing on the binocular matching 2D points to obtain binocular matching 3D points after successful triangulation.
In some implementations, the current frame image includes a current left image and a current right image of a current time taken by the left and right eye cameras, respectively, and the previous frame image includes a historical left image and a historical right image of a previous time taken by the left and right eye cameras, respectively; the controller is further configured to: and performing front-back frame tracking matching by using the 2D characteristic points in the current left image and the 2D characteristic points in the historical left image, or performing front-back frame tracking matching by using the 2D characteristic points in the current right image and the 2D characteristic points in the historical right image, so as to obtain front-back tracking 2D points.
In some implementations, the controller is further configured to: when the number of the binocular matching 3D points is smaller than or equal to a preset threshold value, clearing the current frame image and all 2D points and 3D points obtained according to the current frame image; changing the preset extraction requirement, and extracting more 2D feature points in the next frame image than the current frame image.
In some implementations, the controller is further configured to: when the number of all the frame images is smaller than a first preset number, reserving the currently processed frame images and all the 2D points and the 3D points obtained according to the frame images; and recovering the preset extraction requirements.
In some implementations, the controller is further configured to: when the number of all frame images is greater than or equal to the first preset number, perform inertial bias optimization on each frame image using the inertial optimized pose of each frame image, and update the inertial state variables of each frame image; and perform BA optimization using the updated inertial state variables of each frame image together with the binocular matching 3D points and front-back tracking 3D points in each frame image to obtain an initial map, thereby completing the mapping initialization process.
In a second aspect, the present application further provides a mapping initialization method in a headset device, including: tracking 2D feature points of the previous frame image in the current frame image to obtain front-back tracking 2D points; performing PNP calculation using those front-back tracking 2D points that were successfully triangulated in the previous frame image, so as to optimize the inertial initial pose of the current frame image and obtain the inertial optimized pose of the current frame image; extracting 2D feature points from the current frame image according to a preset extraction requirement, and performing binocular stereo matching and triangulation processing to obtain binocular matching 3D points after successful triangulation; performing triangulation processing on the front-back tracking 2D points to obtain front-back tracking 3D points after successful triangulation; when the number of binocular matching 3D points is greater than a preset threshold, determining whether the number of all frame images is greater than or equal to a first preset number; and when the number of all frame images is greater than or equal to the first preset number, performing BA optimization jointly using the inertial optimized poses, binocular matching 3D points, and front-back tracking 3D points of all frame images to obtain an initial map, thereby completing the mapping initialization process.
In some implementations, the current frame image includes a current left image and a current right image of a current time taken by a left-and-right eye camera, respectively; the method further comprises the steps of: respectively extracting 2D characteristic points of a preset number from the current left image and the current right image of the current frame image according to preset extraction requirements; binocular stereo matching is carried out on the 2D characteristic points in the current left image and the current right image, and binocular matching 2D points are obtained; and carrying out triangulation processing on the binocular matching 2D points to obtain binocular matching 3D points after successful triangulation.
In some implementations, the current frame image includes a current left image and a current right image of a current time taken by the left and right eye cameras, respectively, and the previous frame image includes a historical left image and a historical right image of a previous time taken by the left and right eye cameras, respectively; the method further comprises the steps of: and performing front-back frame tracking matching by using the 2D characteristic points in the current left image and the 2D characteristic points in the historical left image, or performing front-back frame tracking matching by using the 2D characteristic points in the current right image and the 2D characteristic points in the historical right image, so as to obtain front-back tracking 2D points.
In some implementations, the method further comprises: when the number of the binocular matching 3D points is smaller than or equal to a preset threshold value, clearing the current frame image and all 2D points and 3D points obtained according to the current frame image; changing the preset extraction requirement, and extracting more 2D feature points in the next frame image than the current frame image.
The mapping initialization method in the headset according to the second aspect of the present application may be applied to the headset according to the first aspect and specifically implemented by the controller in the headset, so that the beneficial effects of the mapping initialization method in the headset according to the second aspect are the same as those of the headset according to the first aspect, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 illustrates a display system architecture diagram including a virtual reality device, according to some embodiments;
FIG. 2 illustrates a VR scene global interface schematic in accordance with some embodiments;
FIG. 3 illustrates a recommended content region schematic diagram of a global interface, according to some embodiments;
FIG. 4 illustrates an application shortcut entry area schematic for a global interface in accordance with some embodiments;
FIG. 5 illustrates a suspension diagram of a global interface, according to some embodiments;
FIG. 6 illustrates a first flow diagram of a mapping initialization process in a headset device, in accordance with some embodiments;
FIG. 7 illustrates a schematic diagram of a front-to-back frame tracking match, in accordance with some embodiments;
FIG. 8 illustrates a schematic diagram of binocular stereo matching according to some embodiments;
FIG. 9 illustrates a flow diagram for obtaining binocular matching 3D points in a headset according to some embodiments;
FIG. 10 illustrates a second flow diagram of a mapping initialization process in a headset device, in accordance with some embodiments;
FIG. 11 illustrates a flowchart of optimizing a frame image in a headset device, in accordance with some embodiments;
FIG. 12 illustrates a third flow diagram of a mapping initialization process in a head-mounted device, in accordance with some embodiments.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the exemplary embodiments of the present application more apparent, the technical solutions in the exemplary embodiments of the present application will be clearly and completely described below with reference to the drawings in the exemplary embodiments of the present application, and it is apparent that the described exemplary embodiments are only some embodiments of the present application, but not all embodiments.
All other embodiments obtained by one of ordinary skill in the art without inventive effort, based on the exemplary embodiments shown in the present application, are intended to fall within the scope of the present application. Furthermore, while the disclosure has been presented in terms of one or more exemplary embodiments, it should be understood that individual aspects of the disclosure may also constitute a complete technical solution on their own.
It should be understood that the terms "first," "second," "third," and the like in the description, in the claims, and in the above-described figures are used for distinguishing between similar objects and not necessarily for describing a particular sequence or chronological order. It is to be understood that terms so used may be interchanged where appropriate, so that the embodiments of the present application can, for example, be implemented in orders other than those illustrated or described herein.
Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to those elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" as used in this application refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the function associated with that element.
Reference throughout this specification to "multiple embodiments," "some embodiments," "one embodiment," or "an embodiment," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases "in various embodiments," "in some embodiments," "in at least one other embodiment," or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Thus, a particular feature, structure, or characteristic shown or described in connection with one embodiment may be combined, in whole or in part, with features, structures, or characteristics of one or more other embodiments without limitation. Such modifications and variations are intended to be included within the scope of the present application.
In the present embodiment, the headset 500 generally refers to a display device that can be worn on the face of a user to provide an immersive experience for the user, including, but not limited to, virtual reality devices such as VR glasses, augmented reality devices (Augmented Reality, AR), VR gaming devices, mobile computing devices, and other wearable computers. In some embodiments of the present application, VR glasses are taken as an example to describe a technical solution, and it should be understood that the provided technical solution may be applied to other types of head-mounted devices at the same time. The head-mounted device 500 may operate independently or be used as an external device to access other intelligent display devices, where the display devices may be an intelligent television, a computer, a tablet computer, a server, and the like.
The headset 500 may display a media asset screen after being worn on the face of the user to provide close-up images for both eyes of the user to bring an immersive experience. To present the asset screen, the headset 500 may include a number of components for displaying the screen and face wear. Taking VR glasses as an example, the headset 500 may include components such as a housing, a position fixture, an optical system, a display assembly, a gesture detection circuit, an interface circuit, and the like. In practical applications, the optical system, the display assembly, the gesture detection circuit and the interface circuit may be disposed in the housing, so as to be used for presenting a specific display screen; the two sides of the shell are connected with position fixing pieces so as to be worn on the face of a user.
When the gesture detection circuit is used, gesture detection elements such as a gravity acceleration sensor and a gyroscope are arranged in the gesture detection circuit, when the head of a user moves or rotates, the gesture of the user can be detected, detected gesture data are transmitted to processing elements such as a controller, and the processing elements can adjust specific picture contents in the display assembly according to the detected gesture data.
As shown in fig. 1, in some embodiments, the shown head-mounted device 500 may access the display device 200 and construct a network-based display system with the server 400, and data interaction may be performed in real time among the head-mounted device 500, the display device 200, and the server 400, for example, the display device 200 may obtain media data from the server 400 and play the media data, and transmit specific screen content to the head-mounted device 500 for display.
The display device 200 may be a liquid crystal display, an OLED display, a projection display device, among others. The particular display device type, size, resolution, etc. are not limited, and those skilled in the art will appreciate that the display device 200 may be modified in performance and configuration as desired. The display device 200 may provide a broadcast receiving tv function, and may additionally provide an intelligent network tv function of a computer supporting function, including, but not limited to, a network tv, an intelligent tv, an Internet Protocol Tv (IPTV), etc.
The display device 200 and the head-mounted device 500 also communicate data with the server 400 through a variety of communication means. The display device 200 and the head-mounted device 500 may be allowed to be communicatively connected via a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks. The server 400 may provide various contents and interactions to the display device 200. By way of example, display device 200 receives software program updates, or accesses a remotely stored digital media library by sending and receiving information, as well as Electronic Program Guide (EPG) interactions. The server 400 may be a cluster, or may be multiple clusters, and may include one or more types of servers. Other web service content such as video on demand and advertising services are provided through the server 400.
In the course of data interaction, the user may operate the display device 200 through the mobile terminal 300 and the remote controller 100. The mobile terminal 300 and the remote controller 100 may communicate with the display device 200 by a direct wireless connection or by a non-direct connection. That is, in some embodiments, the mobile terminal 300 and the remote controller 100 may communicate with the display device 200 through a direct connection manner of bluetooth, infrared, etc. When transmitting the control instruction, the mobile terminal 300 and the remote controller 100 may directly transmit the control instruction data to the display device 200 through bluetooth or infrared.
In other embodiments, the mobile terminal 300 and the remote controller 100 may also access the same wireless network with the display device 200 through a wireless router to establish indirect connection communication with the display device 200 through the wireless network. When transmitting the control command, the mobile terminal 300 and the remote controller 100 may transmit the control command data to the wireless router first, and then forward the control command data to the display device 200 through the wireless router.
In some embodiments, the user may also use the mobile terminal 300 and the remote controller 100 to directly interact with the headset 500, for example, the mobile terminal 300 and the remote controller 100 may be used as handles in a virtual reality scenario to implement functions such as somatosensory interaction.
In some embodiments, the display components of the headset 500 include a display screen and drive circuitry associated with the display screen. To present a specific picture and produce a stereoscopic effect, the display assembly may include two display screens, corresponding to the user's left and right eyes respectively. When a 3D effect is presented, the picture contents displayed on the left and right screens differ slightly, for example by respectively displaying the pictures captured by the left and right cameras when the 3D film source was shot. Because the user's left and right eyes observe different picture content, a picture with a strong stereoscopic impression is perceived when the device is worn.
The optical system in the headset 500 is an optical module composed of a plurality of lenses. The optical system is arranged between the eyes of the user and the display screen, and the optical path can be increased through the refraction of the optical signals by the lens and the polarization effect of the polaroid on the lens, so that the content presented by the display component can be clearly presented in the visual field of the user. Meanwhile, in order to adapt to the vision condition of different users, the optical system also supports focusing, namely, the position of one or more of the lenses is adjusted through the focusing assembly, the mutual distance among the lenses is changed, and therefore the optical path is changed, and the picture definition is adjusted.
The interface circuit of the head-mounted device 500 may be used to transfer interaction data, and besides transferring gesture data and displaying content data, in practical application, the head-mounted device 500 may also be connected to other display devices or peripheral devices through the interface circuit, so as to implement more complex functions through data interaction with the connection device. For example, the head-mounted device 500 may be connected to a display device through an interface circuit, so that a displayed screen is output to the display device in real time for display. For another example, the headset 500 may also be connected to a handle via interface circuitry, which may be held in the hand of a user to perform the relevant operations in the VR user interface.
Wherein the VR user interface can be presented as a plurality of different types of UI layouts depending on user operation. For example, the user interface may include a global interface, such as the global UI shown in fig. 2 after the AR/VR terminal is started, which may be displayed on a display screen of the AR/VR terminal or may be displayed on a display of the display device. The global UI may include a recommended content area 1, a business class extension area 2, an application shortcut entry area 3, and a hover area 4.
The recommended content area 1 is used for configuring TAB columns of different classifications; media resources, themes and the like can be selectively configured in the columns; the media assets may include 2D movies, educational courses, travel, 3D, 360 degree panoramas, live broadcasts, 4K movies, program applications, games, travel, etc. services with media asset content, and the fields may select different template styles, may support simultaneous recommended programming of media assets and themes, as shown in fig. 3.
In some embodiments, the content recommendation area 1 may also include a main interface and a sub-interface. As shown in fig. 3, the portion located in the center of the UI layout is a main interface, and the portions located at both sides of the main interface are sub-interfaces. The main interface and the auxiliary interface can be used for respectively displaying different recommended contents. For example, according to the recommended type of the sheet source, the service of the 3D sheet source may be displayed on the main interface; and the left side sub-interface displays the business of the 2D film source, and the right side sub-interface displays the business of the full-scene film source.
Obviously, for the main interface and the auxiliary interface, different service contents can be displayed and simultaneously presented as different content layouts. And, the user can control the switching of the main interface and the auxiliary interface through specific interaction actions. For example, by controlling the focus mark to move left and right, the focus mark moves right when the focus mark is at the rightmost side of the main interface, the auxiliary interface at the right side can be controlled to be displayed at the central position of the UI layout, at this time, the main interface is switched to the service for displaying the full-view film source, and the auxiliary interface at the left side is switched to the service for displaying the 3D film source; and the right side sub-interface is switched to the service of displaying the 2D patch source.
In addition, in order to facilitate the user to watch, the main interface and the auxiliary interface can be displayed respectively through different display effects. For example, the transparency of the secondary interface can be improved, so that the secondary interface obtains a blurring effect, and the primary interface is highlighted. The auxiliary interface can be set as gray effect, the main interface is kept as color effect, and the main interface is highlighted.
In some embodiments, the top of the recommended content area 1 may also be provided with a status bar, in which a plurality of display controls may be provided, including time, network connection status, power, and other common options. The content included in the status bar may be user-defined, e.g., weather, user avatar, etc., may be added. The content contained in the status bar may be selected by the user to perform the corresponding function. For example, when the user clicks on the time option, the headset 500 may display a time device window in the current interface, or jump to a calendar interface. When the user clicks on the network connection status option, the headset 500 may display a WiFi list on the current interface or jump to the network setup interface.
The content displayed in the status bar may be presented in different content forms according to the setting status of a specific item. For example, the time control may be displayed directly as specific time text information and display different text at different times; the power control may be displayed as a different pattern according to the current power remaining condition of the headset 500.
The status bar is used to enable the user to perform common control operations, enabling quick setup of the headset 500. Since the setup procedure for the headset 500 includes a number of items, all of the commonly used setup options cannot generally be displayed in the status bar. To this end, in some embodiments, an expansion option may also be provided in the status bar. After the expansion option is selected, an expansion window may be presented in the current interface, in which a plurality of setting options may be further provided for implementing other functions of the headset 500.
For example, in some embodiments, after the expansion option is selected, a "shortcut center" option may be set in the expansion window. After clicking on the shortcut center option, the user may display a shortcut center window by the headset 500. The shortcut center window can comprise screen capturing, screen recording and screen throwing options for respectively waking up corresponding functions.
The traffic class extension area 2 supports extension classes that configure different classes. And if the new service type exists, supporting configuration independent TAB, and displaying the corresponding page content. The service classification in the service classification expansion area 2 can also be subjected to sequencing adjustment and offline service operation. In some embodiments, the service class extension area 2 may include content: movie, education, travel, application, my. In some embodiments, the traffic class extension area 2 is configured to show large traffic classes TAB and support more classes configured, the icon of which supports the configuration as shown in fig. 3.
The application shortcut entry area 3 may specify that pre-installed applications, which may be specified as a plurality, are displayed in front for operational recommendation, supporting configuration of special icon styles to replace default icons. In some embodiments, the application shortcut entry area 3 further includes a left-hand movement control, a right-hand movement control for moving the options target, for selecting different icons, as shown in fig. 4.
The hover region 4 may be configured to be above the left diagonal side, or above the right diagonal side of the fixation region, may be configured as an alternate character, or may be configured as a jump link. For example, the suspension jumps to an application or displays a designated function page after receiving a confirmation operation, as shown in fig. 5. In some embodiments, the suspension may also be configured without jump links, purely for visual presentation.
In some embodiments, the global UI further includes a status bar at the top for displaying time, network connection status, power status, and more shortcut entries. After the handle of the AR/VR terminal is used, namely the handheld controller selects the icon, the icon displays a text prompt comprising left and right expansion, and the selected icon is stretched and expanded left and right according to the position.
For example, after selecting the search icon, the search icon will display the text "search" and the original icon, and after further clicking the icon or text, the search icon will jump to the search page; for another example, clicking on the favorites icon jumps to favorites TAB, clicking on the history icon defaults to locating the display history page, clicking on the search icon jumps to the global search page, clicking on the message icon jumps to the message page.
In some embodiments, the interaction may be performed through a peripheral device, e.g., a handle of the AR/VR terminal may operate a user interface of the AR/VR terminal, including a back button; the home key can realize the reset function by long-time pressing; volume up and down buttons; and the touch area can realize clicking, sliding and holding drag functions of the focus.
In the foregoing embodiments, a VR device is a device that applies virtual reality technology to present a virtual picture to the user, and an AR device is a device that applies augmented reality technology to present a picture combining the virtual and the real to the user. Virtual Reality (VR) technology is a display technology that uses a computer to simulate a virtual environment, giving the user a sense of immersion in that environment. Augmented Reality (AR) technology is a display technology that uses computers and related technologies to simulate physical information such as visual, auditory, gustatory, and tactile information that is difficult to experience within a certain time and space range of the real world, and overlays the simulated information onto the real world. In the embodiments of the present application, VR devices and AR devices may be collectively referred to as the headset 500.
SLAM (Simultaneous Localization and Mapping) is a fundamental algorithm in VR and AR technology. It provides self-localization and environmental mapping for the headset 500 and serves as the bridge connecting the real world and the virtual world. The startup speed and accuracy of SLAM directly determine the user experience of the headset 500. At present there are two approaches to SLAM mapping initialization. The first pursues startup speed and emphasizes engineering requirements: under normal conditions it can complete startup extremely quickly, but under abnormal conditions such as illumination changes or low texture its initialization accuracy is low, which easily affects the stability of subsequent mapping. The second strictly selects features during initialization and repeats the initialization operation when necessary, but its initialization time is long, which affects the user experience.
As can be seen, in the mapping initialization process of the head-mounted device 500, ensuring initialization speed reduces initialization accuracy, while ensuring initialization accuracy reduces initialization speed. Speed and accuracy cannot both be achieved at the same time, and it is difficult to guarantee the stability of subsequent mapping and the user's experience with the headset 500.
In order to solve the above-described problems, a headset 500 is provided in an embodiment of the present application, which may include a camera, an inertial measurement unit, and a controller.
The images such as the map created by the head-mounted device 500 are displayed in a rendering scene, which is a virtual scene created by the rendering engine of the head-mounted device 500 through a rendering program. For example, a head-mounted device 500 based on the Unity 3D rendering engine may construct a Unity 3D scene when rendering a display. Various virtual objects and functional controls may be added to the Unity 3D scene to produce a particular usage scene. For example, when playing multimedia resources, a display panel used to present the multimedia picture may be added to the Unity 3D scene; meanwhile, virtual object models such as seats, sound equipment, and people can be added to create a cinema effect.
To output the rendered picture, the headset 500 may also place virtual cameras in the Unity 3D scene. For example, the headset 500 may set a left-eye camera and a right-eye camera in the Unity 3D scene according to the positional relationship of the user's two eyes; the two virtual cameras simultaneously capture objects in the Unity 3D scene, thereby outputting rendered pictures to the left and right displays respectively. To obtain a better immersive experience, the angles of the two virtual cameras in the Unity 3D scene can be adjusted in real time following the attitude sensor of the headset 500, so that as the user wearing the headset 500 moves, rendered pictures of the Unity 3D scene from different viewing angles are output in real time.
When constructing a map based on SLAM, the headset 500 also simultaneously captures real-world images with the left-eye and right-eye cameras for rendering in the Unity 3D scene.
The cameras of the headset 500 described above include both left-eye and right-eye cameras, each frame of image captured by the camera also including a left-side image and a right-side image, respectively.
The inertial measurement unit is also known as IMU (Inertial Measurement Unit). Generally, an IMU includes three single-axis accelerometers and three single-axis gyroscopes, where the accelerometers detect acceleration signals of the object in the carrier coordinate system on three independent axes, and the gyroscopes detect angular velocity signals of the carrier relative to the navigation coordinate system, measure angular velocity and acceleration of the object in three-dimensional space, and calculate the attitude of the object based on the angular velocity and acceleration.
In the embodiment of the present application, the camera is combined with the IMU: based on the images captured by the camera together with the IMU measurements, an inertial pose can be determined for each frame of image.
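The description does not give the formulas the inertial measurement unit uses to produce an inertial pose. As a rough illustration only, the following is a minimal dead-reckoning sketch that integrates gyroscope and accelerometer samples between two frames to propagate an orientation, position, and velocity; the gravity convention, the absence of bias handling, and all names are assumptions, and a real system would typically use IMU preintegration instead.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

GRAVITY = np.array([0.0, 0.0, -9.81])  # assumed world-frame gravity convention

def propagate_imu_pose(rot_wb, p_wb, v_wb, imu_samples, dt):
    """Dead-reckon the body pose from the previous frame's pose using raw IMU
    samples (gyro in rad/s, accel in m/s^2); bias handling is omitted."""
    for gyro, accel in imu_samples:
        # Integrate angular velocity into the orientation (body-to-world rotation).
        rot_wb = rot_wb * R.from_rotvec(gyro * dt)
        # Rotate the measured specific force into the world frame, remove gravity.
        a_w = rot_wb.apply(accel) + GRAVITY
        p_wb = p_wb + v_wb * dt + 0.5 * a_w * dt * dt
        v_wb = v_wb + a_w * dt
    return rot_wb, p_wb, v_wb
```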
Fig. 6 illustrates a first flow diagram of a mapping initialization process in a headset device according to some embodiments. As shown in fig. 6, the controller of the headset 500 may be configured to perform the following steps:
step S101, tracking 2D characteristic points in a previous frame image in a current frame image, and obtaining front and back tracking 2D points.
The current frame image consists of the two images of the real world around the user captured by the left-eye and right-eye cameras at the current moment. The previous frame image consists of the two images captured by the left-eye and right-eye cameras at the previous moment. The image captured by the left-eye camera is the left image, and the image captured by the right-eye camera is the right image. The images captured at the current moment are the current left image and the current right image, and the images captured at the previous moment are the historical left image and the historical right image.
The IMU may calculate the inertial initial pose of the previous frame image based on the previous frame image together with the acceleration and angular velocity of the headset 500 measured when the previous frame image was captured, and may calculate the inertial initial pose of the current frame image based on the inertial optimized pose of the previous frame image together with the acceleration and angular velocity of the head-mounted device 500 measured between the previous capture and the capture of the current frame image.
In the embodiment of the present application, the processing procedure of the previous frame image is the same as the processing procedure of the current frame image, that is, both the front and back frame tracking matching and the binocular stereo matching are required.
In the process of front and back frame tracking matching, if a left eye camera is used as a main camera, front and back frame tracking matching is required to be performed by utilizing the 2D characteristic points in the current left image and the 2D characteristic points in the historical left image, so that front and back tracking 2D points are obtained. Or if the right-eye camera is used as the main camera, the front-back frame tracking matching is needed by utilizing the 2D characteristic points in the current right image and the 2D characteristic points in the historical right image, so that the front-back tracking 2D points are obtained.
Front-back frame tracking matching is performed on the feature points of the frame images acquired at the two consecutive moments. Since feature points on an image are 2D points that have only two-dimensional coordinates, the points obtained from front-back frame tracking matching are all 2D points.
When the front and back frame tracking matching is carried out on the current frame image acquired at the time t+1, taking the left eye camera as a main camera as an example, tracking 2D characteristic points in a historical left image at the time t in the current left image at the time t+1. If some 2D feature points in the historical left image at time t cannot be tracked in the current left image at time t+1, then the 2D points are eliminated. Finally, only 2D feature points in the left image at the time t, which can be tracked in the left image at the time t+1, are reserved as the front-back tracking 2D points.
Fig. 7 illustrates a schematic diagram of a front-to-back frame tracking match, in accordance with some embodiments. As shown in fig. 7, with the left-eye camera as the main camera, 2D feature points existing in the historical left-side image at time t are P6, P7, and P8, and P6 and P8 can be continuously tracked in the current left-side image at time t+1, but P7 cannot be tracked. At this time, P7 needs to be removed, and finally, P6 and P8 are the front-back tracking 2D points obtained in step S101.
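The description does not name a specific tracking algorithm for front-back frame matching. As a hedged illustration, the sketch below assumes pyramidal Lucas-Kanade optical flow (OpenCV) is used to track the previous left image's 2D feature points into the current left image and to drop points that cannot be tracked, much as P7 is dropped in the example of FIG. 7; the function name and parameters are illustrative, not taken from the patent.

```python
import cv2
import numpy as np

def track_prev_to_current(prev_left, cur_left, prev_pts):
    """Track 2D feature points from the previous left image into the current
    left image; return only the points that were successfully tracked."""
    prev_pts = prev_pts.astype(np.float32).reshape(-1, 1, 2)
    cur_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_left, cur_left, prev_pts, None, winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1          # e.g. P6, P8 kept; P7 dropped
    return prev_pts[ok].reshape(-1, 2), cur_pts[ok].reshape(-1, 2)
```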
Step S102, PNP calculation is carried out by utilizing the 2D points which are successfully triangulated in the previous frame image in the front and back tracking 2D points, the inertial initial pose of the current frame image is optimized, and the inertial optimized pose of the current frame image is obtained.
Triangulation in the embodiments of the present application is a very basic operation in SLAM, because the points on each frame image are 2D points with only an abscissa and an ordinate and no depth. During map construction, pose calculation, optimization, and other processing of the images must be carried out using the three-dimensional space coordinates of 3D points. Therefore, the embodiments of the present application are mainly concerned with obtaining the 3D points of each frame image: after each frame image undergoes front-back tracking matching and binocular stereo matching, the matched 2D feature points need to be triangulated so that they are converted into 3D points with three-dimensional space coordinates. A 2D feature point that can be triangulated only needs to be triangulated once.
If 2D points that were successfully triangulated in the previous frame image can be tracked in the current frame, then those tracked and successfully triangulated 2D points can be used to optimize the pose of the current frame image when it is processed.
In the embodiment of the application, the pose optimization mode adopts a PNP algorithm. Typically, each frame of image corresponds to a camera pose in addition to an inertial pose. The camera pose is calculated by a visual odometer.
The inertial initial pose of the current frame image is obtained by the inertial measurement unit. However, the error of the inertial initial pose is relatively large, so it is converted into a camera pose using the extrinsic parameters between the camera and the inertial measurement unit, and the corresponding calculation and optimization are performed on the camera pose. Specifically, the inertial initial pose of the current frame image is first obtained by the inertial measurement unit and converted into an initial camera pose through the extrinsic parameters. PNP (Perspective-n-Point) calculation is then performed using the 2D points that were tracked during front-back frame tracking matching and that were successfully triangulated in the previous frame image, yielding the optimized camera pose of the current frame image. Finally, the optimized camera pose is converted back into the inertial optimized pose using the extrinsic parameters.
When PNP is calculated, 3D points need to be projected onto 2D feature points on the current frame image, and since 2D points have been successfully triangulated in the previous frame image, these 2D points necessarily have corresponding converted 3D points and spatial three-dimensional coordinates.
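Step S102 is described only at the level of "PNP calculation" plus a conversion through the camera-IMU extrinsics. The following sketch shows one plausible realization using OpenCV's solvePnPRansac, with the IMU-predicted pose converted through an assumed camera-to-IMU extrinsic matrix as the initial guess; the matrix conventions and all names are assumptions rather than the patent's actual implementation.

```python
import cv2
import numpy as np

def refine_pose_with_pnp(pts3d_world, pts2d_cur, K, T_world_imu_init, T_imu_cam):
    """Optimize the current frame pose from 3D-2D correspondences.
    pts3d_world: Nx3 points already triangulated in earlier frames (world frame).
    pts2d_cur:   Nx2 tracked observations in the current (left) image.
    T_world_imu_init: 4x4 inertial initial pose predicted by the IMU.
    T_imu_cam:   4x4 camera-to-IMU extrinsic calibration."""
    # Convert the inertial initial pose into an initial camera pose (world->camera).
    T_world_cam = T_world_imu_init @ T_imu_cam
    T_cam_world = np.linalg.inv(T_world_cam)
    rvec0, _ = cv2.Rodrigues(T_cam_world[:3, :3])
    tvec0 = T_cam_world[:3, 3].reshape(3, 1)

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d_world.astype(np.float32), pts2d_cur.astype(np.float32), K, None,
        rvec=rvec0, tvec=tvec0, useExtrinsicGuess=True,
        reprojectionError=3.0, flags=cv2.SOLVEPNP_ITERATIVE)

    # Convert the optimized camera pose back into the inertial (IMU) frame.
    R_cw, _ = cv2.Rodrigues(rvec)
    T_cam_world_opt = np.eye(4)
    T_cam_world_opt[:3, :3], T_cam_world_opt[:3, 3] = R_cw, tvec.ravel()
    T_world_imu_opt = np.linalg.inv(T_cam_world_opt) @ np.linalg.inv(T_imu_cam)
    return T_world_imu_opt, inliers
```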
Step S103, extracting 2D feature points from the current frame image according to a preset extraction requirement, and performing binocular stereo matching and triangulation processing to obtain binocular matching 3D points after successful triangulation.
The preset extraction requirement specifies the number of 2D feature points in each frame image. If the number of 2D feature points of the previous frame image that can be tracked in the current frame image does not reach the preset number of feature points, more 2D feature points need to be extracted from the current frame image, so that the front-back tracking 2D points together with the newly extracted 2D feature points meet the preset number of feature points.
For example, if the preset extraction requirement specifies 100 2D feature points per frame image, and 50 front-back tracking 2D points were obtained by tracking the previous frame image in the current frame image, then in step S103 another 50 2D feature points need to be extracted from the current frame image to satisfy the requirement of 100 feature points. The subsequent binocular stereo matching is then performed on the basis of the 50 newly extracted 2D feature points.
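For illustration, here is a minimal sketch of this top-up step, assuming an ORB detector and the 100-point target used in the example above; masking out neighborhoods of already-tracked points is an assumption, and any corner detector could serve instead.

```python
import cv2
import numpy as np

TARGET_FEATURES = 100   # "preset extraction requirement" (example value from the text)

def top_up_features(cur_left, tracked_pts):
    """Extract only as many new 2D feature points as are needed so that
    tracked points + new points reaches TARGET_FEATURES."""
    need = TARGET_FEATURES - len(tracked_pts)
    if need <= 0:
        return np.empty((0, 2), np.float32)
    # Mask out neighborhoods of already-tracked points to spread features out.
    mask = np.full(cur_left.shape[:2], 255, np.uint8)
    for x, y in tracked_pts:
        cv2.circle(mask, (int(x), int(y)), 15, 0, -1)
    orb = cv2.ORB_create(nfeatures=need)
    kps = orb.detect(cur_left, mask)
    return np.array([kp.pt for kp in kps[:need]], np.float32)
```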
In the binocular stereo matching process, 2D feature points that can be tracked in both the left image and the right image are determined using the calibrated stereo parameters.
Fig. 8 illustrates a schematic diagram of binocular stereo matching according to some embodiments. As shown in fig. 8, when binocular stereo matching is performed on the current frame image acquired at time t+1, it can be determined that binocular matching 2D points that can be simultaneously tracked in the left and right images are P1, P2, P3, P4, P5, and the like, respectively.
P1, P2, P3, P4, and P5 are all 2D points on the image; triangulation is then carried out on these 2D points, and the points that can be successfully triangulated are the binocular matching 3D points.
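The description does not specify how left-right correspondences such as P1-P5 are found, or what counts as successful triangulation. The sketch below assumes a calibrated stereo pair, uses LK optical flow for left-to-right matching, triangulates with the calibrated 3x4 projection matrices, and treats finite, positive-depth results as successfully triangulated; each of these choices is an assumption.

```python
import cv2
import numpy as np

def stereo_match_and_triangulate(cur_left, cur_right, left_pts, P_left, P_right):
    """Match left-image 2D feature points into the right image and triangulate
    the matches; keep only points with finite, positive depth ("successful")."""
    lp = left_pts.astype(np.float32).reshape(-1, 1, 2)
    rp, status, _ = cv2.calcOpticalFlowPyrLK(cur_left, cur_right, lp, None)
    ok = status.ravel() == 1
    lp, rp = lp[ok].reshape(-1, 2), rp[ok].reshape(-1, 2)

    # Triangulate with the calibrated 3x4 projection matrices of both cameras.
    pts4d = cv2.triangulatePoints(P_left, P_right, lp.T, rp.T)
    pts3d = (pts4d[:3] / pts4d[3]).T
    good = np.isfinite(pts3d).all(axis=1) & (pts3d[:, 2] > 0)
    return lp[good], rp[good], pts3d[good]   # binocular matching 2D/3D points
```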
And step S104, performing triangulation processing on the front and back tracking 2D points to obtain front and back tracking 3D points after the triangulation is successful.
The front-back tracking 2D points also include the 2D points that were already successfully triangulated in the previous frame image. Whether or not their triangulation in the current frame succeeds, these 2D points serve as part of the 2D feature points of the current frame image, and after being triangulated and converted into 3D points they also serve as part of the point cloud of the current frame image. The other part of the point cloud of the current frame image consists of the binocular matching 3D points obtained in step S103.
In step S105, when the number of binocular matching 3D points is greater than a preset threshold, it is determined whether the number of all frame images is greater than or equal to a first preset number.
In the embodiment of the present application, in order to improve the accuracy of the mapping initialization process, a certain requirement is placed on the number of binocular matching 3D points in each frame image. Specifically, a preset threshold may be set for judging the number of binocular matching 3D points: when the number of binocular matching 3D points in a frame image is greater than the preset threshold, the frame image meets the requirement, and it can then be judged whether the currently processed frame images fill the window, that is, whether the number of all frame images is greater than or equal to the first preset number.
It should be noted that the number of binocular matching 3D points on the first frame image is required to be higher than that of other frame images, for example, the number of binocular matching 3D points on other frame images may be required to be greater than 4, 5, etc., but the number of binocular matching 3D points on the first frame image is required to be greater than 40, 50, etc.
And S106, when the number of all the frame images is greater than or equal to the first preset number, performing BA optimization by combining the inertial optimization pose, the binocular matching 3D points and the front and back tracking 3D points of all the frame images to obtain an initial map, thereby completing the map building initialization process.
Each frame image being processed by the headset 500 may be referred to as the current frame image; the only difference is the time at which each frame image is processed. In the embodiments of the present application the frame images are processed iteratively, each iteration involving the previous frame and the current frame, and to distinguish frame images processed at different times the images are referred to as the previous frame image, the current frame image, the next frame image, and so on.
The first frame image cannot undergo front-back frame tracking matching, so its inertial optimized pose cannot be calculated through PNP. However, the inertial pose of the first frame image determined from the calibrated left and right images of the first frame is already relatively accurate. Therefore, for the first frame image, its current inertial pose may be used directly when step S106 is performed.
The above-described iterative process may terminate when the number of frame images processed by the headset 500 meets the map initialization requirement. And finally, constructing an initial map by utilizing all frame images in the iteration process. And when the number of the frame images does not meet the requirement of the image construction initialization, continuing to acquire the next frame image, and repeating the processes of the steps S101-S105 again until the number of the frame images meets the requirement of the image construction initialization.
In this embodiment of the present application, the first preset number may be set to determine whether the number of frame images meets the map creation initialization requirement. If the number of all the frame images is greater than or equal to the first preset number, it is determined that the number of frame images meets the map creation initialization requirement; if the number of all the frame images is smaller than the first preset number, it is determined that the number of frame images does not meet the map creation initialization requirement.
As can be seen, in the map creation initialization process, the head-mounted device 500 in the embodiment of the present application can complete the pose estimation of the current frame image through tracking matching between the current frame image and the previous frame image, and generate the point cloud of the current frame image based on binocular stereo matching and front-back frame tracking matching. When the number of all frame images meets the initialization number requirement, an initial map is built using the poses and point clouds of all frame images, thereby completing the map creation initialization process. This approach balances the speed and accuracy of map initialization, ensuring the stability of subsequent mapping and the experience of the user using the headset 500.
In the embodiment of the present application, the camera acquires real-world images in order to gather more real-world image information as the user moves. For example, when the camera acquires the previous frame image, the head-mounted device 500 worn by the user faces a first direction and angle; when the user turns the head to the right, the head-mounted device 500 faces a second direction and angle, and the current frame image newly acquired by the camera contains more image information from the right side of the user's real-world surroundings than the previous frame image. The current frame image therefore also helps the head-mounted device 500 optimize the previous frame image and improve the constructed map.
As in the previous embodiments, each frame image acquired by the camera includes a left image and a right image. Therefore, when performing binocular stereo matching on the current frame image, as shown in fig. 9, the controller of the head mounted device 500 may be further configured to perform the following steps (an illustrative sketch is given after the steps):
step S201, respectively extracting a preset number of 2D feature points from the current left image and the current right image of the current frame image according to a preset extraction requirement.
Step S202, binocular stereo matching is carried out on the 2D feature points in the current left image and the current right image, and binocular matching 2D points are obtained.
And step S203, performing triangulation processing on the binocular matching 2D points to obtain binocular matching 3D points after the triangulation is successful.
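A minimal OpenCV sketch of steps S201-S203 follows. The ORB detector, the brute-force matcher and the 3x4 projection matrices P_left / P_right of the calibrated stereo pair are assumptions standing in for whatever feature extractor and calibration the device actually uses.

```python
import cv2
import numpy as np

def stereo_match_and_triangulate(img_left, img_right, P_left, P_right, n_features=500):
    """Extract 2D feature points, match them between the left and right images,
    and triangulate the matches into 3D points (sketch)."""
    orb = cv2.ORB_create(nfeatures=n_features)
    kp_l, des_l = orb.detectAndCompute(img_left, None)
    kp_r, des_r = orb.detectAndCompute(img_right, None)
    if des_l is None or des_r is None:
        return np.empty((0, 3))

    # Binocular stereo matching of the 2D feature points.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_l, des_r)
    if not matches:
        return np.empty((0, 3))

    pts_l = np.float64([kp_l[m.queryIdx].pt for m in matches]).T  # 2xN
    pts_r = np.float64([kp_r[m.trainIdx].pt for m in matches]).T  # 2xN

    # Triangulation: homogeneous 4xN result, normalised to 3D points.
    pts_4d = cv2.triangulatePoints(P_left, P_right, pts_l, pts_r)
    pts_3d = (pts_4d[:3] / pts_4d[3]).T                           # Nx3

    # Keep only points with positive depth, treated here as successful triangulation.
    return pts_3d[pts_3d[:, 2] > 0]
```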
In the foregoing embodiment, when the number of binocular matching 3D points on a certain frame image is smaller than or equal to the preset threshold, it is indicated that the frame image does not meet the requirement, and at this time, the frame image needs to be cleared, and the next frame image is continuously acquired for processing.
In this process, as shown in fig. 10, the controller of the headset 500 may also be configured to perform the following steps:
in step S301, when the number of binocular matching 3D points is less than or equal to the preset threshold, the current frame image and all 2D points and 3D points obtained according to the current frame image are cleared.
Wherein, all 2D points and 3D points obtained according to the current frame image comprise front and back tracking 2D points obtained in the current frame image, front and back tracking 3D points, binocular stereo matching 2D points obtained in the current frame image, binocular stereo matching 3D points and the like.
Step S302, changing the preset extraction requirement, and extracting more 2D feature points in the next frame image than in the current frame image.
In the embodiment of the present application, changing the preset extraction requirement may include reducing the feature extraction threshold, increasing the number of extracted features, and so on.
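One way to picture "changing the preset extraction requirement" is sketched below: for the next frame, the FAST threshold of the detector is lowered and the feature budget is raised so that more 2D feature points are extracted. The concrete numbers are illustrative assumptions.

```python
import cv2

# Default extraction requirement for ordinary frames (illustrative values).
default_detector = cv2.ORB_create(nfeatures=500, fastThreshold=20)

# Relaxed extraction requirement after a frame is rejected:
# a lower FAST threshold and a larger feature budget yield more 2D feature points.
relaxed_detector = cv2.ORB_create(nfeatures=1000, fastThreshold=7)
```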
After changing the preset extraction requirement, the controller of the headset 500 may continue to acquire the next frame image. In step S103, the 2D feature points in the next frame image are extracted according to the changed preset extraction requirement, and binocular stereo matching and triangulation are performed on these 2D feature points to obtain the binocular matching 3D points of the next frame image. If the number of binocular matching 3D points is still less than or equal to the preset threshold, the next frame image continues to be acquired and the processes of steps S101-S104 are repeated. If the number of binocular matching 3D points is greater than the preset threshold, the next frame image is added to the count of currently processed frame images, and it is then judged whether the number of all frame images is greater than or equal to the first preset number.
If the number of all the frame images is greater than or equal to the first preset number, it is determined that the number of frame images meets the map creation initialization requirement. When the number of frame images satisfies the map creation initialization requirement and the initial map is to be constructed with all the frame images, as shown in fig. 11, the controller of the head-mounted device 500 may be further configured to perform the following steps:
in step S401, when the number of all the frame images is greater than or equal to the first preset number, inertial bias optimization is performed on each frame image by using the inertial optimization pose of each frame image, and the inertial state variable of each frame image is updated.
Inertial bias optimization in the embodiments of the present application refers to IMU bias optimization. The IMU can also acquire the inertial state variable of each frame image, X = [P, R, V, Ba, Bg], where P and R are the translational and rotational parts of the pose, V is the velocity, and Ba and Bg are the biases of the acceleration and the angular velocity, respectively. When IMU bias optimization is performed, the influence of Ba is considered to be small, so the optimization flow can be simplified by updating only Bg: a constraint on Bg can be constructed using the IMU pre-integration, and the update can then be solved. After this update, the updated inertial state variable of each frame image is obtained.
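Under the simplification above (Ba ignored, only Bg updated), the gyroscope bias can, for example, be estimated by aligning the pre-integrated rotations with the relative rotations of the optimized frame poses. The sketch below is one linearized least-squares formulation under that assumption; the pre-integrated rotations and their bias Jacobians are assumed to come from an IMU pre-integration module that is not shown.

```python
import cv2
import numpy as np

def estimate_gyro_bias_update(R_frames, delta_R_preint, J_bg_list):
    """Solve for a gyroscope bias update dBg from IMU pre-integration constraints (sketch).

    R_frames:       world-frame rotation matrices of consecutive frame images.
    delta_R_preint: pre-integrated relative rotations between consecutive frames.
    J_bg_list:      3x3 Jacobians of the pre-integrated rotation w.r.t. the gyro bias.
    """
    H = np.zeros((3, 3))
    b = np.zeros(3)
    for i in range(len(R_frames) - 1):
        # Relative rotation implied by the optimized frame poses.
        R_visual = R_frames[i].T @ R_frames[i + 1]
        # Rotation residual between pre-integration and the optimized poses.
        R_err = delta_R_preint[i].T @ R_visual
        r, _ = cv2.Rodrigues(R_err)      # log map: rotation matrix -> axis-angle vector
        J = J_bg_list[i]
        H += J.T @ J
        b += J.T @ r.ravel()
    return np.linalg.solve(H, b)         # dBg, added to the current gyro bias
```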
Step S402, BA optimization is carried out by utilizing the updated inertial state variable of each frame of image, the binocular matching 3D point in each frame of image and the front and back tracking 3D point, and an initial map is obtained, so that the map building initialization process is completed.
BA optimization is Bundle Adjustment optimization, also known as beam adjustment or bundle set optimization. In the BA optimization, the updated state variable X of each frame image is used to construct an IMU pre-integration residual and a visual re-projection residual, and a more accurate initial map is obtained through joint optimization and updating.
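As one illustration of the residuals mentioned above, a visual re-projection residual could be written as below; the IMU pre-integration residual is omitted, and the parameterization (axis-angle pose, pinhole intrinsics K, undistorted observations) is an assumption made for the sketch.

```python
import cv2
import numpy as np

def reprojection_residual(rvec, tvec, points_3d, observations_2d, K):
    """Visual re-projection residual of one frame image for BA (sketch).

    Projects the frame's 3D points with its pose and returns the pixel
    differences to the observed 2D feature points, flattened to a vector."""
    projected, _ = cv2.projectPoints(points_3d, rvec, tvec, K, None)
    return (projected.reshape(-1, 2) - observations_2d).ravel()
```

In a full BA, these visual residuals and the IMU pre-integration residuals of all frame images would be stacked and minimized jointly by a nonlinear least-squares solver.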
In addition, in the above embodiment, BA optimization of all frame images needs to be performed on the basis of the point clouds of all frame images. A point cloud is a set of points having three-dimensional space coordinates, that is, the set of binocular matching 3D points and front-back tracking 3D points in each frame image.
If the number of all the frame images is smaller than the first preset number, it is determined that the number of frame images does not meet the map creation initialization requirement. At this time, the processes of steps S101-S105 are repeated until the number of frame images is greater than or equal to the first preset number. In this process, as shown in fig. 12, the controller of the headset 500 may be configured to perform the following steps:
in step S501, when the number of all the frame images is smaller than the first preset number, the currently processed frame image and all the 2D points and 3D points obtained from the frame image are retained.
Wherein all 2D points and 3D points obtained from the frame image include front-back tracking 2D points, front-back tracking 3D points obtained from the frame image, and binocular stereo matching 2D points, binocular stereo matching 3D points, and the like obtained from the frame image.
Step S502, the preset extraction requirements are restored.
In the above step S302, the preset extraction requirement may be changed so that more 2D feature points can be extracted in the next frame image. It will be appreciated that this change of the preset extraction requirement may only be applicable to that next frame image; for other frame images, the original preset extraction requirement applies. Therefore, after steps S103-S105 have been performed with the changed preset extraction requirement, if the total number of frame images does not yet meet the map creation initialization requirement, the changed preset extraction requirement needs to be restored to the previous preset extraction requirement before the next frame image is processed.
From the above, it can be seen that the headset 500 in the embodiment of the present application can complete the pose estimation of the current frame image through tracking matching between the current frame image and the previous frame image, and generate the point cloud of the current frame image based on binocular stereo matching and front-back frame tracking matching. When the number of all frame images meets the initialization number requirement, an initial map is built through joint optimization using the poses and point clouds of all frame images, thereby completing the map creation initialization process. The method balances the speed and precision of map creation initialization and ensures the stability of subsequent mapping and the experience of the user using the head-mounted device.
In addition, in order to solve the problem that map creation initialization speed and precision cannot both be achieved at the same time, the embodiment of the present application further provides a map creation initialization method in a headset. The method is applied to the headset 500 in the foregoing embodiments and is implemented by the controller of the headset 500. Specifically, the method may include the following steps (an illustrative sketch of the overall flow is given after the steps):
step S101, tracking 2D characteristic points in a previous frame image in a current frame image, and obtaining front and back tracking 2D points.
And step S102, PNP calculation is carried out by utilizing the 2D points which are successfully triangulated in the previous frame image in the front and back tracking 2D points, and the inertial initial pose of the current frame image is optimized, so that the inertial optimized pose of the current frame image is obtained.
And step S103, extracting 2D characteristic points from the current frame image according to a preset extraction requirement, and performing binocular stereo matching and triangularization processing to obtain binocular matching 3D points after triangularization is successful.
And step S104, performing triangulation processing on the front and back tracking 2D points to obtain front and back tracking 3D points after the triangulation is successful.
Step S105, when the number of the binocular matching 3D points is greater than a preset threshold, determining whether the number of all frame images is greater than or equal to a first preset number.
And S106, when the number of all the frame images is greater than or equal to the first preset number, performing BA optimization by combining the inertial optimization pose, the binocular matching 3D points and the front and back tracking 3D points of all the frame images to obtain an initial map, thereby completing the map building initialization process.
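The sketch referenced above strings steps S101-S106 together. Every helper it calls (track_features, solve_pnp, stereo_match, triangulate, make_frame, ba_optimize) is a hypothetical placeholder for the processing already described, and the constants are illustrative only.

```python
def map_initialization(frame_stream, first_preset_number=10, preset_threshold=5):
    """Illustrative outer loop of the map creation initialization (sketch).

    All helper functions and the frame object's attributes are hypothetical
    placeholders for the steps described in this embodiment."""
    frames = []                      # accepted frame images with pose and point cloud
    extraction_relaxed = False
    for frame in frame_stream:
        tracked_2d = track_features(frames[-1], frame) if frames else []          # S101
        pose = solve_pnp(tracked_2d, frame) if frames else frame.inertial_pose    # S102
        stereo_3d = triangulate(stereo_match(frame, relaxed=extraction_relaxed))  # S103
        tracked_3d = triangulate(tracked_2d)                                      # S104

        if len(stereo_3d) <= preset_threshold:        # S301/S302: reject the frame, relax extraction
            extraction_relaxed = True
            continue

        extraction_relaxed = False                    # S502: restore the extraction requirement
        frames.append(make_frame(frame, pose, stereo_3d, tracked_3d))

        if len(frames) >= first_preset_number:        # S105
            return ba_optimize(frames)                # S401/S402: bias update + BA -> initial map
    return None
```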
Since the map creation initialization method in the headset according to the embodiment of the present application may be applied to the headset 500, other contents of the method may refer to the foregoing description of the headset 500 and will not be repeated here. In addition, the map creation initialization method in the headset according to the embodiment of the present application can likewise balance the speed and precision of map creation initialization and ensure the stability of subsequent mapping and the experience of the user using the headset 500.
The foregoing detailed description of the embodiments is merely illustrative of the general principles of the present application and should not be taken in any way as limiting the scope of the invention. Any other embodiments derived by those skilled in the art from the present application without inventive effort fall within the protection scope of the present application.

Claims (10)

1. A headset, comprising:
a camera configured to capture an image of a real world;
an inertial measurement unit configured to acquire an inertial initial pose of each frame of image of the real world;
a controller configured to:
tracking 2D characteristic points in a previous frame image in a current frame image to obtain front and back tracking 2D points;
PNP calculation is carried out by utilizing the 2D points which are successfully triangulated in the previous frame image in the front and back tracking 2D points, and the inertial initial pose of the current frame image is optimized to obtain the inertial optimized pose of the current frame image;
extracting 2D characteristic points from the current frame image according to a preset extraction requirement, and performing binocular stereo matching and triangularization treatment to obtain binocular matching 3D points after triangularization is successful;
performing triangulation processing on the front and back tracking 2D points to obtain front and back tracking 3D points after the triangulation is successful;
when the number of the binocular matching 3D points is larger than a preset threshold value, determining whether the number of all frame images is larger than or equal to a first preset number;
and when the number of all the frame images is greater than or equal to the first preset number, performing BA optimization by combining the inertial optimization pose, the binocular matching 3D points and the front and back tracking 3D points of all the frame images to obtain an initial map, thereby completing the map building initialization process.
2. The head-mounted device according to claim 1, wherein the current frame image includes a current left image and a current right image of a current time taken by the left and right eye cameras, respectively; the controller is further configured to:
respectively extracting 2D characteristic points of a preset number from the current left image and the current right image of the current frame image according to preset extraction requirements;
binocular stereo matching is carried out on the 2D characteristic points in the current left image and the current right image, and binocular matching 2D points are obtained;
and carrying out triangulation processing on the binocular matching 2D points to obtain binocular matching 3D points after successful triangulation.
3. The head-mounted device according to claim 1, wherein the current frame image includes a current left image and a current right image of a current time taken by the left and right eye cameras, respectively, and the previous frame image includes a history left image and a history right image of a previous time taken by the left and right eye cameras, respectively; the controller is further configured to:
and performing front-back frame tracking matching by using the 2D characteristic points in the current left image and the 2D characteristic points in the historical left image, or performing front-back frame tracking matching by using the 2D characteristic points in the current right image and the 2D characteristic points in the historical right image, so as to obtain front-back tracking 2D points.
4. The headset of claim 1, wherein the controller is further configured to:
when the number of the binocular matching 3D points is smaller than or equal to a preset threshold value, clearing the current frame image and all 2D points and 3D points obtained according to the current frame image;
changing the preset extraction requirement, and extracting more 2D feature points in the next frame image than the current frame image.
5. The headset of claim 4, wherein the controller is further configured to:
when the number of all the frame images is smaller than a first preset number, reserving the currently processed frame images and all the 2D points and the 3D points obtained according to the frame images;
and recovering the preset extraction requirements.
6. The headset of any one of claims 1-5, wherein the controller is further configured to:
when the number of all the frame images is greater than or equal to the first preset number, performing inertial bias optimization on each frame image by using the inertial optimization pose of each frame image, and updating the inertial state variable of each frame image;
and (3) performing BA optimization by using the updated inertial state variable of each frame of image, the binocular matching 3D point in each frame of image and the front and back tracking 3D point to obtain an initial map, thereby completing the map building initialization process.
7. A method for initializing a mapping in a headset, the method comprising:
tracking 2D characteristic points in a previous frame image in a current frame image to obtain front and back tracking 2D points;
PNP calculation is carried out by utilizing the 2D points which are successfully triangulated in the previous frame image in the front and back tracking 2D points, and the inertial initial pose of the current frame image is optimized to obtain the inertial optimized pose of the current frame image;
extracting 2D characteristic points from the current frame image according to a preset extraction requirement, and performing binocular stereo matching and triangularization treatment to obtain binocular matching 3D points after triangularization is successful;
performing triangulation processing on the front and back tracking 2D points to obtain front and back tracking 3D points after the triangulation is successful;
when the number of the binocular matching 3D points is larger than a preset threshold value, determining whether the number of all frame images is larger than or equal to a first preset number;
and when the number of all the frame images is greater than or equal to the first preset number, performing BA optimization by combining the inertial optimization pose, the binocular matching 3D points and the front and back tracking 3D points of all the frame images to obtain an initial map, thereby completing the map building initialization process.
8. The method of claim 7, wherein the current frame image includes a current left image and a current right image of a current time taken by the left and right eye cameras, respectively; the method further comprises the steps of:
Respectively extracting 2D characteristic points of a preset number from the current left image and the current right image of the current frame image according to preset extraction requirements;
binocular stereo matching is carried out on the 2D characteristic points in the current left image and the current right image, and binocular matching 2D points are obtained;
and carrying out triangulation processing on the binocular matching 2D points to obtain binocular matching 3D points after successful triangulation.
9. The method of claim 7, wherein the current frame image includes a current left image and a current right image of a current time taken by the left and right eye cameras, respectively, and the previous frame image includes a historical left image and a historical right image of a previous time taken by the left and right eye cameras, respectively; the method further comprises the steps of:
and performing front-back frame tracking matching by using the 2D characteristic points in the current left image and the 2D characteristic points in the historical left image, or performing front-back frame tracking matching by using the 2D characteristic points in the current right image and the 2D characteristic points in the historical right image, so as to obtain front-back tracking 2D points.
10. The method of claim 7, wherein the method further comprises:
When the number of the binocular matching 3D points is smaller than or equal to a preset threshold value, clearing the current frame image and all 2D points and 3D points obtained according to the current frame image;
changing the preset extraction requirement, and extracting more 2D feature points in the next frame image than the current frame image.
CN202111591765.3A 2021-12-23 2021-12-23 Headset and map creation initialization method in headset Pending CN116342838A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111591765.3A CN116342838A (en) 2021-12-23 2021-12-23 Headset and map creation initialization method in headset

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111591765.3A CN116342838A (en) 2021-12-23 2021-12-23 Headset and map creation initialization method in headset

Publications (1)

Publication Number Publication Date
CN116342838A true CN116342838A (en) 2023-06-27

Family

ID=86890084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111591765.3A Pending CN116342838A (en) 2021-12-23 2021-12-23 Headset and map creation initialization method in headset

Country Status (1)

Country Link
CN (1) CN116342838A (en)

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination