CN117671213A - ORB-SLAM 2-based large-space AR object identification method and system - Google Patents

Info

Publication number
CN117671213A
Authority
CN
China
Prior art keywords
point cloud
space
cloud map
orb
global
Prior art date
Legal status
Pending
Application number
CN202311721205.4A
Other languages
Chinese (zh)
Inventor
钱敏
蒋坚
于中阳
王亚菁
Current Assignee
Shanghai Shengyang Wuyue Digital Technology Co ltd
Shanghai Jimu Galaxy Digital Technology Co ltd
Original Assignee
Shanghai Shengyang Wuyue Digital Technology Co ltd
Shanghai Jimu Galaxy Digital Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Shengyang Wuyue Digital Technology Co ltd and Shanghai Jimu Galaxy Digital Technology Co ltd
Priority to CN202311721205.4A
Publication of CN117671213A

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The application provides a large-space AR object recognition method and system based on ORB-SLAM2. The large-space AR object recognition method comprises the following steps: extracting ORB features from a camera stream captured by an intelligent device, and constructing a global space point cloud map corresponding to the camera stream from the ORB features according to the ORB-SLAM2 technique; generating an AR object using a Unity editor; aligning the virtual space coordinate system of the AR object with the point cloud map coordinate system of the global space point cloud map according to a spatial coordinate system alignment technique; controlling, with a Web AR engine, the display position of the AR object in the real space corresponding to the camera stream according to the point cloud map coordinate system of the global space point cloud map; and displaying the AR object in the WeChat applet according to the display position of the AR object in the real space. The technical solution of the application addresses the problems in the prior art that the Android devices supported by xr-frame are limited and that developing an AR project directly on the WeChat applet consumes a great deal of manpower, material resources and time.

Description

ORB-SLAM 2-based large-space AR object identification method and system
Technical Field
The application relates to the technical field of augmented reality, in particular to a large-space AR object recognition method and system based on ORB-SLAM 2.
Background
AR (Augmented Reality) technology is a technology that ingeniously merges virtual information with the real world. Mature AR implementations are based on engines such as the foreign ARCore and ARKit, or on domestic platforms such as EasyAR, combined with the large-space recognition algorithms of major vendors, to develop APPs with augmented reality capability. Such APPs can deliver an AR experience; however, the threshold for that experience is high: various AR services must be installed along with the APP, and the range of device models the APP is adapted to is limited, creating a compatibility bottleneck.
To solve the above problems, the prior art proposes AR projects built on large platforms; for example, an AR project developed with the WeChat applet as the platform is a promising entry point.
At present, AR projects developed on the WeChat applet must rely on the xr-frame plugin; however, the Android models supported by xr-frame are extremely limited, fewer than one tenth of the Android devices on the market. Moreover, developing an AR project directly on the WeChat applet consumes a great deal of manpower, material resources and time. In addition, there are only a few non-open-source reference cases combining the WeChat applet xr-frame with a large-space recognition algorithm. Therefore, implementing large-space recognition on the WeChat applet with the WeChat developer tools requires considerable work, and it is difficult to achieve the desired effect.
In summary, the prior art needs a technology capable of integrating Unity, the WeChat applet and AR technology; such integration would be a major breakthrough in bringing AR technology into practical use.
Content of the application
The application provides a large-space AR object recognition solution based on ORB-SLAM2, which can solve the problems in the prior art that the Android devices supported by xr-frame are extremely limited and that developing an AR project on the WeChat applet consumes a great deal of manpower, material resources and time.
To solve the above problems, according to a first aspect of the present application, there is provided a large-space AR object recognition method based on ORB-SLAM2, including:
extracting ORB features from a camera stream shot by the intelligent equipment, and constructing a global space point cloud map corresponding to the camera stream according to an ORB-SLAM2 technology by using the ORB features;
generating an AR object using a Unity editor;
according to the space coordinate system alignment technology, aligning a virtual space coordinate system of the AR object and a point cloud map coordinate system of the global space point cloud map;
according to a point cloud map coordinate system of the global space point cloud map, a Web AR engine is used for controlling the display position of the AR object in the real space corresponding to the camera stream;
and displaying the AR object in the WeChat applet according to the display position of the AR object in the real space.
Preferably, in the above AR object recognition method, the step of extracting an ORB feature from a camera stream captured by an intelligent device, and constructing a global space point cloud map corresponding to the camera stream according to an ORB-SLAM2 technique using the ORB feature includes:
extracting a plurality of image frames from a camera stream;
respectively extracting ORB characteristic points from a plurality of image frames according to an ORB characteristic extraction algorithm;
determining a key frame from a plurality of image frames according to the ORB characteristic points;
splicing key frame point cloud data by using key frames and camera coordinates to obtain an initial spliced global point cloud map;
performing initial filtering on the initial spliced global point cloud map to obtain an initial filtering global point cloud map;
and updating the initial filtering global point cloud map by using a radius filtering algorithm to obtain a final global space point cloud map.
Preferably, in the above AR object recognition method, the step of determining a key frame from a plurality of image frames according to the ORB feature points includes:
judging whether the visual angle change amplitude between the current frame and the nearest key frame in the plurality of image frames is larger than or equal to a preset amplitude threshold value;
if the visual angle variation amplitude is greater than or equal to a preset amplitude threshold value, determining that the current frame is a new key frame;
Or,
judging whether the number of the matching points of the ORB characteristic points of the current frame and the ORB characteristic points of the nearest key frame in the plurality of image frames is smaller than or equal to a preset point threshold value;
if the number of the matching points of the ORB feature points is smaller than or equal to a preset point threshold value, determining that the current frame is a new key frame;
and performing loop detection on the global space point cloud map, and selecting an image frame in a preset distance range near the map position where the loop exists as a new key frame when the loop exists in the global space point cloud map.
Preferably, in the above AR object identification method, the step of updating the initial filtered global point cloud map by using a radius filtering algorithm to obtain a final global space point cloud map includes:
determining a neighbor point search range of each point in the initial filtering global point cloud map by using a fixed length radius;
searching neighbor points around each point in the neighbor point searching range;
and for any neighbor point, acquiring the attribute value of the neighbor point, and replacing the attribute value of the neighbor point by using the calculated statistical attribute value.
Preferably, in the above AR object recognition method, the step of generating the AR object using a Unity editor includes:
creating an instance of the AR object using the AR shape or AR model in the view of the Unity editor;
Adjusting the appearance and position of the AR object instance in the view;
processing the AR object instance by using a script of the Unity editor to generate a dynamic AR object;
adding a collision body component to the AR object, and adjusting physical properties of the AR object;
add Unity components to the AR object.
Preferably, in the above AR object identification method, the step of aligning the virtual space coordinate system of the AR object with the point cloud map coordinate system of the global space point cloud map according to the space coordinate system alignment technique includes:
calculating an Euclidean transformation matrix between a virtual space coordinate system of the AR object and a point cloud map coordinate system of the global space point cloud map;
using an Euclidean transformation matrix to perform space coordinate system transformation on the global space point cloud map and the AR object;
and identifying the AR object in the global space point cloud map according to the mapping information of the AR object in the global space point cloud map.
Preferably, in the above AR object recognition method, the step of controlling, using the Web AR engine, a display position of the AR object in the real space corresponding to the camera stream according to the point cloud map coordinate system of the global space point cloud map includes:
the Web AR engine acquires plane or position feature points of the real space by using the global space point cloud map;
The WebAR engine calculates and sets the display position and conversion information of the AR object by using a 3D transformation matrix according to the plane or position feature points of the real space;
when the shot picture of the intelligent equipment is detected to move, the Web AR engine continuously tracks the plane or position feature points of the real space by using the global space point cloud map;
the Web AR engine adjusts the position and conversion information of the AR object in the real space in real time according to the plane or the position feature points of the real space.
Preferably, in the above AR object recognition method, the step of displaying the AR object in the WeChat applet according to a display position of the AR object in real space includes:
using a Web AR engine to associate the display position of the AR object in the real space with its conversion information and send them to a cloud server for storage;
according to the user instruction and the page setting, the AR object is displayed from the cloud server to the real space displayed by the WeChat applet.
According to a second aspect of the present application, there is also provided a large-space AR object recognition system based on ORB-SLAM2, comprising:
the ORB feature extraction module is used for extracting ORB features from a camera stream shot by the intelligent equipment;
the point cloud map construction module is used for constructing a global space point cloud map corresponding to the camera stream according to the ORB-SLAM2 technology by using the ORB features;
The AR object generation module is used for generating an AR object by using the Unity editor;
the object map fusion module is used for aligning a virtual space coordinate system of the AR object and a point cloud map coordinate system of the global space point cloud map according to a space coordinate system alignment technology;
the display position setting module is used for controlling the display position of the AR object on the global space point cloud map by using the WebAR engine according to the point cloud map coordinate system of the global space point cloud map;
and the AR object display module is used for displaying the AR object in the WeChat applet according to the display position of the AR object in the real space.
Preferably, in the large-space AR object recognition system, the point cloud map construction module is specifically configured to extract a plurality of image frames from a camera stream;
respectively extracting ORB characteristic points from a plurality of image frames according to an ORB characteristic extraction algorithm;
determining a key frame from a plurality of image frames according to the ORB characteristic points;
splicing key frame point cloud data by using key frames and camera coordinates to obtain an initial spliced global point cloud map;
performing initial filtering on the initial spliced global point cloud map to obtain an initial filtering global point cloud map;
and updating the initial filtering global point cloud map by using a radius filtering algorithm to obtain a final global space point cloud map.
Preferably, in the large-space AR object recognition system, the point cloud map construction module is further specifically configured to determine whether a change amplitude of a viewing angle between a current frame and a nearest key frame in the plurality of image frames is greater than or equal to a predetermined amplitude threshold; if the visual angle variation amplitude is greater than or equal to a preset amplitude threshold value, determining that the current frame is a new key frame; or judging whether the number of the matching points of the ORB characteristic points of the current frame and the ORB characteristic points of the nearest key frame in the plurality of image frames is smaller than or equal to a preset point threshold value; if the number of the matching points of the ORB feature points is smaller than or equal to a preset point threshold value, determining that the current frame is a new key frame; and performing loop detection on the global space point cloud map, and selecting an image frame in a preset distance range near the map position where the loop exists as a new key frame when the loop exists in the global space point cloud map.
Preferably, in the large-space AR object recognition system, the AR object generating module is specifically configured to create, in a view of the Unity editor, an AR object instance using an AR shape or an AR model; adjusting the appearance and position of the AR object instance in the view; processing the AR object instance by using a script of the Unity editor to generate a dynamic AR object; adding a collision body component to the AR object, and adjusting physical properties of the AR object; add Unity components to the AR object.
Preferably, in the large-space AR object recognition system, the object map fusion module is specifically configured to calculate an euclidean transformation matrix between a virtual space coordinate system of the AR object and a point cloud map coordinate system of the global space point cloud map; using an Euclidean transformation matrix to perform space coordinate system transformation on the global space point cloud map and the AR object; and identifying the AR object in the global space point cloud map according to the mapping information of the AR object in the global space point cloud map.
Preferably, in the large-space AR object recognition system, the display position setting module is specifically configured to acquire a plane or a position feature point of the real space by using a Web AR engine and using a global space point cloud map; the WebAR engine calculates and sets the display position and conversion information of the AR object by using a 3D transformation matrix according to the plane or position feature points of the real space; when the shot picture of the intelligent equipment is detected to move, the Web AR engine continuously tracks the plane or position feature points of the real space by using the global space point cloud map; the Web AR engine adjusts the position and conversion information of the AR object in the real space in real time according to the plane or the position feature points of the real space.
According to a third aspect of the present application, there is also provided a large-space AR object recognition system based on ORB-SLAM2, comprising:
a memory, a processor, and an ORB-SLAM2-based large-space AR object recognition program that is stored in the memory and runs on the processor; when the ORB-SLAM2-based large-space AR object recognition program is executed by the processor, the ORB-SLAM2-based large-space AR object recognition method provided by any one of the above technical solutions is implemented.
In summary, in the large-space AR object recognition solution based on ORB-SLAM2 provided by the above technical solution, ORB features are extracted from the camera stream captured by the intelligent device, and a global point cloud map is constructed from these ORB features according to the ORB-SLAM2 technique, so that the coordinate position of the real space captured by the intelligent device can be recognized through the global point cloud map. An AR object is then generated using the Unity editor, and the virtual space coordinate system of the AR object is aligned with the point cloud map coordinate system of the global space point cloud map according to a spatial coordinate system alignment technique. Through the point cloud map coordinate system of the global space point cloud map, the AR object can be fused into the real space corresponding to the camera stream; the Web AR engine is then used to control the display position of the AR object in that real space, and the AR object can be displayed at the corresponding display position in the WeChat applet. In this technical solution, the high-quality AR objects generated by the Unity editor are connected in the cloud by the Web AR engine, realizing the function of displaying Unity-generated AR objects in the WeChat applet and improving the diversity of the generated AR objects. By developing with a Unity-based Web AR engine, adding the algorithm service, exporting the web project from Unity, deploying the project on a server, and generating a web page to be loaded by the applet, this strategy saves a great deal of labor cost and time cost and realizes the capability of a Unity application inside the WeChat applet; it solves the prior-art problem that developing an AR project on the WeChat applet consumes a great deal of manpower, material resources and time.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from the structures shown in these drawings without inventive effort to a person of ordinary skill in the art.
FIG. 1 is a block diagram of an ORB-SLAM 2-based large-space AR object recognition system provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a large-space AR object recognition method based on ORB-SLAM2 according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a method for constructing a global space point cloud map according to the embodiment shown in FIG. 2;
FIG. 4 is a flow chart of a method for determining keyframes based on ORB feature points provided by the embodiment of FIG. 3;
FIG. 5 is a flowchart illustrating an update method of an initial filtered global point cloud map according to the embodiment shown in FIG. 3;
FIG. 6 is a flow diagram of a method for generating an AR object using the Unity editor provided by the embodiment of FIG. 2;
FIG. 7 is a flowchart of a method for aligning an AR object with a coordinate system of a global space point cloud map according to the embodiment of FIG. 2;
FIG. 8 is a flow chart of a method for controlling the display position of an AR object according to the embodiment shown in FIG. 2;
FIG. 9 is a flow chart of a method for displaying an AR object in a WeChat applet provided in the embodiment of FIG. 2;
fig. 10 is a flow chart of a method for constructing a global space point cloud map according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a first ORB-SLAM 2-based large-space AR object recognition system according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a second ORB-SLAM 2-based large-space AR object recognition system according to an embodiment of the present application.
The realization, functional characteristics and advantages of the present application will be further described with reference to the embodiments, referring to the attached drawings.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The technical scheme in the prior art has the following technical problems:
Existing AR projects developed on the WeChat applet must rely on the xr-frame plugin; however, the Android devices supported by xr-frame are extremely limited, fewer than one tenth of the Android devices on the market, and developing an AR project directly on the WeChat applet consumes a great deal of manpower, material resources and time. In addition, there are only a few non-open-source reference cases combining the WeChat applet xr-frame with a large-space recognition algorithm. Therefore, implementing large-space recognition on the WeChat applet with the WeChat developer tools requires considerable work, and it is difficult to achieve the desired effect.
In order to solve the problems in the prior art, the following embodiments of the present application provide an ORB-SLAM2-based large-space AR object recognition solution, which develops with a Unity-based Web AR engine, adds an algorithm service, exports the web project from the Unity editor, deploys the web project on a server, and generates a web page to be loaded by the WeChat applet; this strategy saves a great deal of labor cost and time cost and realizes the capability of a Unity application inside the WeChat applet.
In order to achieve the above objective, referring to fig. 1, fig. 1 is a schematic diagram of an ORB-SLAM 2-based large-space AR object recognition system according to an embodiment of the present application. As shown in fig. 1, the large-space AR object recognition system includes:
a data collector 100 and a cloud server 200.
The data collector 100 includes a sensor 101 and a camera 102. The sensor 101 is used for acquiring data such as the pitch angle, slip angle and camera pose, and the camera 102 is used for acquiring the video camera stream and the camera intrinsic parameters and transmitting them to the cloud server 200.
The cloud server 200 comprises ORB-SLAM2 201, a Unity editor 202 and a Web AR engine 203. ORB-SLAM2 201 is used for generating the point cloud map, the Unity editor 202 is used for generating the AR object, and the Web AR engine 203 is used for controlling the display position of the AR object in the real space corresponding to the point cloud map; the WeChat applet can then display the AR object at that display position.
Specifically, the large-space AR object recognition system is used as follows: the user shoots the large space using the VR applet of the data collector 100, and the captured camera stream, defined as {P1, P2, …, Pn}, is transmitted to the cloud server 200. The cloud server 200 receives a frame of image P1, confirms in real time the real large-space position captured by the image P1, generates or updates the global point cloud map through ORB-SLAM2 201, then generates an AR object using the Unity editor 202, and finally displays the AR object in the camera 102 view using the Web AR engine 203, providing an AR display for the user.
Referring to fig. 2 in conjunction with the ORB-SLAM 2-based large-space AR object recognition system shown in fig. 1, fig. 2 is a flow chart of an ORB-SLAM 2-based large-space AR object recognition method according to an embodiment of the present application. As shown in fig. 2, the ORB-SLAM 2-based large-space AR object recognition method includes:
s110: and (3) extracting ORB features from the camera streams shot by the intelligent equipment, and constructing a global space point cloud map corresponding to the camera streams according to an ORB-SLAM2 technology by using the ORB features. The intelligent device (such as a mobile phone, a pad, a camera or intelligent wearable device) collects camera streams of surrounding real space, so that ORB features can be extracted from the camera streams, and a global space point cloud map is constructed according to an ORB-SLAM2 technology. The ORB-SLAM2 is a feature-based vision simultaneous localization and mapping (SLAM) system, and can be used for real-time localization and mapping in indoor or outdoor environments. The SLAM system aims to simultaneously estimate the motion of the camera and the three-dimensional structure of the scene through the sensor data acquired by the data acquisition unit without knowing map or positioning information of the environment in advance.
Specifically, as a preferred embodiment, as shown in fig. 3, this step S110: the method for constructing the global space point cloud map corresponding to the camera stream by using the ORB characteristics comprises the steps of:
s111: a plurality of image frames are extracted from the camera stream. Specifically, the intelligent equipment shoots a plurality of frames of images to obtain a plurality of image frames, and then ORB features are extracted from the image frames.
S112: according to the ORB feature extraction algorithm, ORB feature points are extracted from a plurality of image frames respectively.
Specifically, according to the ORB feature extraction algorithm, ORB feature points are extracted from the plurality of image frames as follows:
1. extract key points based on the FAST algorithm;
2. screen out key points with high matching scores based on the Harris algorithm (searching for points with relatively large changes in both the x and y directions; when the second derivative of a point is relatively large in both the x and y directions, the point can be regarded as a feature point);
3. perform pyramid transformation on the image;
4. calculate the centroid and the angular orientation of each key point;
5. calculate the binary descriptor of each key point based on the BRIEF algorithm;
6. filter out low-correlation pixel blocks based on a greedy algorithm.
Through the above flow, the ORB feature extraction algorithm not only extracts the ORB feature points but also provides a feature descriptor for each ORB feature point.
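As a minimal illustration of this step, the sketch below uses OpenCV's ORB implementation, which internally runs the FAST/Harris/pyramid/orientation/BRIEF pipeline described above. The use of OpenCV and the helper name are assumptions for illustration; the patent does not name a specific library.

```python
# Minimal ORB feature extraction sketch. OpenCV (cv2) is an assumed dependency;
# the patent describes the FAST/Harris/BRIEF pipeline but not a specific API.
import cv2

def extract_orb_features(image_bgr, n_features=1000):
    """Return ORB keypoints and binary descriptors for one image frame."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # cv2.ORB internally performs FAST detection, Harris-based ranking,
    # pyramid construction, intensity-centroid orientation and rotated BRIEF,
    # matching steps 1-6 above.
    orb = cv2.ORB_create(nfeatures=n_features, scoreType=cv2.ORB_HARRIS_SCORE)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    return keypoints, descriptors

# Example usage on one frame of the camera stream:
# frame = cv2.imread("frame_0001.png")
# kps, desc = extract_orb_features(frame)
```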
S113: from the ORB feature points, a key frame is determined from the plurality of image frames. Here pose detection needs to be performed on each frame of image before the key frame can be determined.
Specifically, as a preferred embodiment, as shown in fig. 4, this step S113: the step of determining a key frame from the plurality of image frames based on the ORB feature points comprises:
s1131: and judging whether the visual angle change amplitude between the current frame and the nearest key frame in the plurality of image frames is larger than or equal to a preset amplitude threshold value.
S1132: and if the visual angle change amplitude is greater than or equal to the preset amplitude threshold value, determining the current frame as a new key frame. The key frames can be determined here by means of view change detection. Specifically, it is determined that the viewing angle change algorithm between the current frame and the nearest key frame (the last determined key frame) exceeds a predetermined amplitude threshold M, and if so, the frame is determined to be a new key frame.
Or,
s1133: and judging whether the number of the matching points of the ORB characteristic points of the current frame and the ORB characteristic points of the nearest key frame in the plurality of image frames is smaller than or equal to a preset point threshold value.
S1134: and if the number of the matching points of the ORB feature points is smaller than or equal to a preset point threshold value, determining the current frame as a new key frame.
Here, local map sparsity can be used to detect key frames. Specifically, when there are not enough feature points around the current frame, or the number of ORB feature points matched between the current frame and the nearest key frame is less than or equal to the predetermined point threshold N, the system selects the current frame as a new key frame, so as to ensure that the map contains enough feature information to support localization and mapping.
It should also be noted that, in order to avoid selecting too many key frames within a short time, the ORB-SLAM2 system typically considers the time interval between frames. If the time interval between the current frame and the nearest key frame is less than a certain threshold P, the system suppresses the current frame from becoming a new key frame, so as to avoid excessive redundant key frames.
In addition, loop detection needs to be performed on the global space point cloud map corresponding to the camera stream, so as to optimize the map and achieve accurate localization.
S1135: and performing loop detection on the global space point cloud map, and selecting an image frame in a preset distance range near the map position where the loop exists as a new key frame when the loop exists in the global space point cloud map.
According to the technical solution provided by the embodiment of the present application, when a loop closure (Loop Closure) is detected, the ORB-SLAM2 system re-selects key frames. Loop detection can find a location that has previously appeared in the map; at that point, the system may select frames near the location detected by the loop as new key frames, so as to optimize the accuracy of the map and of localization.
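To make the combination of these criteria concrete, the sketch below gathers the view-angle threshold M, the matched-point threshold N, the minimum time interval P and loop-closure re-selection into a single decision function. The function and variable names, and the default threshold values, are hypothetical illustrations, not taken from the ORB-SLAM2 source code.

```python
# Hypothetical keyframe decision sketch combining the criteria described above.
def is_new_keyframe(view_angle_change, num_matched_points, time_since_last_kf,
                    loop_detected, M=15.0, N=50, P=0.3):
    # Suppress redundant keyframes created within a short time interval.
    if time_since_last_kf < P and not loop_detected:
        return False
    # Criterion 1: the viewing angle changed enough relative to the nearest keyframe.
    if view_angle_change >= M:
        return True
    # Criterion 2: too few ORB matches against the nearest keyframe, i.e. the
    # local map is becoming sparse and needs new feature information.
    if num_matched_points <= N:
        return True
    # Criterion 3: a loop closure was detected; frames near the loop position
    # are promoted to keyframes to refine the map.
    return loop_detected
```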
The technical solution provided by the embodiment shown in fig. 3 further includes the following steps after determining a key frame from a plurality of image frames:
s114: and splicing the key frame point cloud data by using the key frame and the camera coordinates to obtain an initial spliced global point cloud map. After the key frame is determined, the pose of the camera and the internal parameters of the camera are required to be obtained, and the key frame point cloud data are preprocessed, wherein the preprocessing step comprises noise reduction, registration, coordinate conversion and the like, so that the point cloud map is spliced.
Specifically, converting the key frame point cloud data into a unified world coordinate system; extracting characteristic information in the point cloud data, such as edges, planes and the like, and using the characteristic information to perform point cloud map modeling; removing noise points or abnormal points in the point cloud through a filtering algorithm; registering the point cloud of the current frame with the existing map, and fusing new point cloud data into the map.
S115: and carrying out initial filtering on the initial spliced global point cloud map to obtain an initial filtering global point cloud map. After the initial splicing global point cloud map is completely filtered, the initial filtering global point cloud map can be obtained.
Specifically, after the stitching is completed, the initial spliced global point cloud map needs to be initially filtered. Voxel filtering is used to perform an initial noise reduction on the initial spliced global point cloud map: the value of the current pixel (or voxel) is replaced by the average of the surrounding pixels (or voxels), so as to smooth the image or data. The mean filtering formula can be expressed as:

$$I_{\mathrm{smooth}}(x,y,z) = \frac{1}{N}\sum_{i=-k}^{k}\sum_{j=-k}^{k}\sum_{l=-k}^{k} I(x+i,\, y+j,\, z+l)$$

where $I_{\mathrm{smooth}}(x,y,z)$ is the smoothed pixel (or voxel) value, $I(x,y,z)$ is the original pixel (or voxel) value, $N$ is the number of pixels (or voxels) being averaged, and $k$ is the radius of the filter.
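A minimal sketch of this mean (box) filtering step on a voxelized volume is given below; SciPy's uniform_filter is used as a stand-in implementation, which is an assumption, since the patent does not prescribe a library.

```python
# Mean (box) filtering of a voxelized point cloud volume, following the formula
# above. scipy.ndimage.uniform_filter is an assumed implementation choice.
import numpy as np
from scipy.ndimage import uniform_filter

def mean_filter_voxels(volume, k=1):
    """Replace each voxel with the mean of its (2k+1)^3 neighborhood."""
    return uniform_filter(volume.astype(np.float32), size=2 * k + 1)
```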
S116: and updating the initial filtering global point cloud map by using a radius filtering algorithm to obtain a final global space point cloud map.
Specifically, as a preferred embodiment, as shown in fig. 5, the step of updating the initial filtered global point cloud map by using a radius filtering algorithm to obtain a final global space point cloud map includes:
s1161: using the fixed length radius, a neighbor point search range for each point is determined in the initial filtered global point cloud map. The method adopts a radius filtering algorithm to remove noise in the point cloud or to smooth the point cloud data. And specifically selecting a fixed length radius, and determining the range of searching neighbor points around each point.
S1162: and searching neighbor points around each point in the neighbor point searching range. Specifically, for each point in the point cloud data, a neighbor point is searched within its surrounding radius range centering on the point.
S1163: and for any neighbor point, acquiring the attribute value of the neighbor point, and replacing the attribute value of the neighbor point by using the calculated statistical attribute value. The attribute values here include the coordinates or other characteristics of the point; the statistical attribute value includes an average value or a median value of all surrounding neighbor points, and the like.
In the technical scheme provided by the embodiment of the application, the specific process of updating the initial filtering global point cloud map by using the radius filtering algorithm to obtain the final global space point cloud map comprises the following steps: (1) The basic flow of filtering with a radius filtering algorithm is to first select a fixed radius size for determining the range of searching for neighbor points around each point. (2) For each point in the point cloud, searching for neighbor points within its surrounding radius around the point as a center. (3) The attribute values of the neighbor points (coordinates or other features of the points) are obtained. (4) The filtering or smoothing effect is achieved by calculating the average value, the median value and the like of the neighbor points to replace the attribute value of the original point.
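The sketch below illustrates steps (1)-(4) above: for every point, neighbors inside a fixed radius are found and the point's attribute is replaced by a statistic (here the mean) of those neighbors. The use of scipy.spatial.cKDTree for the neighbor search is an assumed implementation choice.

```python
# Minimal radius-filtering sketch matching steps (1)-(4) above.
import numpy as np
from scipy.spatial import cKDTree

def radius_filter(points, radius=0.1):
    """points: (N, 3) array of point coordinates; returns a smoothed copy."""
    tree = cKDTree(points)
    smoothed = points.copy()
    for i, p in enumerate(points):
        # Step (2): search for neighbors within the fixed radius around the point.
        neighbor_idx = tree.query_ball_point(p, r=radius)
        if neighbor_idx:
            # Steps (3)-(4): replace the point's attribute (its coordinates) with
            # the mean of the neighbors' attributes to smooth / filter the cloud.
            smoothed[i] = points[neighbor_idx].mean(axis=0)
    return smoothed
```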
The method for identifying a large-space AR object according to the embodiment shown in fig. 2 further includes, after constructing a global space point cloud map corresponding to a camera stream:
S120: AR objects are generated using a Unity editor. The Unity editor is a powerful piece of software for editing AR objects; by generating an AR object and then fusing it with the real space, the AR object can be presented in the WeChat applet.
Specifically, as a preferred embodiment, as shown in fig. 6, in the above-mentioned AR object recognition method, step S120: a step of generating an AR object using a Unity editor, comprising:
S121: in the view of the Unity editor, an instance of an AR object is created using AR shapes or AR models. And selecting a corresponding AR shape or a custom AR model from an editing view of the Unity editor, importing the AR model into the editing view, and dragging and dropping the AR model into a scene after importing the AR model to create an AR object instance.
S122: the appearance and position of the AR object instance in the view is adjusted. Specifically, an instance of an AR object created or imported is selected, and then the appearance and position of the AR object in the view is adjusted by modifying its position, rotation, and scaling.
S123: and processing the AR object instance by using the script of the Unity editor to generate a dynamic AR object. Specifically, the AR object can be dynamically generated using a script, and the 3D object can be created, destroyed, and manipulated by running the API function provided by the Unity editor. Similarly, changes in the appearance of an AR object can be changed by changing its application materials and textures, where the materials can be used to adjust the color, surface properties, etc. of the AR object. The AR object is selected in the view, then the required material is dragged and dropped into the corresponding slot, and the texture is also selected for adjustment.
S124: adding a collision body component to the AR object, and adjusting the physical properties of the AR object. When the AR object is required to participate in physical simulation and collision, a collision body component (Collider) can be added to the object, and physical properties of the object, such as mass, gravity and the like, can be adjusted according to the requirement.
S125: add Unity components to the AR object. To achieve more functions and effects, various Unity components, such as animation components, script components, illumination components, etc., may be added to the object, resulting in a finer AR object.
The large-space AR object recognition method provided in the embodiment shown in fig. 2 further includes, after the step of generating the AR object using the Unity editor:
s130: and aligning a virtual space coordinate system of the AR object and a point cloud map coordinate system of the global space point cloud map according to a space coordinate system alignment technology. The spatial coordinate system alignment technique means that in AR, a virtual object and a real-world object need to interact in the same coordinate system. This requires alignment of the coordinate system of the virtual space with the coordinate system of the real space. This is typically accomplished by a process called Registration (Registration) that uses various sensor data, such as camera images and IMU data, to estimate the relative position and orientation between the virtual camera and the real camera. This renders the virtual object in the correct position and orientation for interaction with the real world object.
Specifically, as a preferred embodiment, as shown in fig. 7, in the above-mentioned AR object recognition method, step S130: according to the space coordinate system alignment technique, the steps in aligning the virtual space coordinate system of the AR object and the point cloud map coordinate system of the global space point cloud map include:
S131: calculating an Euclidean transformation matrix between a virtual space coordinate system of the AR object and a point cloud map coordinate system of the global space point cloud map;
specifically, in three dimensions, the Euclidean transform is typically represented as a 4x4 matrix, called homogeneous transform matrix (Homogeneous Transformation Matrix), in the general form of
Where R represents the rotation matrix, t represents the translation vector, 0 is a zero vector of 1x3, and 1 is a scalar value of 1.
S132: and using the Euclidean transformation matrix to perform space coordinate system transformation on the global space point cloud map and the AR object.
Suppose a point $P$ on the AR object has coordinates $P_A = (x, y, z)^T$ in the virtual space coordinate system A. After the Euclidean transformation matrix is applied, its coordinates in the point cloud map coordinate system B of the global space point cloud map can be expressed by the following formula:

$$P_B = T \times P_A$$

where $P_B$ is the coordinate of point $P$ in the point cloud map coordinate system B, $P_A$ is the coordinate of point $P$ in the virtual space coordinate system A, and $T$ is the transformation matrix describing the change of coordinates from A to B.
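A minimal numerical sketch of this step is shown below: building the homogeneous matrix T from a rotation R and a translation t, and applying it to a point expressed in homogeneous coordinates. The function names are illustrative.

```python
# Sketch of applying the Euclidean (homogeneous) transform T = [[R, t], [0, 1]]
# that maps the AR object's virtual coordinate system A into the point cloud
# map coordinate system B. Function names are illustrative.
import numpy as np

def make_transform(R, t):
    """Build the 4x4 homogeneous transformation matrix from R (3x3) and t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def transform_point(T, p_a):
    """Map a 3D point from coordinate system A into B: P_B = T x P_A."""
    p_h = np.append(np.asarray(p_a, dtype=float), 1.0)  # homogeneous coordinates
    return (T @ p_h)[:3]
```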
S133: and identifying the AR object in the global space point cloud map according to the mapping information of the AR object in the global space point cloud map.
The embodiments of the present invention specifically identify and understand objects in a real environment based on mapping information (e.g., feature point information, including coordinates and movements of feature points, etc.) of AR objects in a global spatial point cloud map.
The method for identifying a large-space AR object according to the embodiment shown in fig. 2 further includes, after the step of aligning the virtual space coordinate system of the AR object with the point cloud map coordinate system of the global space point cloud map:
s140: and controlling the display position of the AR object in the real space corresponding to the camera stream by using a Web AR engine according to a point cloud map coordinate system of the global space point cloud map. In the embodiment of the present application, since the virtual space coordinate system of the AR object and the point cloud map coordinate system of the global space point cloud map are aligned, the AR object can be mapped into the real space captured by the camera stream corresponding to the global point cloud map, and the Web AR engine is specifically used to display the AR object in the display position of the real space corresponding to the camera stream.
Specifically, as a preferred embodiment, as shown in fig. 8, in the above-mentioned AR object recognition method, step S140: according to a point cloud map coordinate system of a global space point cloud map, a step of controlling a display position of an AR object in a camera stream corresponding to a real space by using a Web AR engine comprises the following steps:
s141: the Web AR engine acquires a plane or a position feature point of the acquired real space by using the global space point cloud map. The Web ARE engine can acquire a plane or a position feature point in real space by using the global space point cloud map, because the global space point cloud map has a point cloud map coordinate system corresponding to real space, the plane or the position feature point in real space can be ascertained through the point cloud map coordinate system, and then the plane or the position feature point is used as a reference point for AR object placement.
S142: the Web AR engine calculates and sets the display position and conversion information of the AR object by using the 3D transformation matrix according to the plane or the position feature point of the real space. After recognizing the plane or position feature point of the real space, the Web AR engine calculates the position where the AR object should be placed and conversion information including the position, rotation, and scaling of the AR object using the 3D transformation matrix.
S143: when the shot picture movement of the intelligent device is detected, the Web AR engine continuously tracks the plane or the position feature point of the real space by using the global space point cloud map. When the intelligent device moves, the shooting picture is converted, and the position and the coordinate corresponding to the global space point cloud map are converted, so that the Web AR engine can continuously track the plane or the position characteristic point in the real space by utilizing the global space point cloud map.
S144: the Web AR engine adjusts the position and conversion information of the AR object in the real space in real time according to the plane or the position feature points of the real space. The WebAR engine cannot directly identify the real space, so that the Web AR engine can continuously track the plane or position characteristic points of the real space through the global space point cloud map, and then adjust the position and conversion information of the AR object in real time according to the tracking result, so that the stability of the object and the consistency of the environment are maintained.
The following should be noted: the Web AR engine is a framework for integrating AR functions on the Web side. Web AR provides access to and implementation of Web-based augmented reality through a Web browser, using a combination of WebRTC, WebGL and modern sensor APIs. The main function of the Web AR engine is to fuse virtual digital content with real-world scenes to create an enhanced visual experience. It perceives and understands the real-world environment by using cameras, sensors and computer vision techniques, and superimposes virtual content in the user's field of view in an appropriate manner.
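The per-frame update described in steps S141-S144 can be summarized as keeping the AR object's pose fixed in the map coordinate system and recomputing its display transform from the tracked camera pose. The sketch below expresses this in Python for consistency with the other examples (a real Web AR engine would run in the browser); all names are illustrative.

```python
# Conceptual sketch of steps S141-S144: the object's pose in the point cloud
# map frame stays fixed, and its on-screen (view-space) transform is recomputed
# from the tracked camera pose every frame. All names are illustrative.
import numpy as np

def model_view(T_map_object, T_map_camera):
    """Return the AR object's transform in the camera (view) frame.

    T_map_object : 4x4 pose of the AR object in the point cloud map frame.
    T_map_camera : 4x4 pose of the camera in the same map frame, from tracking.
    """
    return np.linalg.inv(T_map_camera) @ T_map_object

# Per-frame loop: as tracking updates the camera pose, the object's display
# transform follows, keeping it anchored to the detected plane / feature points.
# mv = model_view(T_map_object, T_map_camera_current)
```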
The method for identifying the large-space AR object provided in the embodiment shown in fig. 2 further includes, after the step of controlling the display position of the AR object in the real space corresponding to the camera stream by using the Web AR engine:
s150: and displaying the AR object in the WeChat applet according to the display position of the AR object in the real space.
Specifically, as a preferred embodiment, as shown in fig. 9, the step of displaying the AR object in the WeChat applet according to the display position of the AR object in real space includes:
s151: using a Web AR engine to connect and send the display position and conversion information of the AR object in the real space to a cloud server;
S152: according to the user instruction and the page setting, the AR object is displayed from the cloud server to the real space displayed by the WeChat applet.
The Web AR engine can control the position of the AR object, so that the AR object is displayed on the picture captured and shown by the WeChat applet.
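As a fully hypothetical illustration of step S151, the sketch below packages the display position and conversion information of an AR object and uploads it to the cloud server. The endpoint URL, payload fields and the use of the requests library are all assumptions; the patent does not specify a storage protocol.

```python
# Fully hypothetical sketch of step S151: uploading the AR object's display
# position and conversion (transform) information to the cloud server.
# The URL, payload fields and use of `requests` are illustrative assumptions.
import requests

def upload_ar_object_pose(object_id, position_xyz, rotation_quat, scale_xyz,
                          server_url="https://example-cloud-server/ar/objects"):
    payload = {
        "objectId": object_id,
        "position": list(position_xyz),   # display position in the real-space frame
        "rotation": list(rotation_quat),  # conversion information: orientation
        "scale": list(scale_xyz),         # conversion information: scaling
    }
    resp = requests.post(server_url, json=payload)
    resp.raise_for_status()
    return resp.json()
```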
In summary, in the large-space AR object recognition method based on ORB-SLAM2 provided by the above technical solution, ORB features are extracted from the camera stream captured by the intelligent device, and a global point cloud map is constructed from these ORB features according to the ORB-SLAM2 technique, so that the coordinate position of the real space captured by the intelligent device can be recognized through the global point cloud map. An AR object is then generated using the Unity editor, and the virtual space coordinate system of the AR object is aligned with the point cloud map coordinate system of the global space point cloud map according to a spatial coordinate system alignment technique. Through the point cloud map coordinate system of the global space point cloud map, the AR object can be fused into the real space corresponding to the camera stream; the Web AR engine is then used to control the display position of the AR object in that real space, and the AR object can be displayed at the corresponding display position in the WeChat applet. In this technical solution, the high-quality AR objects generated by the Unity editor are connected in the cloud by the Web AR engine, realizing the function of displaying Unity-generated AR objects in the WeChat applet and improving the diversity of the generated AR objects. By developing with a Unity-based Web AR engine, adding the algorithm service, exporting the web project from Unity, deploying the project on a server, and generating a web page to be loaded by the applet, this strategy saves a great deal of labor cost and time cost and realizes the capability of a Unity application inside the WeChat applet; it solves the prior-art problem that developing an AR project on the WeChat applet consumes a great deal of manpower, material resources and time.
In addition, referring to fig. 10, fig. 10 is a flow chart of a method for constructing a global space point cloud map according to an embodiment of the present application, where, as shown in fig. 10, the method for constructing a global space point cloud map includes:
s201: and the RGB camera at the intelligent equipment end acquires a camera flow image and sends the camera flow image to the cloud.
S202: sensor data and camera parameters are acquired.
S203: image frames are acquired and ORB feature extraction is performed.
S204: and carrying out pose prediction according to the single-frame image.
S205: determining whether the frame is a key frame; if yes, go to step S207; if not, step S206 is performed.
S206: and re-selecting a new key frame.
S207: and acquiring camera pose, camera internal parameters and current key frame point cloud data.
S208: the point cloud images are spliced and the point cloud map is updated.
S209: voxel filtering.
S210: and constructing a global point cloud map.
S211: radius filtering.
S212: and updating the map.
S213: and storing the map.
In addition, based on the same concept of the above method embodiment, the present application embodiment further provides an ORB-SLAM 2-based large-space AR object recognition system for implementing the above method of the present application, and since the principle of solving the problem of the system embodiment is similar to that of the method, the system embodiment has at least all the beneficial effects brought by the technical solution of the above embodiment, which is not described herein in detail.
In addition, as a preferred embodiment, as shown in fig. 11, the embodiment of the present application provides a large-space AR object recognition system based on ORB-SLAM2, including:
an ORB feature extraction module 110, configured to extract ORB features from a camera stream captured by the smart device;
the point cloud map construction module 120 is configured to construct a global spatial point cloud map corresponding to the camera stream according to an ORB-SLAM2 technique using the ORB feature;
an AR object generation module 130 for generating an AR object using a Unity editor;
the object map fusion module 140 is configured to align a virtual space coordinate system of the AR object and a point cloud map coordinate system of the global space point cloud map according to a space coordinate system alignment technique;
the display position setting module 150 is configured to control, according to a point cloud map coordinate system of the global space point cloud map, a display position of the AR object on the global space point cloud map using the WebAR engine;
the AR object display module 160 is configured to display the AR object in the WeChat applet according to a display position of the AR object in real space.
In summary, in the large-space AR object recognition system based on ORB-SLAM2 provided in the embodiments of the present application, the ORB feature extraction module 110 extracts ORB features from the camera stream captured by the intelligent device, and the point cloud map construction module 120 constructs a global point cloud map from these ORB features according to the ORB-SLAM2 technique, so that the coordinate position of the real space captured by the intelligent device can be recognized through the global point cloud map. The AR object generation module 130 then generates an AR object using the Unity editor, and the object map fusion module 140 aligns the virtual space coordinate system of the AR object with the point cloud map coordinate system of the global space point cloud map according to a spatial coordinate system alignment technique; the AR object can then be fused into the real space corresponding to the camera stream through the point cloud map coordinate system of the global space point cloud map, so that the display position setting module 150 uses the Web AR engine to control the display position of the AR object in the real space corresponding to the camera stream, and the AR object display module 160 can display the AR object at the corresponding display position in the WeChat applet. In this technical solution, the high-quality AR objects generated by the Unity editor are connected in the cloud by the Web AR engine, realizing the function of displaying Unity-generated AR objects in the WeChat applet and improving the diversity of the generated AR objects. By developing with a Unity-based Web AR engine, adding the algorithm service, exporting the web project from Unity, deploying the project on a server, and generating a web page to be loaded by the applet, this strategy saves a great deal of labor cost and time cost and realizes the capability of a Unity application inside the WeChat applet; it solves the prior-art problem that developing an AR project on the WeChat applet consumes a great deal of manpower, material resources and time.
In addition, referring to fig. 12, fig. 12 is a schematic structural diagram of a large-space AR object recognition system based on ORB-SLAM2 according to an embodiment of the present application. As shown in fig. 12, the large-space AR object recognition system includes:
a processor 1001, a communication bus 1002, a communication module 1003, a memory 1004, and an ORB-SLAM2-based large-space AR object recognition program stored on the memory 1004 and runnable on the processor 1001; when the ORB-SLAM2-based large-space AR object recognition program is executed by the processor 1001, the steps of the ORB-SLAM2-based large-space AR object recognition method provided by any of the embodiments described above are implemented.
According to the ORB-SLAM2-based large-space AR object recognition method described above, the point cloud map is generated using the ORB-SLAM2 algorithm, so that the AR object is presented more accurately and the user's AR experience is improved. In addition, the high-quality AR objects generated by Unity are connected in the cloud by the Web AR engine, realizing the function of displaying Unity-generated AR objects in the WeChat applet and improving the diversity of the generated AR objects.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, et cetera does not indicate any ordering. These words may be interpreted as names.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (10)

1. A large-space AR object recognition method based on ORB-SLAM2, comprising:
extracting ORB features from a camera stream shot by intelligent equipment, and constructing a global space point cloud map corresponding to the camera stream according to an ORB-SLAM2 technology by using the ORB features;
generating an AR object using a Unity editor;
according to a space coordinate system alignment technology, aligning a virtual space coordinate system of the AR object and a point cloud map coordinate system of the global space point cloud map;
according to a point cloud map coordinate system of the global space point cloud map, a Web AR engine is used for controlling the display position of the AR object in the real space corresponding to the camera flow;
and displaying the AR object in a WeChat applet according to the display position of the AR object in the real space.
2. The AR object recognition method according to claim 1, wherein the step of extracting ORB features from the camera stream shot by the intelligent equipment, and constructing a global space point cloud map corresponding to the camera stream according to the ORB-SLAM2 technology by using the ORB features, comprises:
extracting a plurality of image frames from the camera stream;
extracting ORB feature points from the plurality of image frames respectively according to an ORB feature extraction algorithm;
determining a key frame from the plurality of image frames according to the ORB feature points;
splicing the key frame point cloud data by using the key frame and the camera coordinates to obtain an initial spliced global point cloud map;
performing initial filtering on the initial spliced global point cloud map to obtain an initial filtering global point cloud map;
and updating the initial filtering global point cloud map by using a radius filtering algorithm to obtain a final global space point cloud map.
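As a minimal sketch of the splicing and initial filtering steps of claim 2 above, the Python fragment below transforms each key frame's point cloud from camera coordinates into a common map frame using that key frame's camera pose and concatenates the results, then applies a crude voxel-grid pass standing in for the initial filtering. The 4x4 camera-to-world pose matrices, the voxel size, and all names are assumptions made for illustration, not details of the embodiments.

    import numpy as np

    def stitch_keyframe_clouds(keyframes):
        """keyframes: list of (points_Nx3_in_camera_frame, pose_4x4_camera_to_world)."""
        world_clouds = []
        for points, pose in keyframes:
            homog = np.hstack([points, np.ones((points.shape[0], 1))])  # N x 4 homogeneous
            world = (pose @ homog.T).T[:, :3]                           # into the map frame
            world_clouds.append(world)
        return np.vstack(world_clouds)

    def initial_filter(points, voxel_size=0.05):
        """Very rough initial filtering: keep one point per voxel cell."""
        keys = np.floor(points / voxel_size).astype(np.int64)
        _, idx = np.unique(keys, axis=0, return_index=True)
        return points[np.sort(idx)]

    # toy usage: two key frames observing the same small cloud from slightly different poses
    rng = np.random.default_rng(0)
    cloud = rng.uniform(-1.0, 1.0, size=(500, 3))
    pose_a = np.eye(4)
    pose_b = np.eye(4); pose_b[:3, 3] = [0.2, 0.0, 0.1]
    stitched = stitch_keyframe_clouds([(cloud, pose_a), (cloud, pose_b)])
    print(stitched.shape, initial_filter(stitched).shape)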
3. The AR object recognition method according to claim 2, wherein the step of determining a key frame from the plurality of image frames according to the ORB feature points includes:
judging whether the viewing-angle change amplitude between the current frame and the nearest key frame among the plurality of image frames is greater than or equal to a preset amplitude threshold;
if the viewing-angle change amplitude is greater than or equal to the preset amplitude threshold, determining that the current frame is a new key frame;
or,
judging whether the number of matching points between the ORB feature points of the current frame and the ORB feature points of the nearest key frame among the plurality of image frames is less than or equal to a preset point threshold;
if the number of matching points of the ORB feature points is less than or equal to the preset point threshold, determining that the current frame is a new key frame;
and performing loop detection on the global space point cloud map, and when a loop is detected in the global space point cloud map, selecting an image frame within a preset distance range of the map position where the loop exists as the new key frame.
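The two per-frame criteria recited above (viewing-angle change reaching a threshold, or too few ORB matches against the nearest key frame) can be illustrated with the following sketch; the way the rotation angle is computed from relative rotation matrices and both threshold values are assumptions chosen for the example, not values from the embodiments.

    import numpy as np

    def rotation_angle_deg(R_current, R_keyframe):
        """Angle of the relative rotation between two camera orientations (3x3 matrices)."""
        R_rel = R_keyframe.T @ R_current
        cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
        return np.degrees(np.arccos(cos_theta))

    def is_new_keyframe(R_current, R_keyframe, num_orb_matches,
                        angle_threshold_deg=15.0, min_matches=50):
        # criterion 1: viewing-angle change reaches the preset amplitude threshold
        if rotation_angle_deg(R_current, R_keyframe) >= angle_threshold_deg:
            return True
        # criterion 2: too few ORB feature matches against the nearest key frame
        return num_orb_matches <= min_matches

    # toy usage: a 20-degree yaw triggers criterion 1 even with plenty of matches
    yaw = np.radians(20.0)
    R_cur = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                      [np.sin(yaw),  np.cos(yaw), 0.0],
                      [0.0,          0.0,         1.0]])
    print(is_new_keyframe(R_cur, np.eye(3), num_orb_matches=120))  # True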
4. The AR object identification method according to claim 2, wherein the step of updating the initially filtered global point cloud map using a radius filtering algorithm to obtain a final global spatial point cloud map comprises:
determining a neighbor point search range of each point in the initial filtering global point cloud map by using a fixed length radius;
searching for neighbor points around each point within the neighbor point search range;
and for any neighbor point, acquiring the attribute value of the neighbor point, and replacing the attribute value of the neighbor point by using the calculated statistical attribute value.
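One possible reading of the radius-filter update in claim 4 is sketched below: for every point, the neighbor points inside a fixed-length radius are looked up and their attribute values are replaced by a statistic computed over the neighborhood (the mean is used here). The use of SciPy's cKDTree for the fixed-radius search, the choice of the mean as the statistical attribute value, and the radius are illustrative assumptions.

    import numpy as np
    from scipy.spatial import cKDTree

    def radius_attribute_filter(points, attributes, radius=0.1):
        """points: Nx3 positions; attributes: NxD per-point values (e.g. color or intensity)."""
        tree = cKDTree(points)
        smoothed = attributes.copy()
        for i in range(points.shape[0]):
            # neighbor point search range determined by a fixed-length radius around point i
            neighbor_idx = tree.query_ball_point(points[i], r=radius)
            # replace the neighbors' attribute values with the neighborhood statistic (mean);
            # overlapping neighborhoods simply overwrite earlier values in this naive pass
            smoothed[neighbor_idx] = attributes[neighbor_idx].mean(axis=0)
        return smoothed

    # toy usage: smooth noisy per-point intensities on a random cloud
    rng = np.random.default_rng(1)
    pts = rng.uniform(0.0, 1.0, size=(200, 3))
    intensity = rng.uniform(0.0, 255.0, size=(200, 1))
    print(radius_attribute_filter(pts, intensity, radius=0.2)[:3])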
5. The AR object recognition method according to claim 1, wherein the generating an AR object using a Unity editor comprises:
creating an instance of the AR object using the AR shape or AR model in the view of the Unity editor;
adjusting the appearance and position of the AR object instance in the view;
processing the AR object instance by using the script of the Unity editor to generate a dynamic AR object;
adding a collision body component to the AR object, and adjusting physical properties of the AR object;
and adding a Unity component to the AR object.
6. The AR object identification method according to claim 1, wherein the step of aligning the virtual space coordinate system of the AR object and the point cloud map coordinate system of the global space point cloud map according to the space coordinate system alignment technique comprises:
calculating an Euclidean transformation matrix between a virtual space coordinate system of the AR object and a point cloud map coordinate system of the global space point cloud map;
using the Euclidean transformation matrix to perform space coordinate system transformation on the global space point cloud map and the AR object;
and identifying the AR object in the global space point cloud map according to the mapping information of the AR object in the global space point cloud map.
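The Euclidean transformation matrix of claim 6 can be illustrated with a standard rigid-alignment computation (SVD/Kabsch) over a few corresponding anchor points in the two coordinate systems, after which the resulting 4x4 matrix is applied to the AR object's coordinates. This is a generic sketch of the technique under the assumption that such anchor correspondences are available; it is not the specific computation of the embodiments.

    import numpy as np

    def euclidean_transform(src, dst):
        """Rigid (rotation + translation) transform mapping src points onto dst (both Nx3)."""
        c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
        H = (src - c_src).T @ (dst - c_dst)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # guard against a reflection solution
            Vt[-1, :] *= -1
            R = Vt.T @ U.T
        T = np.eye(4)
        T[:3, :3] = R
        T[:3, 3] = c_dst - R @ c_src
        return T

    def apply_transform(T, points):
        homog = np.hstack([points, np.ones((points.shape[0], 1))])
        return (T @ homog.T).T[:, :3]

    # toy usage: recover a known 30-degree rotation about z plus a translation
    angle = np.radians(30.0)
    R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                       [np.sin(angle),  np.cos(angle), 0.0],
                       [0.0,            0.0,           1.0]])
    virtual_anchors = np.random.default_rng(2).uniform(-1.0, 1.0, (6, 3))
    map_anchors = virtual_anchors @ R_true.T + np.array([1.0, 2.0, 0.5])
    T = euclidean_transform(virtual_anchors, map_anchors)
    print(np.allclose(apply_transform(T, virtual_anchors), map_anchors, atol=1e-6))  # True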
7. The AR object recognition method according to claim 1, wherein the step of controlling the display position of the AR object in the real space corresponding to the camera stream using a Web AR engine according to the point cloud map coordinate system of the global space point cloud map comprises:
the Web AR engine acquires a plane or position feature points of the real space by using the global space point cloud map;
the Web AR engine calculates and sets the display position and transformation information of the AR object by using a 3D transformation matrix according to the plane or position feature points of the real space;
when movement of the picture shot by the intelligent equipment is detected, the Web AR engine continuously tracks the plane or position feature points of the real space by using the global space point cloud map;
and the Web AR engine adjusts the position and transformation information of the AR object in the real space in real time according to the plane or position feature points of the real space.
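In effect, claim 7 recomposes the AR object's model matrix from the tracked anchor pose on every frame. The fragment below is an illustration only: anchor_pose_world stands for the pose of the tracked plane or feature point (a 4x4 matrix) and object_offset for the object's fixed placement relative to that anchor; actual Web AR engines expose these quantities through their own APIs.

    import numpy as np

    def compose_model_matrix(anchor_pose_world, object_offset):
        """Model matrix of the AR object = tracked anchor pose composed with a local offset."""
        return anchor_pose_world @ object_offset

    # object placed 0.3 m above the tracked plane anchor
    object_offset = np.eye(4)
    object_offset[:3, 3] = [0.0, 0.3, 0.0]

    # two successive anchor poses, as if the tracked plane drifted slightly between frames
    anchor_frame_1 = np.eye(4)
    anchor_frame_2 = np.eye(4)
    anchor_frame_2[:3, 3] = [0.02, 0.0, -0.01]

    for anchor in (anchor_frame_1, anchor_frame_2):
        model = compose_model_matrix(anchor, object_offset)
        print("object world position:", model[:3, 3])  # updated in real time as tracking moves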
8. The AR object recognition method according to claim 1, wherein the step of displaying the AR object in a WeChat applet according to a display position of the AR object in the real space comprises:
using the Web AR engine to connect to a cloud server and send the display position and transformation information of the AR object in the real space to the cloud server;
and displaying the AR object from the cloud server in the real space presented by the WeChat applet according to a user instruction and page settings.
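As a sketch of the first step of claim 8, the display position and transformation information could be serialized and pushed to a cloud server over HTTP. The endpoint path, payload layout, and the use of the requests library below are purely illustrative assumptions; the embodiments do not specify any particular wire format or API.

    import requests  # hypothetical transport choice for this sketch

    def push_object_pose(object_id, position_xyz, transform_4x4, server_url):
        """Send the AR object's display position and transformation information to a cloud server."""
        payload = {
            "object_id": object_id,
            "position": list(position_xyz),
            # flatten the 4x4 transformation matrix row by row for transport
            "transform": [value for row in transform_4x4 for value in row],
        }
        response = requests.post(f"{server_url}/ar/objects/pose", json=payload, timeout=5)
        response.raise_for_status()
        return response.json()

    # illustrative call against a made-up endpoint (commented out; no real server is assumed)
    # push_object_pose("statue_01", (1.2, 0.0, -3.4), [[1, 0, 0, 0], [0, 1, 0, 0],
    #                  [0, 0, 1, 0], [0, 0, 0, 1]], "https://example.com")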
9. A large-space AR object recognition system based on ORB-SLAM2, comprising:
the ORB feature extraction module is used for extracting ORB features from a camera stream shot by the intelligent equipment;
the point cloud map construction module is used for constructing a global space point cloud map corresponding to the camera flow according to an ORB-SLAM2 technology by using the ORB characteristics;
the AR object generation module is used for generating an AR object by using the Unity editor;
the object map fusion module is used for aligning a virtual space coordinate system of the AR object and a point cloud map coordinate system of the global space point cloud map according to a space coordinate system alignment technology;
the display position setting module is used for controlling the display position of the AR object on the global space point cloud map by using a Web AR engine according to the point cloud map coordinate system of the global space point cloud map;
and the AR object display module is used for displaying the AR object in the WeChat applet according to the display position of the AR object in the real space.
10. A large-space AR object recognition system based on ORB-SLAM2, comprising:
a memory, a processor, and an ORB-SLAM2-based large-space AR object recognition program stored on said memory and runnable on said processor, wherein the ORB-SLAM2-based large-space AR object recognition program, when executed by said processor, implements the steps of the ORB-SLAM2-based large-space AR object recognition method according to any one of claims 1 to 8.
CN202311721205.4A 2023-12-14 2023-12-14 ORB-SLAM 2-based large-space AR object identification method and system Pending CN117671213A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311721205.4A CN117671213A (en) 2023-12-14 2023-12-14 ORB-SLAM 2-based large-space AR object identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311721205.4A CN117671213A (en) 2023-12-14 2023-12-14 ORB-SLAM 2-based large-space AR object identification method and system

Publications (1)

Publication Number Publication Date
CN117671213A true CN117671213A (en) 2024-03-08

Family

ID=90071199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311721205.4A Pending CN117671213A (en) 2023-12-14 2023-12-14 ORB-SLAM 2-based large-space AR object identification method and system

Country Status (1)

Country Link
CN (1) CN117671213A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination