CN114494582A - Three-dimensional model dynamic updating method based on visual perception
- Publication number: CN114494582A
- Application number: CN202111664034.7A
- Authority: CN (China)
- Prior art keywords: image, model, dimensional model, images, camera
- Legal status: Granted
Classifications
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T19/006—Mixed reality
- G06T19/20—Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
- G06T2200/08—Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
Abstract
The invention discloses a visual-perception-based method for dynamically updating a three-dimensional model, which comprises the following steps: capturing color images and infrared images of a scene with an RGB-I camera; extracting two or more color images, taking one of them as a reference image, and back-projecting the points at which the other images project onto the model into the reference image to obtain several pixel difference images; based on the pixel difference images, judging the model change region, reconstructing a local model of that region, and displaying the change region and the reconstruction process in AR glasses; when a laser point is detected in the infrared images during reconstruction, triangulating the laser point through the infrared images to assist the model reconstruction; and fusing the reconstructed local model with the historical model to obtain a locally updated model. By interactively updating the local model with AR glasses carrying a binocular RGB-I camera and a laser pointer, low-cost and efficient local three-dimensional reconstruction is achieved, and the service life and quality of the model are improved.
Description
Technical Field
The invention relates to the technical field of three-dimensional modeling, in particular to a three-dimensional model dynamic updating method based on visual perception.
Background
With the continuous development of communication technology, computer science, and data-acquisition sensing equipment, the market for applications centered on location-based services has grown rapidly. As the basic data of location-based services, the three-dimensional model is key to developing realistic 3D and digital-twin applications and to promoting the construction of smart cities. Compared with a traditional two-dimensional map, a three-dimensional model provides more complete geometric information and a better visual experience, and its rich scene semantics enhance the user's understanding of a location.
However, in practical applications of a three-dimensional model, the environment changes over time, and the model must be updated continuously to remain consistent with the scene. In the prior art, local updating is usually achieved by reconstructing the whole three-dimensional scene, which wastes existing data and human resources. How to update the three-dimensional model dynamically and improve the service life and quality of the existing model is therefore an urgent need.
The prior art thus still cannot achieve efficient, low-cost update modeling of a local three-dimensional model in order to improve the service life and quality of the existing three-dimensional model.
Thus, there is still a need for improvement and development of the prior art.
Disclosure of Invention
The invention mainly aims to provide a three-dimensional model dynamic updating method based on visual perception, aiming to solve the problem in the prior art that existing data and human resources are wasted because the whole three-dimensional scene must be reconstructed in order to locally update a model.
In order to achieve the above object, a first aspect of the present invention provides a method for dynamically updating a three-dimensional model based on visual perception, the method comprising:
shooting scene frame images through RGB-I cameras arranged on two sides of AR glasses, and dividing the scene frame images into a color image group and an infrared image group;
extracting two or more color images based on the color image group, taking one image as a reference image, and performing back projection on projection points of other images projected on a three-dimensional model onto the reference image to obtain pixel difference images of each image and the reference image;
based on the pixel difference image, when the three-dimensional model is judged to change, a change area of the three-dimensional model is obtained, a local model of the change area is reconstructed, and the reconstruction process of the three-dimensional model and the local model is displayed in AR glasses;
when a laser point is detected in the infrared image group during the local model reconstruction process, locating the laser point through the RGB-I cameras and performing triangulation to assist the local model reconstruction;
and when the local model is detected to be completely reconstructed, fusing the reconstructed local model with the three-dimensional model to obtain a locally updated model.
Optionally, the step of capturing the scene frame image by the RGB-I cameras disposed at both sides of the AR glasses and dividing the scene frame image into the color image group and the infrared image group includes:
calibrating the camera internal parameter and the camera external parameter in advance;
the camera internal parameters comprise the focal length, principal point, radial distortion and tangential distortion of the RGB-I camera lens;
the camera external parameters include a geometric relationship between the two RGB-I cameras represented by a rotation matrix and a translation matrix.
Optionally, the step of capturing the scene frame image by the RGB-I cameras disposed at both sides of the AR glasses and dividing the scene frame image into the color image group and the infrared image group includes:
capturing a scene frame image about a scene through the RGB-I camera shots disposed at both sides of the AR glasses according to the movement and posture change of the AR glasses around the scene;
and controlling the acquired scene frame image to be divided into a color image group and an infrared image group.
Optionally, the step of capturing the scene frame image by the RGB-I cameras disposed on both sides of the AR glasses and dividing the scene frame image into the color image group and the infrared image group includes:
and tracking and acquiring the camera position and posture of the RGB-I camera by combining visual positioning and GPS positioning data based on the color image group, wherein the camera position and posture is position and orientation data of the camera in the three-dimensional model coordinates.
Optionally, the step of extracting two or more color images based on the color image group, taking one of the images as a reference image, and obtaining a pixel difference image between each image and the reference image by back-projecting a projection point of the other image projected on the three-dimensional model onto the reference image includes:
extracting two or more color images based on the color image group, and taking one of the two or more color images as a reference image;
establishing a mapping relation between each extracted image and the three-dimensional model through pre-calibrated camera internal parameters and the camera position posture;
respectively calculating, for each image other than the reference image, the direction ray from the camera projection center through the pixel into the three-dimensional world, and obtaining the intersection of the direction ray with the three-dimensional model, namely the projection point of the image point on the three-dimensional model;
respectively back projecting the projection points of the images on the three-dimensional model onto the reference image to obtain back projection images of the images and the reference image;
and respectively comparing each back projection image with the reference image to obtain a pixel difference image of each back projection image and the reference image.
Optionally, the step of obtaining a change region of the three-dimensional model when it is determined that the three-dimensional model changes based on the pixel difference image, reconstructing a local model of the change region, and displaying the reconstruction process of the three-dimensional model and the local model in the AR glasses includes:
when detecting that pixel values which are not equal to zero and/or are not close to zero exist in each pixel difference image, judging that the three-dimensional model changes;
confirming a changed region in each pixel difference image by correlating the regions in each pixel difference image;
respectively calculating the average position of each change region in each pixel difference image based on the change regions, and representing its distribution in the pixel difference image by a covariance;
calculating to obtain a change area in the three-dimensional model according to the average position of each change area in each pixel difference image;
reconstructing a local model of a change region in a three-dimensional model based on the change region;
and simultaneously displaying the reconstruction process of the three-dimensional model and the local model in AR glasses.
Optionally, when a laser point is detected in the local model reconstruction process through the infrared image group, the step of locating the laser point through the RGB-I camera and performing triangulation to assist the local model reconstruction includes:
detecting that a laser point exists in the local model reconstruction process through the infrared image group;
tracking the laser points through two RGB-I cameras, and performing triangulation on the two-dimensional laser points in the infrared images respectively shot to obtain camera coordinates of the laser points;
and projecting the camera coordinates of the laser point into the coordinates of the three-dimensional model to assist in reconstructing the local model.
The invention provides a device for dynamically updating a three-dimensional model based on visual perception in a second aspect, wherein the device comprises:
the data acquisition module consists of two RGB-I cameras and is used for shooting and acquiring image data in real time and transmitting the image data to the data processing unit;
the data processing unit is used for calculating and processing the acquired image data in the background and transmitting the detection of the three-dimensional model change area and the updating process of the local model to the display module in real time;
the display module is used for displaying the three-dimensional model data and the local model updating process in real time and displaying the drawing area of the laser point;
the communication module is used for transmitting data among the data acquisition module, the data processing unit and the display module;
and the interaction module, which is composed of a laser emitting light in the 680-730 nm band and is used for assisting the reconstruction of the local model.
The third aspect of the present invention provides an intelligent terminal, where the intelligent terminal includes a memory, a processor, and a three-dimensional model dynamic update program based on visual perception, stored in the memory and executable on the processor, and when the three-dimensional model dynamic update program based on visual perception is executed by the processor, the method for dynamically updating a three-dimensional model based on visual perception may be implemented.
A fourth aspect of the present invention provides a non-transitory computer readable storage medium having stored thereon a computer program, the computer program being executed by a processor to perform any of the steps of the method for dynamically updating a three-dimensional model based on visual perception.
As can be seen from the above, in the scheme of the present invention, scene frame images are captured by RGB-I cameras disposed on both sides of AR glasses, and the scene frame images are divided into a color image group and an infrared image group; two or more color images are extracted from the color image group, one of them is taken as a reference image, and the projection points of the other images on the three-dimensional model are back-projected onto the reference image to obtain a pixel difference image between each image and the reference image; based on the pixel difference images, when the three-dimensional model is judged to have changed, the change region of the three-dimensional model is obtained, a local model of the change region is reconstructed, and the three-dimensional model and the reconstruction process of the local model are displayed in the AR glasses; when a laser point is detected in the infrared image group during local model reconstruction, the laser point is located by the RGB-I cameras and triangulated to assist the local model reconstruction; and when the local model is detected to be completely reconstructed, the reconstructed local model is fused with the three-dimensional model to obtain a locally updated model. Compared with the prior art, the user simply directs the line of sight at the three-dimensional scene while walking with the AR glasses carrying the binocular RGB-I camera; the system automatically detects whether the three-dimensional model has changed, and the changed region is presented in the AR glasses. When the local model of the changed region is reconstructed, the system can also detect whether a laser point exists in the captured images, and the accuracy of the reconstructed model can be further improved according to the laser point. The method solves the problems of difficult, inefficient and costly updating of local three-dimensional models, and realizes low-cost and efficient local three-dimensional reconstruction, thereby improving the service life and quality of the existing three-dimensional model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
FIG. 1 is a schematic flow chart of a method for dynamically updating a three-dimensional model based on visual perception according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a device for dynamically updating a three-dimensional model based on visual perception according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating the step S100 in FIG. 1 according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a detailed process of step S200 in FIG. 1 according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating the step S300 in FIG. 1 according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating a detailed process of step S400 in FIG. 1 according to an embodiment of the present invention;
fig. 7 is a schematic block diagram of an internal structure of an intelligent terminal according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings of the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced otherwise than as specifically described and similarly intended by those of ordinary skill in the art without departing from the spirit of the present invention, which is not limited to the specific embodiments disclosed below.
With the continuous development of communication technology, computer science and data acquisition sensing equipment, the application market scale taking position service as the core is rapidly increased. The three-dimensional model is used as basic data of position service, and is a key point for developing a real three-dimensional and digital twin and promoting the construction of a smart city. Compared with a traditional two-dimensional map, the three-dimensional model can provide more comprehensive geometric information and high-level visual experience, and meanwhile has rich scene semantics to enhance the cognition of a user on a position scene.
However, over time the environment changes, and three-dimensional models that have already been built need to be updated continually to remain consistent with the scene. In conventional methods, local updating is achieved by reconstructing the whole three-dimensional scene, which wastes the existing data and human resources.
For example, the main three-dimensional reconstruction methods in the prior art include (1) three-dimensional modeling based on photogrammetry; (2) three-dimensional reconstruction based on laser point clouds; and (3) manual drawing based on CAD graphics. These techniques provide many alternatives for reconstructing a three-dimensional scene. However, when the scene changes after modeling, means and equipment for updating the three-dimensional model are lacking. At present, local three-dimensional model updating is therefore realized by reconstructing the scene three-dimensionally; this way of updating is costly and wastes the existing data and human resources, so the utilization rate and service life of the model are low.
In order to solve the problems in the prior art, the invention provides a three-dimensional model dynamic updating method based on visual perception, in the embodiment of the invention, scene frame images are shot by RGB-I cameras arranged at two sides of AR glasses, and the scene frame images are divided into a color image group and an infrared image group; extracting two or more color images based on the color image group, taking one image as a reference image, and performing back projection on projection points of other images projected on a three-dimensional model onto the reference image to obtain pixel difference images of each image and the reference image; based on the pixel difference image, when the three-dimensional model is judged to change, a change area of the three-dimensional model is obtained, a local model of the change area is reconstructed, and the reconstruction process of the three-dimensional model and the local model is displayed in AR glasses; when the infrared image group detects that a laser point exists in the local model reconstruction process, positioning the laser point through an RGB-I camera, and performing triangularization auxiliary local model reconstruction; and when the local model is detected to be completely reconstructed, fusing the reconstructed local model with the three-dimensional model to obtain a locally updated model. Therefore, compared with the scheme that the local three-dimensional model is updated by three-dimensionally reconstructing the whole scene in the prior art, the method and the system automatically acquire the images in the three-dimensional scene aligned with the camera through the AR glasses carrying the binocular RGB-I camera, automatically calculate whether the three-dimensional model changes according to the images and display the changed areas in the AR glasses in real time. When the partial model of the changed area is reconstructed, whether the laser point exists in the shot picture can be detected, and the accuracy of the reconstructed model can be further improved according to the laser point. The method solves the problems of difficult updating, low efficiency and high cost of the local three-dimensional model, and realizes the reconstruction of the local three-dimensional model with low cost and high efficiency, thereby improving the service cycle and the quality of the existing three-dimensional model.
Exemplary method
As shown in fig. 1, an embodiment of the present invention provides a method for dynamically updating a three-dimensional model based on visual perception, where the method includes the following steps:
step S100, shooting a scene frame image through RGB-I cameras arranged on two sides of AR glasses, and dividing the scene frame image into a color image group and an infrared image group.
In this embodiment, the visual-perception-based three-dimensional model dynamic updating system first captures frame images of a scene through the RGB-I cameras preset on both sides of the AR glasses, and divides all captured frame images into a color image group and an infrared image group, where RGB-I stands for RGB-Infrared, a camera that can capture both ordinary color images and infrared images. Through this step, color images of the scene are acquired for detecting whether the model has changed, and infrared images are acquired for detecting whether a laser point appears in the scene.
And S200, extracting two or more color images based on the color image group, taking one image as a reference image, and performing back projection on projection points of other images projected on the three-dimensional model onto the reference image to obtain pixel difference images of each image and the reference image.
In this embodiment, the system extracts two or more color images from the acquired color image group and uses one of them as a reference image against which the other color images are compared to judge whether the model has changed. Further, the extracted color images are projected onto the historical three-dimensional model, and the resulting projection points are back-projected onto the reference image, yielding a pixel difference image between each color image and the reference image. It should be noted that the number of extracted color images affects the processing time of the color image data; at the same time, the more color images are extracted, the better the precision with which the model change region is judged. This step therefore obtains, from multiple viewing angles, the pixel difference images used to judge whether the model has changed.
And S300, based on the pixel difference image, acquiring a change area of the three-dimensional model when the three-dimensional model is judged to change, reconstructing a local model of the change area, and displaying the three-dimensional model and the reconstruction process of the local model in AR glasses.
In this embodiment, the system examines the pixel points in the pixel difference images and judges that the three-dimensional model has changed when there are points whose pixel values are not equal to or close to zero. The change region of the three-dimensional model is then calculated from the regions shown as changed in the pixel difference images. Furthermore, when more than two color images are extracted in the color image extraction step, change-region parameters of higher precision can be obtained from the multiple viewing angles. Based on the change region of the three-dimensional model, the system prompts the user to observe around it so as to acquire stereo image pairs and reconstruct the local model. Meanwhile, during this step the system displays the historical three-dimensional model and the reconstruction process of the local model in the AR glasses in real time. Through this step, the changed region of the three-dimensional model is automatically calculated and reconstructed, the detection and reconstruction process is sent to the AR glasses in real time, and the local model is reconstructed interactively, which facilitates model reconstruction and lets the user observe the modeling result in real time.
And S400, when a laser point is detected in the local model reconstruction process through the infrared image group, locating the laser point through the RGB-I cameras and performing triangulation to assist the local model reconstruction.
In this embodiment, when the user performs change detection and model reconstruction on a scene such as a white wall in the model change region, incorrect disparity values may be produced, resulting in a low-accuracy or wrongly modeled local model. When the system detects, in the infrared images shot by the RGB-I cameras, a laser point projected by the user with a laser pointer during local model reconstruction, the infrared images containing the laser point are triangulated by the binocular RGB-I camera to obtain the parameters of the laser point in model coordinates, which are written to the corresponding position of the local model; this laser-point-assisted modeling improves the model precision. Further, the laser point can also be emitted by an unmanned aerial vehicle carrying a binocular camera, which facilitates finer local model reconstruction of scenes that are difficult to reach. In this step, therefore, updating the local model through laser-point interaction achieves convenient local model reconstruction and better model precision and quality.
And S500, when the local model is detected to be completely reconstructed, fusing the reconstructed local model with the three-dimensional model to obtain a locally updated model.
In this embodiment, when it is detected that all the changed local model regions are completely reconstructed, the reconstructed local model is fused with the historical three-dimensional model, so as to obtain a locally updated model. The method realizes efficient, low-cost and more flexible local model reconstruction.
As can be seen from the above, the method for dynamically updating a three-dimensional model based on visual perception provided by the embodiment of the present invention captures a scene frame image through RGB-I cameras disposed on both sides of AR glasses, and divides the scene frame image into a color image group and an infrared image group; extracting two or more color images based on the color image group, taking one image as a reference image, and performing back projection on projection points of other images projected on a three-dimensional model onto the reference image to obtain pixel difference images of each image and the reference image; based on the pixel difference image, when the three-dimensional model is judged to change, a change area of the three-dimensional model is obtained, a local model of the change area is reconstructed, and the reconstruction process of the three-dimensional model and the local model is displayed in AR glasses; when the infrared image group detects that a laser point exists in the local model reconstruction process, positioning the laser point through an RGB-I camera, and performing triangularization auxiliary local model reconstruction; and when the local model is detected to be completely reconstructed, fusing the reconstructed local model with the three-dimensional model to obtain a locally updated model. Compared with the scheme of realizing local three-dimensional model updating by three-dimensional reconstruction of a scene in the prior art, the method and the system automatically acquire the image in the three-dimensional scene aligned with the camera through the AR glasses carrying the binocular RGB-I camera, automatically calculate whether the three-dimensional model changes or not according to the image and display the changed area in the AR glasses in real time. When the partial model of the changed area is reconstructed, whether the laser point exists in the shot picture can be detected, and the accuracy of the reconstructed model can be further improved according to the laser point. The method solves the problems of difficult updating, low efficiency and high cost of the local three-dimensional model, and realizes the reconstruction of the local three-dimensional model with low cost and high efficiency, thereby improving the service cycle and the quality of the existing three-dimensional model.
In a further embodiment, there is first provided an apparatus for dynamically updating a three-dimensional model based on visual perception, the apparatus comprising:
the data acquisition module consists of two RGB-I cameras and is used for shooting and acquiring image data in real time and transmitting the image data to the data processing unit;
the data processing unit is used for calculating and processing the acquired image data in the background and transmitting the detection of the three-dimensional model change area and the updating process of the local model to the display module in real time;
the display module is used for displaying the three-dimensional model data and the local model updating process in real time and displaying the drawing area of the laser point;
the communication module is used for transmitting data among the data acquisition module, the data processing unit and the display module;
and the interaction module, which is composed of a laser emitting light in the 680-730 nm band and is used for assisting the reconstruction of the local model.
For example, please refer to fig. 2. The device in the embodiment is divided into two parts, namely AR glasses 10 and a laser pointer 20. Wherein, two sides of the AR glasses 10 are respectively provided with an RGB-I camera 101, i.e. the data acquisition module, with a baseline of 22 cm; the lens 102 of the AR glasses 10 is used for displaying three-dimensional model data and a local model updating process in real time, and displaying a drawing area of a laser spot, i.e., the display module; the frame of the AR glasses 10 includes a processor 103, which specifically includes the data processing unit 1031 and the communication module 1032. The laser pen 20 is the interaction module, and contains a laser capable of emitting 680-730nm light, so that the reconstruction of the local model is assisted by a user.
Furthermore, the device also comprises a key sensor which is used for controlling the start and the stop of the three-dimensional model dynamic updating function based on visual perception; the storage module is used for assisting in storing the image shot by the RGB-I camera and the operation data; and a power module for powering the device.
In a further embodiment, the scene is illustrated by taking a computer desk in a user's room as an example: the original computer desk model contains only an empty desk, while the scene in which the user performs model reconstruction contains objects such as a computer screen on the desk. When the scene is something other than a computer desk, the specific scheme of this embodiment can still be referred to.
In an application scene, a three-dimensional model dynamic updating system based on visual perception shoots scene frame images through RGB-I cameras arranged on two sides of AR glasses, and the scene frame images are divided into a color image group and an infrared image group.
Specifically, as shown in fig. 3, the step S100 includes:
step S101, capturing scene frame images of a scene through the RGB-I cameras arranged on the two sides of the AR glasses according to the movement and posture change of the AR glasses around the scene;
and S102, controlling the acquired scene frame images to be divided into a color image group and an infrared image group.
Wherein, before the step of shooting the scene frame image by the RGB-I cameras arranged at the two sides of the AR glasses and dividing the scene frame image into a color image group and an infrared image group, the method comprises the following steps:
calibrating the camera internal parameter and the camera external parameter in advance;
the camera intrinsic parameters comprise the focal length, principal point, and radial and tangential distortion of the RGB-I camera lens;
the camera external parameters include a geometric relationship between the two RGB-I cameras represented by a rotation matrix and a translation matrix.
The steps of shooting the scene frame images through the RGB-I cameras arranged on the two sides of the AR glasses and dividing the scene frame images into a color image group and an infrared image group comprise the following steps:
and tracking and acquiring the camera position and posture of the RGB-I camera by combining visual positioning and GPS positioning data based on the color image group, wherein the camera position and posture is position and orientation data of the camera in the three-dimensional model coordinates.
For example, before the user shoots images of the computer desk, the intrinsic and extrinsic parameters of the binocular RGB-I camera carried on the AR glasses need to be calibrated in advance, so that the position and posture of the camera in model coordinates can subsequently be determined. The camera intrinsic parameters include the focal length, principal point, and radial and tangential distortion of each RGB-I camera lens, and the camera extrinsic parameters include the geometric relationship between the two RGB-I cameras, represented by a rotation matrix and a translation matrix.
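As a concrete illustration of this calibration step, the following sketch shows how the intrinsic parameters of each RGB-I camera and the extrinsic rotation/translation between the two cameras could be obtained with OpenCV. The checkerboard-style input, function choices and flags are assumptions made for illustration; the patent does not prescribe a particular calibration tool.

```python
import cv2
import numpy as np

# Minimal sketch (assumption): calibrate the intrinsics (focal length, principal
# point, radial/tangential distortion) of each RGB-I camera and the extrinsics
# (rotation R, translation t) between the two, from matched checkerboard views.

def calibrate_stereo(obj_pts, img_pts_left, img_pts_right, image_size):
    """obj_pts: list of (N,3) board corner coordinates per view;
    img_pts_*: matching (N,2) detections in the left/right camera."""
    # Per-camera intrinsics: camera matrix K and distortion coefficients
    _, K1, d1, _, _ = cv2.calibrateCamera(obj_pts, img_pts_left, image_size, None, None)
    _, K2, d2, _, _ = cv2.calibrateCamera(obj_pts, img_pts_right, image_size, None, None)
    # Extrinsics between the two cameras: rotation R and translation t
    _, K1, d1, K2, d2, R, t, _, _ = cv2.stereoCalibrate(
        obj_pts, img_pts_left, img_pts_right, K1, d1, K2, d2, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    return (K1, d1), (K2, d2), R, t
```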
After the camera intrinsics and extrinsics have been calibrated, the user walks, for example, counter-clockwise around the computer desk from its left side to its right side, keeping the viewing angle of the AR glasses pointed at the desk while walking. The system controls the RGB-I cameras to capture a group of scene frame images observing the computer desk from multiple viewing angles, and divides the scene frame images into a color image group and an infrared image group.
Further, the camera position and posture, i.e. the position and orientation of the camera when it took a given image, is estimated from the color images. First, the input image pairs are preprocessed: the images are spatially smoothed with a Gaussian filter and a three-layer image pyramid is constructed. Then a set of sparse local feature points, such as the computer desk edges, is extracted with a FAST detector, and the initial rotation of the image plane under three-dimensional motion is estimated by feature matching.
Furthermore, to improve efficiency, visual feature matching is restricted to a local search window, for example a window containing only one corner of the computer display screen frame, and the RANSAC algorithm is then used to estimate the transformation of the camera position and posture. When the camera viewpoint has not changed significantly, for example when the position is still and the viewing angle changes by less than 1°, not every image is processed, in order to avoid jitter or drift of the estimated camera position and posture; only key-frame images with a larger change of viewing angle are fed to the RANSAC algorithm to estimate the transformation of the camera position and posture. The relative posture of the RGB-I camera when shooting this group of images, also called the visual positioning result, is thus obtained.
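The front end described in the last two paragraphs (Gaussian smoothing, a three-level pyramid, FAST corners, feature matching and a RANSAC-based pose estimate) could look roughly like the following sketch. The use of ORB descriptors for matching and the essential-matrix formulation are assumptions made to keep the example self-contained; they are not stated in the patent.

```python
import cv2
import numpy as np

# Sketch (assumptions noted above): estimate the relative camera pose between
# two consecutive color frames from FAST corners and RANSAC outlier rejection.

def relative_pose(prev_gray, cur_gray, K):
    prev_gray = cv2.GaussianBlur(prev_gray, (5, 5), 1.0)
    cur_gray = cv2.GaussianBlur(cur_gray, (5, 5), 1.0)
    # Three-level pyramid as described; only the finest level is used below.
    pyramid = [prev_gray]
    for _ in range(2):
        pyramid.append(cv2.pyrDown(pyramid[-1]))

    fast = cv2.FastFeatureDetector_create(threshold=20)
    orb = cv2.ORB_create()  # descriptors for matching (assumption)
    kp1 = fast.detect(prev_gray, None)
    kp2 = fast.detect(cur_gray, None)
    kp1, des1 = orb.compute(prev_gray, kp1)
    kp2, des2 = orb.compute(cur_gray, kp2)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

    p1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    p2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # RANSAC inside findEssentialMat rejects mismatched features
    E, mask = cv2.findEssentialMat(p1, p2, K, cv2.RANSAC, 0.999, 1.0)
    _, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=mask)
    return R, t  # relative rotation and (unit-scale) translation
```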
Furthermore, after a visual positioning result is obtained, GPS positioning data is fused to ensure positioning accuracy, and meanwhile, the visual positioning result is associated with the model coordinate. And a fusion positioning result is obtained by adopting a visual/GPS tight coupling combined navigation mode, and the positioning error is reduced by a reverse smoothing algorithm.
The GPS ranging data are used as the measurement data of a Kalman filter and the visual positioning data as its state prediction data; both are fed into the filter to realize GPS/visual combined navigation, the GPS data controlling the absolute accuracy of the result and the visual positioning data controlling its relative accuracy.
Furthermore, after a group of GPS data has been obtained and fusion positioning has been completed by the Kalman filter, an RTS (Rauch-Tung-Striebel) backward smoothing algorithm is used to improve positioning accuracy and robustness:
x̂_k|N = x̂_k|k + A_k ( x̂_k+1|N - x̂_k+1|k )
where k = N-1, N-2, …, 0, N is the total number of observations, A_k is the smoothing gain matrix, and x̂_k|k is the positioning result after filtering.
Therefore, by fusing the GPS positioning data, the relative posture is converted into a camera position and posture expressed in model coordinates, and the camera position and posture corresponding to each color image is determined from the shot color images and the GPS positioning data.
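A minimal sketch of this fusion logic is given below: visual positioning increments drive the Kalman prediction, GPS positions act as measurements, and an RTS backward pass smooths the filtered trajectory. The constant-position state model, loose (position-level) coupling and the noise values are simplifying assumptions; the patent describes a tightly coupled visual/GPS formulation.

```python
import numpy as np

# Sketch under the assumptions stated above: 3-D position filter with identity
# state transition, followed by Rauch-Tung-Striebel backward smoothing.
def kalman_rts(vis_increments, gps_positions, q=0.05, r=2.0):
    """vis_increments[k]: visual positioning displacement between frames k-1 and k;
    gps_positions[k]: GPS position measurement at frame k (both length-3 arrays)."""
    n = len(gps_positions)
    x = np.zeros((n, 3)); P = np.zeros((n, 3, 3))
    x_pred = np.zeros((n, 3)); P_pred = np.zeros((n, 3, 3))
    x[0] = gps_positions[0]; P[0] = np.eye(3)
    Q, R = q * np.eye(3), r * np.eye(3)
    for k in range(1, n):
        # Predict with the visual increment (state transition = identity)
        x_pred[k] = x[k - 1] + vis_increments[k]
        P_pred[k] = P[k - 1] + Q
        # Update with the GPS measurement
        K = P_pred[k] @ np.linalg.inv(P_pred[k] + R)
        x[k] = x_pred[k] + K @ (gps_positions[k] - x_pred[k])
        P[k] = (np.eye(3) - K) @ P_pred[k]
    # RTS backward smoothing: x_s[k] = x[k] + A_k (x_s[k+1] - x_pred[k+1])
    xs = x.copy()
    for k in range(n - 2, -1, -1):
        A = P[k] @ np.linalg.inv(P_pred[k + 1])   # smoothing gain A_k
        xs[k] = x[k] + A @ (xs[k + 1] - x_pred[k + 1])
    return xs
```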
In one application scenario, the system extracts two or more color images based on the color image group, uses one image as a reference image, and obtains a pixel difference image between each image and the reference image by back-projecting the projection points of the other images projected on the three-dimensional model onto the reference image.
Specifically, as shown in fig. 4, the step S200 includes:
step S201, extracting two or more color images based on the color image group, and taking one of the two or more color images as a reference image;
s202, establishing a mapping relation between each extracted image and a three-dimensional model through pre-calibrated camera internal parameters and the position and the posture of the camera;
step S203, respectively calculating direction rays of other images except the reference image from a camera projection center pixel to the three-dimensional world, and obtaining the direction rays and a focus of the three-dimensional model, namely projection points of image points on the three-dimensional model;
step S204, respectively back projecting the projection points of the images on the three-dimensional model onto the reference image to obtain back projection images of the images and the reference image;
step S205 is to compare each of the back projection images with the reference image to obtain a pixel difference image between each of the back projection images and the reference image.
For example, in this embodiment, six color images with computer display screens placed on the computer desk are obtained from the color image group, and change detection is performed on the three-dimensional model of the computer desk.
Firstly, establishing a mapping relation from a color image to a three-dimensional model according to a pre-calibrated camera internal reference and a camera position posture calculated by the color image and GPS positioning data:
x=PX (2)
wherein x is a two-dimensional image point, X is a three-dimensional model coordinate, and P is the projection matrix from the image to the model.
P=K[R|-Rt] (3)
wherein K is the camera intrinsic matrix, and R and t are the rotation matrix and the translation from world coordinates to camera coordinates, namely the camera extrinsic parameters.
Furthermore, five images are selected from the six color images, and the remaining one image is used as a reference image. Calculating a directional ray r of the image from the camera projection center pixel to the three-dimensional world:
r = R^T K^-1 x (4)
According to the above formula, the intersection X* of the direction ray r with the three-dimensional model can be solved, i.e. the projected position of the image point on the three-dimensional model. In this step, when a two-dimensional image containing a computer display screen is projected onto the model, since the model contains only a desk and no display screen, the result looks as if the display-screen picture were pasted onto the model desk.
Further, projecting points of the selected five images on the three-dimensional model are back projected on the reference image:
x_n→6 = P_6 X (5)
where n = 1, …, 5, P_6 is the projection matrix of image I_6 onto the three-dimensional model, and the back-projected image is denoted I_n→6.
Further, each back-projected image I_n→6 is compared with the reference image I_6 and the pixel difference D_t(i, j) between them is saved, computing for each pixel of I_6 the minimum Euclidean norm with respect to the pixels near the corresponding projection point in the back-projected image, over the set S(t) of n adjacent key-frame images, where D_s→t(i, j) is defined as:
D_s→t(i, j) = min_{(k,l)∈N_i,j} || I_t(i, j) - I_s→t(k, l) ||
where i, j, k and l are pixel coordinates and N_i,j is the pixel neighborhood, whose size is obtained by propagating the pose uncertainty at image acquisition to the image point.
Pixel difference images of each back-projected image with respect to the reference image are thus obtained.
In an application scene, the system acquires a change region of the three-dimensional model when the three-dimensional model is judged to be changed based on the pixel difference image, reconstructs a local model of the change region, and displays the reconstruction process of the three-dimensional model and the local model in AR glasses.
Specifically, as shown in fig. 5, the step S300 includes:
step S301, when detecting that pixel values which are not equal to zero and/or are not close to zero exist in each pixel difference image, judging that the three-dimensional model changes;
step S302, associating the areas in each pixel difference image, and confirming the changed areas in each pixel difference image;
step S303, respectively calculating the average position of each change region in each pixel difference image based on the change regions, and representing its distribution in the pixel difference image by a covariance;
step S304, calculating to obtain a change area in the three-dimensional model according to the average position of each change area in each pixel difference image;
step S305, reconstructing a local model of a change region based on the change region in the three-dimensional model;
and S306, displaying the reconstruction process of the three-dimensional model and the local model in AR glasses.
For example, based on the calculated pixel difference images, if every pixel difference D_t(i, j) is equal to zero or close to zero, no change has occurred in the three-dimensional model; otherwise, a change of the three-dimensional model is detected.
Further, the areas in each pixel difference image are associated to confirm the changed areas in each pixel difference image, for example, if the outline of the left view of the computer display screen is seen as a changed area in the first image shot from the first angle and the outline of the right view of the computer display screen is seen as a changed area in the second image shot from the second angle, the changed areas of the computer display screen in the two images are considered as the same object by associating the first image with the second image.
The method specifically comprises the following steps: first, the pixel difference image is filtered with erosion and dilation to remove noise, and then the object contour boundaries in a single pixel difference image, such as the contour boundary of the computer display screen, are extracted with a boundary-tracking algorithm. Further, in order to remove noise caused by dust and the like, all contour regions whose pixel extent is smaller than a threshold value are deleted, where the threshold can be entered and adjusted manually. The hue-saturation histograms of all regions in the images are then calculated and compared, and epipolar lines are used for a geometric consistency check, so as to associate the regions in the different pixel difference images, for example associating the two contours in the first and second images with the same object, namely the computer display screen.
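A sketch of this region-extraction and association step is given below: erosion/dilation denoising, contour extraction, an area threshold, and hue-saturation histogram comparison between regions seen in different images. The numeric thresholds are illustrative assumptions, and the epipolar consistency check mentioned above is not shown.

```python
import cv2
import numpy as np

# Sketch under the assumptions stated above.
def changed_regions(diff_img, min_area=200, diff_thresh=25):
    mask = (diff_img > diff_thresh).astype(np.uint8) * 255
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.dilate(cv2.erode(mask, kernel), kernel)        # erosion then dilation
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [c for c in contours if cv2.contourArea(c) >= min_area]

def region_similarity(img_a, contour_a, img_b, contour_b):
    def hs_hist(img, contour):
        mask = np.zeros(img.shape[:2], np.uint8)
        cv2.drawContours(mask, [contour], -1, 255, -1)
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], mask, [30, 32], [0, 180, 0, 256])
        return cv2.normalize(hist, hist).flatten()
    # Correlation close to 1 suggests the two regions are the same object
    return cv2.compareHist(hs_hist(img_a, contour_a), hs_hist(img_b, contour_b),
                           cv2.HISTCMP_CORREL)
```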
Further, for one image, the average position x̄_t of a region determined to have changed is first computed, and its distribution in the image is represented by a covariance Σ_t. The average positions x̄_t of the corresponding change region in the several images are then triangulated into a three-dimensional point to obtain its model coordinates X̄ by solving A X̄ = 0,
where A is a 3n × 4 matrix stacking one block per image:
A = [ [x̄_1]_× P_1 ; [x̄_2]_× P_2 ; … ; [x̄_n]_× P_n ]
where P_t is the projection matrix of image I_t onto the model and [x̄_t]_× is the skew-symmetric matrix of x̄_t.
Further, for each change region in an image, the K sigma points corresponding to x̄_t and Σ_t are computed, and these sigma points are projected into the model space, so that all change regions of the three-dimensional model can be estimated from a single image. For example, if the image shows not only a computer display screen but also a bottle of water and an ashtray on the computer desk, several change regions are obtained in this step.
Further, the above steps are repeated, and the mean x̄_t of each corresponding change region in each image and the sigma points of its covariance matrix Σ_t are projected into the model space in turn, so as to estimate the change region of the three-dimensional model with higher precision.
In the above steps, the system will thus estimate a change region whose shape is that of the computer display screen.
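The triangulation of the mean change positions can be sketched as a standard DLT solve, stacking [x̄_t]_× P_t for every image in which the region is seen and taking the null space of the resulting 3n × 4 matrix. Projecting the individual sigma points is omitted here; triangulating only the mean is a simplifying assumption.

```python
import numpy as np

# Sketch under the assumptions stated above.
def skew(x):
    return np.array([[0, -x[2], x[1]],
                     [x[2], 0, -x[0]],
                     [-x[1], x[0], 0]])

def triangulate_mean_position(mean_pixels, proj_mats):
    """mean_pixels: list of (u, v) mean change positions, one per image;
    proj_mats: matching 3x4 projection matrices P_t."""
    rows = []
    for (u, v), P in zip(mean_pixels, proj_mats):
        x_bar = np.array([u, v, 1.0])
        rows.append(skew(x_bar) @ P)          # [x_bar]_x P_t contributes 3 rows
    A = np.vstack(rows)                        # 3n x 4
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                                 # null-space solution of A X = 0
    return X[:3] / X[3]                        # model-space coordinates of the change
```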
Further, based on the change region of the three-dimensional model, the user walks around the local model to be reconstructed to acquire a group of stereo image pairs, and the depth is calculated by the following formula:
z_i = bf/d_i (13)
where z_i is the depth corresponding to the i-th pixel disparity value d_i, b is the stereo camera baseline, and f is the camera focal length. Based on the calculated depth, the computer display screen is three-dimensionally reconstructed using a voxel-hashing surface reconstruction method.
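Equation (13) can be applied per pixel as in the sketch below; the 22 cm baseline comes from the device description, while the focal length value is an illustrative assumption. The voxel-hashing surface reconstruction that consumes the depth map is not reproduced here.

```python
import numpy as np

# Sketch: convert a disparity map from the stereo pair into per-pixel depth.
def disparity_to_depth(disparity, baseline_m=0.22, focal_px=700.0):
    depth = np.zeros_like(disparity, dtype=float)
    valid = disparity > 0                                      # zero disparity = no match
    depth[valid] = baseline_m * focal_px / disparity[valid]    # z_i = b f / d_i
    return depth
```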
Meanwhile, the system displays the detected model change area of the computer display screen appearance and the reconstruction process of the computer display screen model in the AR glasses in real time for the user to check, so that the user can check the reconstruction effect and the model quality in real time, the use by the user is facilitated, and the local model reconstruction efficiency is improved.
In an application scenario, when the system detects a laser point in the local model reconstruction process through the infrared image group, the laser point is located through the RGB-I cameras and triangulation is performed to assist the local model reconstruction.
Specifically, as shown in fig. 6, the step S400 includes:
s401, detecting that a laser point exists in the local model reconstruction process through the infrared image group;
s402, tracking the laser points through two RGB-I cameras, and performing triangulation on two-dimensional laser points in the infrared images respectively shot to obtain camera coordinates of the laser points;
and S403, projecting the camera coordinates of the laser point into the coordinates of the three-dimensional model to assist in reconstructing the local model.
For example, when the user finds through the AR glasses that an error has occurred in the detection of a three-dimensional model change region or in the local model reconstruction process, for example a hole appearing in a white wall because the disparity value cannot be determined, the user points a laser pointer at the wall, and the viewing direction of the AR glasses tracks the laser pointer. The RGB-I cameras then detect the infrared laser point in the infrared images; the system takes the two infrared images containing the laser point shot by the binocular RGB-I camera and projects the laser point into model coordinates by triangulation, so that the reconstructed local model is modeled with higher precision.
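A sketch of this laser-spot assistance follows: detect the bright spot in both infrared images, triangulate it with the calibrated stereo projection matrices, and transform the result into model coordinates. The centroid-based spot detector, the intensity threshold, and the camera-to-model pose inputs (R_wc, t_wc) are assumptions made for illustration.

```python
import cv2
import numpy as np

# Sketch under the assumptions stated above.
def detect_laser_spot(ir_img, thresh=240):
    _, mask = cv2.threshold(ir_img, thresh, 255, cv2.THRESH_BINARY)
    m = cv2.moments(mask)
    if m["m00"] == 0:
        return None
    return np.array([m["m10"] / m["m00"], m["m01"] / m["m00"]])  # spot centroid (u, v)

def laser_point_in_model(ir_left, ir_right, P_left, P_right, R_wc, t_wc):
    pl, pr = detect_laser_spot(ir_left), detect_laser_spot(ir_right)
    if pl is None or pr is None:
        return None
    Xh = cv2.triangulatePoints(P_left, P_right,
                               pl.reshape(2, 1), pr.reshape(2, 1))
    X_cam = (Xh[:3] / Xh[3]).ravel()             # laser point in camera coordinates
    return R_wc @ X_cam + t_wc                    # laser point in model coordinates
```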
Furthermore, in order to ensure that the laser point can be used effectively for high-precision modeling, the following steps are added. First, the tracker of the binocular RGB-I camera is initialized by pointing the laser pointer at a predefined rectangle in the center of the image. The pointing position of the laser pointer is estimated with the camera, and the position of the local window in frame t+1 is predicted with a Kalman filter. The local window is moved to the predicted position, and the infrared image containing the laser point is thresholded. The thresholding specifically includes: if the average intensity of the window pixels is far beyond the threshold range, or the above-threshold pixels are not connected, or the number of above-threshold pixels is far larger than the expected size of a laser point, the tracker automatically switches to 're-detection'; that is, when an abnormal laser point is detected while tracking the laser point through consecutive infrared images, the infrared image of the abnormal laser point is considered inaccurate data, since continuing to use it would increase the modeling error, and the position needs to be detected again. Through these steps it is ensured that the laser point always stays within the camera tracking range and that every group of infrared images used to reconstruct the model is valid.
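The validity checks described above (window intensity far outside the expected range, unconnected above-threshold pixels, or far too many of them) might be expressed as in the following sketch; all numeric limits are illustrative assumptions.

```python
import numpy as np

# Sketch under the assumptions stated above.
def laser_window_status(window, thresh=240, max_mean=200.0, max_spot_px=80):
    above = window > thresh
    if window.mean() > max_mean:
        return "redetect"            # whole window saturated: not a laser spot
    if above.sum() == 0 or above.sum() > max_spot_px:
        return "redetect"            # nothing found, or far larger than expected
    ys, xs = np.nonzero(above)
    # crude connectivity check: above-threshold pixels should form one tight blob
    if (ys.max() - ys.min() > 15) or (xs.max() - xs.min() > 15):
        return "redetect"
    return "tracking"
```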
Further, the prediction of the Kalman filter is corrected with the measured camera pose. If the laser tracker is in re-detection mode, it is re-initialized in a small area around the last known position; that is, tracking is restarted at the position where the laser point was judged anomalous, which avoids having to restart tracking from the very beginning whenever the laser point tracking becomes anomalous and improves model reconstruction efficiency.
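A minimal sketch of this tracking loop is given below. It reduces the Kalman prediction to a constant-velocity model with a fixed correction gain and omits the connectivity test; the window size, thresholds and class layout are illustrative assumptions, not the patent's specification.

```python
import numpy as np

class LaserSpotTracker:
    """Minimal laser-spot tracker over consecutive infrared frames:
    constant-velocity prediction, window thresholding, and a
    're-detection' fallback when the window looks anomalous.
    All numeric constants are illustrative assumptions."""

    def __init__(self, init_xy, win=31, spot_thresh=200, max_spot_px=60):
        # State: [x, y, vx, vy] of the laser spot in image coordinates.
        self.state = np.array([init_xy[0], init_xy[1], 0.0, 0.0], dtype=float)
        self.win, self.thresh, self.max_spot_px = win, spot_thresh, max_spot_px
        self.redetect = False

    def _window(self, ir_image, center):
        h = self.win // 2
        cx, cy = int(round(center[0])), int(round(center[1]))
        return ir_image[max(cy - h, 0):cy + h + 1, max(cx - h, 0):cx + h + 1]

    def step(self, ir_image):
        # 1. Predict the window position in frame t+1 (constant velocity).
        pred = self.state[:2] + self.state[2:]
        win = self._window(ir_image, pred)

        # 2. Threshold checks; anomalous windows switch the tracker to re-detection
        #    (the connectivity test of the patent is omitted here for brevity).
        mask = win > self.thresh
        if mask.sum() == 0 or mask.sum() > self.max_spot_px or win.mean() > self.thresh:
            self.redetect = True  # restart the search near the last known position
            return None

        # 3. Measurement: centroid of the above-threshold pixels, used to correct
        #    the prediction (a fixed gain stands in for the full Kalman update).
        ys, xs = np.nonzero(mask)
        meas = pred + np.array([xs.mean() - self.win // 2, ys.mean() - self.win // 2])
        gain = 0.6
        new_xy = self.state[:2] + gain * (meas - self.state[:2])
        self.state = np.r_[new_xy, new_xy - self.state[:2]]
        self.redetect = False
        return new_xy
```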
Therefore, with this method, in which the user detects three-dimensional model change regions and updates the local model in real time through the AR glasses and uses the laser pen for high-precision modeling of the local model, the interactive approach to local model updating is convenient to use and yields better model quality.
In an application scenario, when the system detects that the local model is completely reconstructed, the reconstructed local model and the three-dimensional model are fused to obtain a locally updated model.
For example, after the changed part of the scene, i.e. the computer display screen, has been re-modeled by the user walking around the computer desk or by using the laser pen, the local model containing the computer display screen is fused with the original computer desk model to obtain the updated three-dimensional model of the computer desk.
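Assuming the voxel-hash representation mentioned earlier, this fusion step can be sketched as replacing the voxel blocks that fall inside the detected change region with the newly reconstructed ones; the dictionary-based layout and the function name below are assumptions made for illustration, not the patent's specification.

```python
def fuse_local_update(global_blocks, local_blocks, in_change_region):
    """Fuse a reconstructed local model into the global model.

    global_blocks, local_blocks : dict mapping voxel-block index (i, j, k) -> block data
    in_change_region            : predicate telling whether a block index lies inside the
                                  detected change region (axis-aligned box, mask, ...)
    Blocks inside the change region are replaced by the freshly reconstructed ones;
    everything outside the region is kept unchanged.
    """
    fused = {idx: blk for idx, blk in global_blocks.items() if not in_change_region(idx)}
    fused.update({idx: blk for idx, blk in local_blocks.items() if in_change_region(idx)})
    return fused
```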
Exemplary device
Based on the above embodiments, the present invention provides an intelligent terminal, a schematic block diagram of which may be as shown in fig. 7. The intelligent terminal comprises a processor, a memory, a network interface and a display screen connected through a system bus. The processor of the intelligent terminal provides computing and control capabilities. The memory of the intelligent terminal comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the intelligent terminal is used to connect to and communicate with external terminals through a network. The computer program, when executed by the processor, performs the steps of any of the above-described methods for dynamically updating a three-dimensional model based on visual perception. The display screen of the intelligent terminal may be a liquid crystal display screen or an electronic ink display screen.
It will be understood by those skilled in the art that the block diagram of fig. 7 is only a block diagram of part of the structure related to the solution of the present invention and does not constitute a limitation on the intelligent terminal to which the solution of the present invention is applied; a specific intelligent terminal may include more or fewer components than those shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, an intelligent terminal is provided, which includes a memory, a processor, and a program stored in the memory and executable on the processor, and when executed by the processor, the program performs the following operations:
shooting scene frame images through RGB-I cameras arranged on two sides of AR glasses, and dividing the scene frame images into a color image group and an infrared image group;
extracting two or more color images based on the color image group, taking one image as a reference image, and performing back projection on projection points of other images projected on a three-dimensional model onto the reference image to obtain pixel difference images of each image and the reference image;
based on the pixel difference image, when the three-dimensional model is judged to change, a change area of the three-dimensional model is obtained, a local model of the change area is reconstructed, and the reconstruction process of the three-dimensional model and the local model is displayed in AR glasses;
when it is detected through the infrared image group that a laser point exists in the local model reconstruction process, positioning the laser point through the RGB-I cameras, and performing triangulation to assist local model reconstruction;
and when the local model is detected to be completely reconstructed, fusing the reconstructed local model with the three-dimensional model to obtain a locally updated model.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned functions may be distributed as different functional units and modules according to needs, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described or detailed in one embodiment, reference may be made to the related descriptions of other embodiments. Unless otherwise specified, the same symbol denotes the same quantity in every formula, and the formulas may be referred to one another.
Those of ordinary skill in the art will appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the above modules or units is only one logical division, and the actual implementation may be implemented by another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
The integrated modules/units described above, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the method according to the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and used by a processor to implement the steps of the above embodiments of the method. The computer program includes computer program code, and the computer program code may be in a source code form, an object code form, an executable file or some intermediate form. The computer readable medium may include: any entity or device capable of carrying the above-mentioned computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signal, telecommunication signal, software distribution medium, etc. It should be noted that the contents contained in the computer-readable storage medium can be increased or decreased as required by legislation and patent practice in the jurisdiction.
The above-mentioned embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein.
Claims (10)
1. A dynamic update method for a three-dimensional model based on visual perception is characterized by comprising the following steps:
shooting scene frame images through RGB-I cameras arranged on two sides of AR glasses, and dividing the scene frame images into a color image group and an infrared image group;
extracting two or more color images based on the color image group, taking one image as a reference image, and performing back projection on projection points of other images projected on a three-dimensional model onto the reference image to obtain pixel difference images of each image and the reference image;
based on the pixel difference image, when the three-dimensional model is judged to change, a change area of the three-dimensional model is obtained, a local model of the change area is reconstructed, and the reconstruction process of the three-dimensional model and the local model is displayed in AR glasses;
when it is detected through the infrared image group that a laser point exists in the local model reconstruction process, positioning the laser point through the RGB-I cameras, and performing triangulation to assist local model reconstruction;
and when the local model is detected to be completely reconstructed, fusing the reconstructed local model with the three-dimensional model to obtain a locally updated model.
2. The method of claim 1, wherein the step of capturing the scene frame images by the RGB-I cameras disposed on both sides of the AR glasses and dividing the scene frame images into the color image group and the infrared image group comprises:
calibrating the camera internal parameter and the camera external parameter in advance;
the camera intrinsic parameters comprise the focal length, principal point, and radial and tangential distortion of the RGB-I camera lens;
the camera external parameters include a geometric relationship between the two RGB-I cameras represented by a rotation matrix and a translation matrix.
3. The method of claim 1, wherein the step of capturing the scene frame images by the RGB-I cameras disposed at both sides of the AR glasses and dividing the scene frame images into the color image group and the infrared image group comprises:
capturing scene frame images of a scene through the RGB-I cameras disposed at both sides of the AR glasses, according to the movement and posture changes of the AR glasses around the scene;
and controlling the acquired scene frame image to be divided into a color image group and an infrared image group.
4. The method of claim 1, wherein the step of capturing the scene frame images by the RGB-I cameras disposed at both sides of the AR glasses and dividing the scene frame images into the color image group and the infrared image group is followed by the steps of:
and tracking and acquiring the camera position and posture of the RGB-I camera by combining visual positioning and GPS positioning data based on the color image group, wherein the camera position and posture is position and orientation data of the camera in the three-dimensional model coordinates.
5. The method of claim 4, wherein the step of extracting two or more color images based on the color image group, using one of the images as a reference image, and back-projecting a projection point of the other image projected on the three-dimensional model onto the reference image to obtain a pixel difference image between each image and the reference image comprises:
extracting two or more color images based on the color image group, and taking one of the two or more color images as a reference image;
establishing a mapping relation between each extracted image and the three-dimensional model through pre-calibrated camera internal parameters and the camera position posture;
respectively calculating, for each image other than the reference image, the direction ray from the camera projection center through each pixel into the three-dimensional world, and obtaining the intersection of the direction ray with the three-dimensional model, namely the projection point of the image point on the three-dimensional model;
respectively carrying out back projection on projection points of the images on the three-dimensional model to the reference image to obtain back projection images of the images and the reference image;
and respectively comparing each back projection image with the reference image to obtain a pixel difference image of each back projection image and the reference image.
6. The method for dynamically updating the three-dimensional model based on visual perception according to claim 1, wherein the step of obtaining a change region of the three-dimensional model when the three-dimensional model is judged to be changed based on the pixel difference image, reconstructing a local model of the change region, and displaying the reconstruction process of the three-dimensional model and the local model in AR glasses comprises:
when detecting that pixel values which are not equal to zero and/or are not close to zero exist in each pixel difference image, judging that the three-dimensional model changes;
confirming a changed region in each pixel difference image by correlating the regions in each pixel difference image;
respectively calculating, based on the change regions in each pixel difference image, the average position of each change region and its distribution in the pixel difference image in the form of a covariance;
calculating to obtain a change area in the three-dimensional model according to the average position of each change area in each pixel difference image;
reconstructing a local model of a change region in a three-dimensional model based on the change region;
and simultaneously displaying the reconstruction process of the three-dimensional model and the local model in AR glasses.
7. The method as claimed in claim 1, wherein the step of, when a laser point is detected in the local model reconstruction process through the infrared image group, locating the laser point by the RGB-I cameras and performing triangulation to assist local model reconstruction comprises:
detecting that a laser point exists in the local model reconstruction process through the infrared image group;
tracking the laser point through the two RGB-I cameras, and triangulating the two-dimensional laser points in the infrared images captured by each camera to obtain the camera coordinates of the laser point;
and projecting the camera coordinates of the laser point into the coordinates of the three-dimensional model to assist in reconstructing the local model.
8. An apparatus for dynamically updating a three-dimensional model based on visual perception, the apparatus comprising:
the data acquisition module consists of two RGB-I cameras and is used for shooting and acquiring image data in real time and transmitting the image data to the data processing unit;
the data processing unit is used for calculating and processing the acquired image data in the background and transmitting the detection of the three-dimensional model change area and the updating process of the local model to the display module in real time;
the display module is used for displaying the three-dimensional model data and the local model updating process in real time and displaying the drawing area of the laser point;
the communication module is used for transmitting data among the data acquisition module, the data processing unit and the display module;
and the interaction module is composed of a laser with the spectrum of the emitted light at 680-730nm and is used for assisting the reconstruction of the local model.
9. An intelligent terminal, characterized in that the intelligent terminal comprises a memory, a processor, and a visual-perception-based three-dimensional model dynamic update program stored on the memory and operable on the processor, wherein the program, when executed by the processor, implements the steps of the method for dynamically updating a three-dimensional model based on visual perception according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method for dynamically updating a three-dimensional model based on visual perception according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111664034.7A CN114494582B (en) | 2021-12-30 | 2021-12-30 | Three-dimensional model dynamic updating method based on visual perception |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114494582A true CN114494582A (en) | 2022-05-13 |
CN114494582B CN114494582B (en) | 2024-10-01 |
Family
ID=81507334
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111664034.7A Active CN114494582B (en) | 2021-12-30 | 2021-12-30 | Three-dimensional model dynamic updating method based on visual perception |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114494582B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107194991A (en) * | 2017-05-17 | 2017-09-22 | 西南科技大学 | A kind of three-dimensional global visualization monitoring system construction method updated based on skeletal point local dynamic |
CN108805327A (en) * | 2018-04-23 | 2018-11-13 | 西安科技大学 | The method and system of robot path planning and environment rebuilt based on virtual reality |
CN112465959A (en) * | 2020-12-17 | 2021-03-09 | 国网四川省电力公司电力科学研究院 | Transformer substation three-dimensional live-action model inspection method based on local scene updating |
CN113808253A (en) * | 2021-08-31 | 2021-12-17 | 武汉理工大学 | Dynamic object processing method, system, device and medium for scene three-dimensional reconstruction |
Non-Patent Citations (1)
Title |
---|
GENG CHUNYU: "Construction and Application of a Virtual Stope Model for Open-Pit Mines", China Master's Theses Full-text Database, Engineering Science and Technology I, no. 2, 15 February 2012 (2012-02-15), pages 021 - 105 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116797744A (en) * | 2023-08-29 | 2023-09-22 | 武汉大势智慧科技有限公司 | Multi-time-phase live-action three-dimensional model construction method, system and terminal equipment |
CN116797744B (en) * | 2023-08-29 | 2023-11-07 | 武汉大势智慧科技有限公司 | Multi-time-phase live-action three-dimensional model construction method, system and terminal equipment |
CN117253019A (en) * | 2023-10-17 | 2023-12-19 | 广东电网有限责任公司 | AR operation and maintenance system and method for power network |
Also Published As
Publication number | Publication date |
---|---|
CN114494582B (en) | 2024-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109003325B (en) | Three-dimensional reconstruction method, medium, device and computing equipment | |
CN108765498B (en) | Monocular vision tracking, device and storage medium | |
CN107223269B (en) | Three-dimensional scene positioning method and device | |
CN109118569B (en) | Rendering method and device based on three-dimensional model | |
CN110383343B (en) | Inconsistency detection system, mixed reality system, program, and inconsistency detection method | |
US11210804B2 (en) | Methods, devices and computer program products for global bundle adjustment of 3D images | |
CN109660783B (en) | Virtual reality parallax correction | |
CN106940704B (en) | Positioning method and device based on grid map | |
EP1303839B1 (en) | System and method for median fusion of depth maps | |
CA3134440A1 (en) | System and method for virtual modeling of indoor scenes from imagery | |
Sedlazeck et al. | 3d reconstruction based on underwater video from rov kiel 6000 considering underwater imaging conditions | |
US11783443B2 (en) | Extraction of standardized images from a single view or multi-view capture | |
US10950032B2 (en) | Object capture coverage evaluation | |
CN113610889A (en) | Human body three-dimensional model obtaining method and device, intelligent terminal and storage medium | |
SG189284A1 (en) | Rapid 3d modeling | |
CN112184603B (en) | Point cloud fusion method and device, electronic equipment and computer storage medium | |
US8917317B1 (en) | System and method for camera calibration | |
CN107274483A (en) | A kind of object dimensional model building method | |
TW201308252A (en) | Depth measurement quality enhancement | |
CN111915723A (en) | Indoor three-dimensional panorama construction method and system | |
CN114494582A (en) | Three-dimensional model dynamic updating method based on visual perception | |
JP2001067463A (en) | Device and method for generating facial picture from new viewpoint based on plural facial pictures different in viewpoint, its application device and recording medium | |
CN115035235A (en) | Three-dimensional reconstruction method and device | |
CN110798677A (en) | Three-dimensional scene modeling method and device, electronic device, readable storage medium and computer equipment | |
WO2018056802A1 (en) | A method for estimating three-dimensional depth value from two-dimensional images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |