CN111402412A - Data acquisition method and device, equipment and storage medium - Google Patents

Data acquisition method and device, equipment and storage medium

Info

Publication number
CN111402412A
CN111402412A
Authority
CN
China
Prior art keywords
key frame
target
camera
local space
data acquisition
Prior art date
Legal status
Granted
Application number
CN202010299543.3A
Other languages
Chinese (zh)
Other versions
CN111402412B (en)
Inventor
周庭竹
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010299543.3A priority Critical patent/CN111402412B/en
Publication of CN111402412A publication Critical patent/CN111402412A/en
Application granted granted Critical
Publication of CN111402412B publication Critical patent/CN111402412B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 7/00 - Image analysis
    • G06T 7/30 - Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 - Image registration using feature-based methods
    • G06T 7/50 - Depth or shape recovery
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10028 - Range image; Depth image; 3D point clouds
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30244 - Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiment of the application discloses a data acquisition method, a data acquisition device, equipment and a storage medium, wherein the method comprises the following steps: constructing a scene model corresponding to a local space according to a current key frame acquired by a camera and a plurality of target associated frames of the current key frame, where a target associated frame is a key frame that matches the current key frame in image features; displaying the scene model of the local space to feed back to a user the data acquisition quality of the camera in the local space; and controlling the camera to perform data acquisition again on a specific area of the local space so as to improve the data acquisition quality in the local space, and displaying a reconstructed scene model of the local space; or controlling the camera to acquire data of the next local space.

Description

Data acquisition method and device, equipment and storage medium
Technical Field
The embodiment of the application relates to computer vision technology, and in particular to a data acquisition method, a data acquisition device, data acquisition equipment, and a storage medium.
Background
The image-based three-dimensional reconstruction technology is an important branch of computer vision research, and has wide application prospects in three-dimensional mapping, augmented reality, virtual reality, games, medical treatment, animation production, computer-aided design, and other fields. The technology collects multi-frame images of a physical space through a camera, analyzes and processes the images, infers three-dimensional information of objects in the physical space by applying computer vision knowledge, and reconstructs a three-dimensional model of the objects in the spatial scene.
In the three-dimensional reconstruction technology based on multi-frame images, the quality of data acquisition directly influences the quality of a final reconstruction model.
Disclosure of Invention
The embodiment of the application provides a data acquisition method, a data acquisition device, equipment and a storage medium.
The technical scheme of the embodiment of the application is realized as follows:
in a first aspect, an embodiment of the present application provides a data acquisition method, where the method includes: constructing a scene model corresponding to a local space according to a current key frame acquired by a camera and a plurality of target associated frames of the current key frame, where a target associated frame is a key frame that matches the current key frame in image features; displaying the scene model of the local space to feed back to a user the data acquisition quality of the camera in the local space; controlling the camera to perform data acquisition again on a specific area of the local space so as to improve the data acquisition quality in the local space, and displaying a reconstructed scene model of the local space; or controlling the camera to acquire data of the next local space.
In a second aspect, an embodiment of the present application provides a data acquisition apparatus, including: the local model building module is used for building a scene model corresponding to a local space according to a current key frame acquired by a camera and a plurality of target associated frames of the current key frame; the target associated frame is a key frame matched with the current key frame on the image characteristics; the display module is used for displaying the scene model of the local space so as to feed back the data acquisition quality of the camera in the local space to a user; the data acquisition module is used for controlling the camera to perform data acquisition on a specific area of the local space again so as to improve the data acquisition quality in the local space, and displaying the reconstructed scene model of the local space through the display module; or controlling the camera to acquire data of the next local space.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor executes the computer program to implement steps in any data acquisition method according to the embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the data acquisition method according to any one of the embodiments of the present application.
In the embodiment of the application, the data acquisition stage of three-dimensional reconstruction is improved, and the corresponding scene model is displayed while data are acquired based on the currently acquired key frame and the historical key frame matched with the currently acquired key frame, so that the data acquisition quality can be fed back to a user in real time to guide the user to perform data acquisition again on the region which does not meet the requirements; therefore, the data acquisition quality can be improved, and the model quality of later-stage three-dimensional reconstruction is improved.
Drawings
FIG. 1 is a schematic flow chart of an implementation of a data acquisition method according to an embodiment of the present application;
FIG. 2A is a schematic flow chart of an implementation of another data acquisition method according to an embodiment of the present application;
FIG. 2B is a schematic flow chart of an implementation of a target associated frame determination method according to an embodiment of the present application;
FIG. 3 is a diagram illustrating a feature point matching relationship chain according to an embodiment of the present application;
FIG. 4A is a schematic flow chart of an implementation of another data acquisition method according to an embodiment of the present application;
FIG. 4B is a schematic flow chart of an implementation of a re-projection error function construction method according to an embodiment of the present application;
FIG. 4C is a schematic flow chart of an implementation of a target geometric model reconstruction method according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of an implementation of another data acquisition method according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of an implementation of another data acquisition method according to an embodiment of the present application;
FIG. 7 is a schematic flow chart of an implementation of an offline reconstruction method according to an embodiment of the present application;
FIG. 8A is a schematic structural diagram of a data acquisition apparatus according to an embodiment of the present application;
FIG. 8B is a schematic structural diagram of another data acquisition apparatus according to an embodiment of the present application;
FIG. 8C is a schematic structural diagram of another data acquisition apparatus according to an embodiment of the present application;
FIG. 9 is a hardware entity diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, specific technical solutions of the present application will be described in further detail below with reference to the accompanying drawings in the embodiments of the present application. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
It should be noted that the terms "first/second/third" referred to in the embodiments of the present application merely distinguish similar or different objects and do not represent a specific ordering of the objects; it should be understood that "first/second/third" may be interchanged in a specific order or sequence where permissible, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
The embodiment of the application provides a data acquisition method, which can be applied to electronic equipment, wherein the electronic equipment can be equipment carrying a camera, such as a mobile terminal (e.g., a mobile phone, a tablet computer, an electronic reader and the like), a notebook computer, a desktop computer, a robot, an unmanned aerial vehicle, a server, an augmented reality helmet and the like. The functions implemented by the data acquisition method can be implemented by calling program codes through a processor in the electronic device, and the program codes can be stored in a computer storage medium.
Fig. 1 is a schematic flow chart of an implementation of a data acquisition method according to an embodiment of the present application, and as shown in fig. 1, the method at least includes the following steps 101 to 103:
step 101, constructing a scene model corresponding to a local space according to a current key frame acquired by a camera and a plurality of target associated frames of the current key frame; and the target associated frame is a key frame matched with the current key frame in image characteristics.
It should be noted that the electronic device with the camera can move along a specified path in the global space and capture images; the capture mode may be a photographing mode or a video-recording mode. The camera is typically a device integrating an RGB (Red-Green-Blue) camera and a depth camera. The electronic device can build the scene model from the current key frame and the plurality of target associated frames, or from the key frame, the plurality of target associated frames, and their corresponding depth images. The scene model may take various forms; for example, it may be a point cloud model, a voxel model without a texture map, or a voxel model with a texture map.
In implementation, each time a key frame is collected, the electronic device may determine a feature block included in the key frame and extract feature information of the feature block; the feature information may be, for example, a feature descriptor or other image feature information that can uniquely represent the key frame. Then, the feature information is processed by a specific hash function to obtain a query hash value. If the query hash value does not match any hash index entry in the hash index table, a new hash index entry is generated in the hash index table according to the query hash value; if the query hash value matches a certain hash index entry in the hash index table, a plurality of target associated frames of the key frame are acquired according to the linked list where that hash index entry is located. That is, the electronic device may update the hash index table according to the currently acquired key frame while acquiring key frames, until the data acquisition work of the global space is completed.
In some embodiments, the data pointed to by each hash index entry in the hash index table may include: the method comprises the steps of key frames, depth images corresponding to the key frames and camera poses of the key frames.
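The lookup-or-insert flow of the hash index table can be sketched briefly. The following is a minimal Python illustration under assumptions, not the patent's actual implementation: the record fields mirror the data listed above, and a plain dict of lists stands in for the hash table with chained (linked-list) buckets.

```python
from dataclasses import dataclass

@dataclass
class KeyframeRecord:
    """Data pointed to by one hash index entry (see the fields listed above)."""
    keyframe_id: int
    depth_image_path: str     # depth image corresponding to the key frame
    camera_pose: list         # e.g. a flattened 4x4 pose matrix
    timestamp: float          # acquisition time

class HashIndexTable:
    """Hash index table whose buckets are chains (the 'linked lists')."""
    def __init__(self):
        self.buckets: dict[int, list[KeyframeRecord]] = {}

    def query_or_insert(self, query_hash: int, record: KeyframeRecord):
        """Return the candidate associated frames chained under query_hash,
        inserting the current key frame's record into the same chain."""
        chain = self.buckets.setdefault(query_hash, [])
        candidates = list(chain)      # existing entries = candidate frames
        chain.append(record)          # update the table with the new frame
        return candidates
```

An empty return value corresponds to the "no matching index entry" branch, in which a new chain has just been created for the query hash.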
Step 102, displaying the scene model of the local space so as to feed back the data acquisition quality of the camera in the local space to a user.
It can be understood that, through the displayed scene model, the user can clearly determine where the acquired image quality in the corresponding local space is poor and where acquisition has been missed, so as to control the electronic device to perform image acquisition on these areas again, thereby improving the data acquisition quality in the local space.
In order to complete reconstruction of a scene model quickly to realize real-time display and enable a user to know current data acquisition quality in time, in some embodiments, a plurality of target associated frames of the current key frame may be historical key frames within a certain time length from the current key frame; or a specific number of historical key frames closest to the acquisition time of the current key frame, for example, the 10 latest historical key frames.
Step 103, controlling the camera to perform data acquisition again on a specific area of the local space so as to improve the data acquisition quality in the local space, and displaying a reconstructed scene model of the local space; or controlling the camera to acquire data of the next local space.
It will be appreciated that, whether data acquisition is performed again on the specific area or on the next local space, it is actually necessary to return to step 101 to acquire a new key frame and construct a new scene model based on that key frame and its target associated frames, so as to feed back the quality of the newly acquired data to the user.
In implementation, the electronic device may control the camera to shoot the area specified by the user again according to a shooting instruction input by the user, so as to reconstruct the scene model of the local space according to the newly shot key frame and the target associated frame matched with the key frame, thereby feeding back the quality of the data shot again to the user. Of course, in another embodiment, the electronic device may also control the camera to scan the specific area again or scan the next local space according to the scan instruction input by the user. The scan instructions may include location information indicating an area to be scanned.
Performing three-dimensional reconstruction on a global space often requires collecting a large amount of key frame data; in particular, the global space of a large scene, such as an airport, a shopping mall, or a city, requires massive key frame data. If the electronic device only performs silent acquisition throughout, the user cannot know where the acquired data quality is poor or where acquisition has been missed, so good data acquisition quality cannot be ensured, and the subsequent model reconstruction quality of the global space cannot be ensured either.
Based on this, in the embodiment of the application, the data acquisition stage of the three-dimensional reconstruction is improved, and the corresponding scene model is displayed while data is acquired based on the currently acquired key frame and the historical key frame matched with the currently acquired key frame, so that the data acquisition quality can be fed back to the user in real time to guide the user to perform data acquisition again on the region which does not meet the requirements; therefore, the data acquisition quality can be improved, and the model quality of later-stage three-dimensional reconstruction is improved.
An embodiment of the present application provides a data acquisition method, and FIG. 2A is a schematic flow chart illustrating an implementation of the data acquisition method according to the embodiment of the present application; as shown in FIG. 2A, the method may include the following steps 201 to 213:
step 201, collecting a current key frame in a global space through a camera.
It should be noted that the global space described in the embodiment of the present application is relative to the local space, and the local space is a partial space of the global space.
Step 202, extracting the feature information of the feature block in the current key frame.
It is understood that the feature block is a representative image area in the current key frame. The feature information of the feature block may be a feature descriptor of the block, or may be other image features capable of indicating uniqueness of the feature block.
Step 203, processing the characteristic information by using a hash function to obtain a query hash value;
step 204, inquiring a target index item matched with the inquired hash value from the currently constructed hash index table; if the target index item is not queried, go to step 205; if the target index item is queried, go to step 206;
step 205, establishing a new linked list according to the query hash value of the current key frame to update the hash index table, and then returning to execute step 201.
It can be understood that, if the electronic device does not find the target index item in the hash index table, it indicates that there is currently no target associated frame matching the current key frame in image features; at this time, a new linked list may be generated according to the query hash value, and its hash index item is used to point to the current key frame, the corresponding camera pose, the acquisition time, and the storage address of the corresponding depth image.
It should be noted that, after the new linked list is established, the electronic device may return to perform step 201; the electronic device may also perform step 201 and step 205 in parallel when the target index item is not queried.
Step 206, acquiring a plurality of target associated frames of the current key frame according to the linked list where the target index item is located; wherein the target associated frame refers to a historical key frame that matches the current key frame in image characteristics.
For example, if the euclidean distance between the feature descriptor of the feature block of a certain historical key frame and the feature descriptor of the feature block of the current key frame is less than or equal to the distance threshold, it is determined that the historical key frame matches the current key frame.
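The matching test in this example can be written directly. A small sketch, assuming descriptors are fixed-length float vectors and that the distance threshold is an empirically tuned parameter:

```python
import numpy as np

def frames_match(desc_current: np.ndarray, desc_history: np.ndarray,
                 distance_threshold: float = 0.7) -> bool:
    """A historical key frame matches the current key frame when the Euclidean
    distance between their feature-block descriptors is within the threshold."""
    return float(np.linalg.norm(desc_current - desc_history)) <= distance_threshold
```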
It can be understood that, for two matched frames, the feature information of their feature blocks maps to the same result under the hash function; this is the hash collision problem. A common way of handling collisions is the chained-address method (separate chaining), i.e., using a linked list to connect the colliding entries, so the linked list is also called a collision data linked list. The embodiment of the application skillfully utilizes this property of the linked list to acquire the plurality of target associated frames of the current key frame. Thus, by combining the lookup advantages of the hash index table with the properties of the linked list, the plurality of target associated frames are obtained quickly, which further enhances the timeliness of displaying the scene model of the local space during the data acquisition stage.
In some embodiments, when the target hash index item is found, the electronic device may update the linked list where the target hash index item is located according to the query hash value, and store the current key frame, the corresponding camera pose, the acquisition time, and the corresponding depth image in the storage area corresponding to the query hash value.
Step 207, constructing a scene model corresponding to a local space according to the current key frame and the plurality of target associated frames;
step 208, displaying a scene model of the local space to feed back the data acquisition quality of the camera in the local space to a user;
step 209, controlling the camera to perform data acquisition on a specific region of the local space again so as to improve the data acquisition quality in the local space, and displaying the reconstructed scene model of the local space; or controlling the camera to acquire data of the next local space; after the electronic device completes the data acquisition of each local space in the global space, step 210 is entered.
It can be understood that, whether the specific region is acquired again or the next local space is acquired, the electronic device actually needs to return to perform step 201 to construct a new scene model of the corresponding local space based on the newly acquired key frame and its matching target associated frames, so as to feed back the current data acquisition quality to the user again.
Step 210, obtaining a feature point matching relationship chain between target key frames indicated by each linked list in a hash index table constructed currently;
it is understood that, here, the currently constructed hash index table refers to the hash index table at the time of completing data collection of the global space, the hash index table at this time already records the hash index entries of the key frames in the global space, which is different from the currently constructed hash index table described in step 204, and the hash index table in step 204 refers to the table that has been constructed by the electronic device until the time of executing step 204.
When the data collection of the global space is completed, each linked list in the hash index table at the time records the index entry of the key frame with spatial correlation in the global space. That is, according to each hash index entry in a certain linked list, all key frames with spatial correlation indicated by the hash index entry can be obtained, and based on this, the feature point matching pair relationship chain of the frames can be determined. For example, as shown in fig. 3, the key frames indicated by a certain linked list include a frame 301, a frame 302, and a frame 303, based on which a feature point matching pair set of the frame 301 and the frame 302 and a feature point matching pair set of the frame 302 and the frame 303 are easily obtained, that is, a feature point matching pair relation chain 304 within a white dashed box in the figure is established, and it should be noted that fig. 3 only illustrates a part of the feature point matching pairs.
Step 211, optimizing, according to each feature point matching pair relation chain, the camera pose of each target key frame pointed to in the corresponding linked list, so as to obtain the target camera pose of each target key frame in the global space.
In effect, this performs global joint optimization on the camera poses of the key frames in the same linked list. In implementation, a re-projection error function taking the camera pose of each target key frame as an independent variable can be constructed according to the feature point matching pair relation chain, and iterative optimization is performed on this function so that the re-projection error meets a specific condition; more accurate camera poses are thereby obtained, improving the quality of the reconstructed model.
Step 212, generating a point cloud model corresponding to the target key frame according to the target camera pose of each target key frame and the corresponding depth image;
step 213, fusing the point cloud model of the target key frame into the currently constructed geometric model of the global space to perfect the geometric model of the global space until the point cloud model of each target key frame is fused into the currently constructed geometric model of the global space, thereby obtaining the target geometric model of the global space.
The geometric model may be a voxel model. In some embodiments, the electronic device may project the currently constructed voxel model of the global space, by a ray-tracing projection method, into a point cloud model consistent with the camera pose of the target key frame to be fused, so as to obtain a projected point cloud model; then, the point cloud model of the target key frame to be fused is registered with the projected point cloud model through the Iterative Closest Point (ICP) algorithm, so as to obtain a better camera pose of the frame; a better point cloud model of the frame is then generated from this better camera pose and the depth image corresponding to the frame, block voxel reconstruction is further performed on that model, and the result is fused into the voxel model of the global space. In this way, the quality of the target geometric model of the global space can be improved.
In some embodiments, step 206, i.e., acquiring a plurality of target associated frames of the current key frame according to the linked list where the target index item is located (where a target associated frame refers to a historical key frame matching the current key frame in image features), may be implemented by the electronic device through the following steps 2061 to 2064, as shown in FIG. 2B:
step 2061, the camera pose of the candidate keyframe pointed by each index in the linked list is obtained.
It can be understood that, in the embodiment of the present application, when the user sees the currently displayed scene model in the local space, the user may control the electronic device to repeatedly scan some areas due to unsatisfactory data acquisition quality of the areas, which may result in many repeated key frames being recorded in the linked list. Therefore, the electronic device needs to screen out the target associated frame from the candidate key frames pointed by the linked list, so as to reconstruct the scene model of the local space faster through fewer key frames without losing the quality of the model.
Step 2062, determining the camera pose of the current key frame according to the feature point matching relationship between the current key frame and any candidate key frame;
step 2063, determining the difference between the camera pose of the current keyframe and the camera pose of each of the candidate keyframes, respectively.
In implementation, the electronic device may determine, according to the camera pose of the current key frame and the camera pose of the candidate key frame, a displacement of the camera when acquiring the current key frame relative to when acquiring the candidate key frame, and determine the displacement as the difference.
Step 2064, determining the candidate key frame corresponding to the difference value greater than the specific threshold as the target associated frame.
It is understood that the keyframes used in reconstructing the scene model of the local space may be reduced by step 2064, thereby increasing the speed of model reconstruction. In some embodiments, the electronic device may delete the candidate key frame corresponding to the difference value less than or equal to the specific threshold and the hash index entry in the linked list to update the linked list. In addition, the electronic equipment can delete the key frames pointed by the deleted hash index items and the related data thereof, so that the number of the key frames used in the reconstruction of the target geometric model of the global space is reduced, and the three-dimensional reconstruction speed is further improved.
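Steps 2061 to 2064 amount to a displacement filter over the chain's candidates. A hedged sketch follows; the 4x4 homogeneous pose convention and the threshold value are assumptions made for illustration:

```python
import numpy as np

def select_target_associated_frames(current_pose: np.ndarray,
                                    candidate_poses: dict,
                                    displacement_threshold: float = 0.05):
    """Split candidates into target associated frames (displacement above the
    threshold) and near-duplicates that may be pruned from the linked list."""
    targets, duplicates = [], []
    t_current = current_pose[:3, 3]              # camera position of current frame
    for frame_id, pose in candidate_poses.items():
        displacement = float(np.linalg.norm(t_current - pose[:3, 3]))
        if displacement > displacement_threshold:
            targets.append(frame_id)             # step 2064
        else:
            duplicates.append(frame_id)          # candidate for deletion
    return targets, duplicates
```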
An embodiment of the present application further provides a data acquisition method, and fig. 4A is a schematic flow chart illustrating an implementation of the data acquisition method according to the embodiment of the present application, as shown in fig. 4A, the method may include the following steps 401 to 408:
step 401, collecting a current key frame in a global space through a camera;
step 402, acquiring a plurality of target associated frames matched with the current key frame on the image characteristics;
As can be seen from the above-described embodiments, the electronic device may extract feature information of a feature block in the current key frame, and process the feature information by using a hash function to obtain a query hash value; it may then query, from the currently constructed hash index table, a target index item matching the query hash value, thereby acquiring the plurality of target associated frames according to the linked list where the target index item is located.
Step 403, establishing a feature point matching pair relation chain between the current key frame and the plurality of target associated frames according to the feature descriptors of the feature blocks of the current key frame and the feature descriptors of the feature blocks of each target associated frame.
It is understood that the multiple target-associated frames also have spatial correlation therebetween, i.e., are matched in image characteristics. Therefore, a feature point matching pair relation chain between the current key frame and a plurality of target associated frames is easily established, and any two frames have spatial correlation. For example, as shown in fig. 3, the current key frame 301 has target associated frames 302 and 303, and therefore, a set of feature point matching pairs of the frame 301 and the frame 302 may be determined, and a set of feature point matching pairs between the frame 302 and the frame 303 may also be determined, that is, a feature point matching pair relation chain 304 within a white dashed box in the drawing is established.
Step 404, constructing, according to the feature point matching pair relation chain, a re-projection error function with the camera poses of the current key frame and of each key frame in the plurality of target associated frames as independent variables.
Understandably, by iteratively optimizing the reprojection error function, further optimization of the camera pose for each keyframe can be achieved due to the mutual constraints on the camera pose between frames.
In some embodiments, as shown in fig. 4B, the electronic device may implement step 404 by steps 4041 through 4044 as follows.
Step 405, performing iterative optimization processing on the re-projection error function so that the re-projection error meets a specific condition, thereby obtaining the optimized camera pose of each key frame.
It should be noted that the specific condition may take various forms; for example, the specific condition may be that the reprojection error is less than or equal to an error threshold, or that the number of iterations reaches a preset number, at which point the reprojection error is deemed to satisfy the specific condition.
Step 406, constructing a point cloud model corresponding to the local space, as the scene model, according to the optimized camera pose of each key frame and the corresponding depth images.
In some embodiments, after obtaining the optimized camera pose of each key frame, the electronic device further needs to update the optimized camera pose of the corresponding key frame to the storage area pointed by the hash index entry, for example, replace the camera pose of the key frame stored in the storage area with the optimized camera pose obtained in step 405.
Step 407, voxel filtering is carried out on the point cloud model;
Step 408, rendering and displaying the voxel-filtered point cloud model to feed back the data acquisition quality of the camera in the local space to the user (a sketch of the voxel filtering of step 407 is given below), and then returning to execute step 401 until the data acquisition work of the global space is finished. At that point, the storage area pointed to by each hash index item contains the key frame, the depth image corresponding to the key frame, the acquisition time, the camera pose obtained in the last optimization, the feature point matching relationships with other key frames, and the like.
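Voxel filtering in step 407 downsamples the point cloud so that rendering remains real-time on a constrained device. A minimal grid-based sketch, with the leaf size as an assumed parameter:

```python
import numpy as np

def voxel_filter(points: np.ndarray, leaf_size: float = 0.02) -> np.ndarray:
    """Downsample an (N, 3) point cloud to one centroid per occupied voxel."""
    voxel_ids = np.floor(points / leaf_size).astype(np.int64)
    _, inverse = np.unique(voxel_ids, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    counts = np.bincount(inverse).astype(float)
    centroids = np.zeros((counts.size, 3))
    for dim in range(3):                  # average the points in each voxel
        centroids[:, dim] = np.bincount(inverse, weights=points[:, dim]) / counts
    return centroids
```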
It will be appreciated that the electronic device returns to perform step 401 to continue the acquisition of key frames. When the data acquisition work is finished, as shown in FIG. 4C, the electronic device may implement three-dimensional reconstruction of the global space through the following steps 409 to 415:
step 409, acquiring a feature point matching pair relation chain between target key frames indicated by each linked list in the currently constructed hash index table;
Step 410, optimizing, according to each feature point matching pair relation chain, the camera pose of each target key frame pointed to in the corresponding linked list, so as to obtain the target camera pose of each target key frame in the global space.
Here, the electronic device may also implement joint optimization of the camera pose of each target key frame in the same linked list by establishing a reprojection error function. The optimization method is similar to steps 403 to 405, except that what is jointly optimized here is the camera poses of all key frames with spatial correlation in the global space. During iterative optimization, the initial value of each camera pose is the last updated camera pose stored in the storage area pointed to by the hash index item.
Step 411, generating a point cloud model corresponding to the target key frame according to the target camera pose of each target key frame and the corresponding depth image;
step 412, projecting the currently constructed voxel model of the global space into a point cloud model consistent with the camera pose of the target key frame to be fused by a ray tracing projection method;
step 413, registering the point cloud model of the target key frame to be fused with the projected point cloud model through the ICP (Iterative Closest Point) algorithm, so as to obtain a better camera pose of the target key frame to be fused (a registration sketch follows this list of steps);
step 414, generating a better point cloud model according to the better camera pose and the depth image corresponding to the target key frame to be fused;
Step 415, performing block voxel reconstruction on the better point cloud model, fusing the model obtained after block voxel reconstruction into the voxel model of the global space, and then returning to step 412 to fuse the point cloud model of the next key frame into the currently constructed voxel model of the global space, until the point cloud model of each target key frame has been fused into the voxel model of the global space, so as to obtain the target voxel model of the global space.
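The core of step 413 is a point-to-point registration update. Below is a hedged sketch of a single ICP iteration, assuming correspondences between the frame's cloud and the ray-cast projection have already been established (in practice they come from nearest-neighbor search, omitted here); the rigid transform is solved by the SVD (Kabsch) method:

```python
import numpy as np

def icp_update(src: np.ndarray, dst: np.ndarray):
    """Given corresponding (N, 3) point sets, return R, t minimizing
    sum_i || R @ src[i] + t - dst[i] ||^2 (one ICP iteration)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                     # avoid a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t
```

Iterating this update with refreshed correspondences yields the refined model-to-frame pose used in step 414.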
In some embodiments, the electronic device further performs a texture synthesis process on the voxel model during the model building process to improve the visualization effect of the model.
In some embodiments, the above step 404, i.e., constructing a reprojection error function with the camera pose of the current key frame and of each key frame in the plurality of target associated frames as independent variables according to the feature point matching pair relation chain, may be implemented by the electronic device through the following steps 4041 to 4044, as shown in FIG. 4B:
Step 4041, acquiring, according to the feature point matching pair relation chain, a feature point matching pair set between every two key frames among the current key frame and the plurality of target associated frames.
For example, still taking the key frames shown in FIG. 3 as an example, the plurality of target associated frames includes key frames 302 and 303; every two key frames may be: the current key frame 301 and key frame 302, key frame 302 and key frame 303, or key frame 303 and the current key frame 301.
Step 4042, constructing a corresponding reprojection error sub-function by using the camera pose of one of the two corresponding key frames as an independent variable according to each feature point matching pair set.
Still taking FIG. 3 as an example, the corresponding reprojection error sub-function is determined, for example, from the feature point matching pair set of the current key frame 301 and the key frame 302. Suppose the spatial three-dimensional coordinate of the k-th feature point of the current key frame 301 is $P_k = [X_k, Y_k, Z_k]^T$, and its projected coordinate in the key frame 302 is $U_k = [u_k, v_k]^T$; the relationship between the two is shown in the following formula (1):

$$s_k \begin{bmatrix} U_k \\ 1 \end{bmatrix} = K \exp(\xi^{\wedge}) P_k \tag{1}$$

where $s_k$ denotes the depth information of the k-th feature point, $K$ denotes the projection matrix corresponding to the current key frame 301, and $\exp(\xi^{\wedge})$ represents the camera pose of the current key frame 301.
The calculation formula of the projected coordinate $U_k$ can be derived from formula (1):

$$U_k = \frac{1}{s_k} K \exp(\xi^{\wedge}) P_k \tag{2}$$
Based on the above, a reprojection error sub-function $\zeta^*$ is constructed from the reprojection error of each feature point of the current key frame 301 in the feature point matching pair set, as shown in the following formula (3):

$$\zeta^*(\xi) = \frac{1}{2} \sum_{k=1}^{n} \left\| \mu_k - \frac{1}{s_k} K \exp(\xi^{\wedge}) P_k \right\|_2^2 \tag{3}$$

where $\mu_k - U_k$ represents the reprojection error of the k-th feature point, and $\mu_k$ represents the pixel coordinates of the target point in the key frame 302 that matches the k-th feature point.
Step 4043, determining the weight of the corresponding reprojection error subfunction according to the acquisition time of each two key frames.
In some embodiments, the corresponding weight may be determined according to the acquisition time of the key frame whose pose is to be optimized in each pair of key frames. Understandably, during data acquisition the camera poses of already-acquired key frames are constantly being optimized; therefore, the earlier a key frame was captured, the more accurate its currently stored camera pose is, and the higher its confidence.
In the embodiment of the application, the weight of the reprojection error subfunction is set according to the acquisition time of the key frame, so that the camera pose of the key frame with high confidence coefficient can be used for restraining (i.e. correcting) the camera pose of the key frame with low confidence coefficient, and a more accurate optimization result can be obtained.
Step 4044, constructing the reprojection error function according to each of the weights and the corresponding reprojection error subfunction.
Still taking FIG. 3 as an example, the constructed reprojection error function is shown in the following formula (4):

$$\zeta^*(\xi_0, \xi_1, \xi_2) = w_0\,\zeta^*_0(\xi_0) + w_1\,\zeta^*_1(\xi_1) + w_2\,\zeta^*_2(\xi_2) \tag{4}$$

where $\xi_0$ represents the camera pose of the current key frame 301, and $\zeta^*_0(\xi_0)$ represents the reprojection error sub-function, with $\xi_0$ as the independent variable, corresponding to the current key frame 301 and the key frame 302; $\xi_1$ represents the camera pose of the key frame 302, and $\zeta^*_1(\xi_1)$ represents the reprojection error sub-function, with $\xi_1$ as the independent variable, corresponding to the key frame 302 and the key frame 303; $\xi_2$ represents the camera pose of the key frame 303, and $\zeta^*_2(\xi_2)$ represents the reprojection error sub-function, with $\xi_2$ as the independent variable, corresponding to the key frame 303 and the current key frame 301; and $w_0$, $w_1$, $w_2$ represent the weights.
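Formulas (1) to (4) can be evaluated directly once the poses and matches are available. A numpy sketch under assumptions: the pose is taken here as a 4x4 homogeneous matrix rather than the $\exp(\xi^{\wedge})$ parameterization, and all names are illustrative:

```python
import numpy as np

def project(P_w: np.ndarray, K: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Formula (2): project (N, 3) world points into pixel coordinates U_k."""
    P_c = (T[:3, :3] @ P_w.T).T + T[:3, 3]       # points in the camera frame
    uv = (K @ P_c.T).T
    return uv[:, :2] / uv[:, 2:3]                # divide by the depth s_k

def subfunction(P_w, mu, K, T) -> float:
    """Formula (3): 0.5 * sum_k || mu_k - U_k ||^2 for one key-frame pair."""
    residuals = mu - project(P_w, K, T)
    return 0.5 * float(np.sum(residuals ** 2))

def weighted_error(pairs, weights) -> float:
    """Formula (4): weighted sum of sub-functions, one per key-frame pair.
    `pairs` is a list of (P_w, mu, K, T) tuples."""
    return sum(w * subfunction(*p) for w, p in zip(weights, pairs))
```

Feeding `weighted_error` to a generic nonlinear least-squares or gradient-based optimizer over the pose variables corresponds to the iterative optimization of step 405.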
An embodiment of the present application further provides a data acquisition method, fig. 5 is a schematic flow chart illustrating an implementation of the data acquisition method according to the embodiment of the present application, and as shown in fig. 5, the method may include the following steps 501 to 505:
step 501, constructing a scene model corresponding to a local space according to a current key frame acquired by a camera and a plurality of target associated frames of the current key frame; the target associated frame is a key frame matched with the current key frame on the image characteristics;
step 502, displaying a scene model of the local space to feed back the data acquisition quality of the camera in the local space to a user;
step 503, receiving a scanning instruction, where the scanning instruction is used to instruct the camera to rescan a specific area in the local space or scan a next local space.
In an application scenario of automatic acquisition, for example, the electronic device is a robot with a shooting function. Through the displayed scene model, the user can clearly assess the data acquisition quality of the corresponding local space. If the image quality acquired in some areas is poor, or some areas have been missed, the user can input a scanning instruction to guide the electronic device to scan the area indicated in the instruction. In some embodiments, the scanning instruction may include position information of the specific area, a photographing instruction, and the like. The area to be rescanned may be an area that was not scanned or an area where the acquired data quality is poor.
Step 504, in the case that the scanning instruction instructs to rescan the specific area, according to the scanning instruction, controlling the camera to rescan the specific area so as to improve the data acquisition quality in the local space and display the reconstructed scene model of the local space;
and 505, controlling the camera to scan the next local space according to the scanning instruction when the scanning instruction instructs to scan the next local space.
It should be noted that, whether data acquisition is performed again on a specific area or on the next local space, the electronic device actually needs to return to perform step 501 to construct a new scene model of the corresponding local space based on the newly acquired key frame and its matching target associated frames, so as to feed back the current data acquisition quality to the user again. When the data acquisition work of the global space is finished, offline three-dimensional reconstruction is performed based on the acquired target key frames to obtain the target geometric model of the global space.
Three-dimensional reconstruction techniques based on multi-frame images, such as Structure From Motion (SFM) and Multi-View Geometry (MVG), require substantial computing power. The geometric reconstruction of a large-scale scene often needs hours of computation on a high-performance computing platform, so the geometric reconstruction of a scene cannot be directly and quickly generated on portable devices such as mobile phones and augmented reality helmets. Moreover, the data acquisition process of multi-frame-image-based three-dimensional reconstruction affects the quality of the final reconstructed model, and the quality and integrity of the generated model cannot be fed back in time during operation; as a result, operating this technology requires professionals who can judge the quality of the acquired image data from rich experience. This strong dependence of model quality on professionals severely hinders the large-scale application and popularization of three-dimensional reconstruction technology.
With the use of depth cameras mounted on portable devices, fast three-dimensional reconstruction techniques based on RGB-D (Red-Green-Blue plus Depth image) data sources are becoming popular. Compared with three-dimensional reconstruction based on RGB images alone, this technique can quickly generate a three-dimensional model of a scene and obtain interactive feedback information, thereby ensuring the success rate and usability of the three-dimensional reconstruction process. However, since a depth camera mounted on a portable device is limited by factors such as size and power consumption, and both its resolution and effective range are limited, RGB-D-based three-dimensional reconstruction is mainly applied to indoor scenes.
RGB-D-based three-dimensional reconstruction mainly tracks the position and posture of the viewpoint by geometrically registering the current key frame against the generated model, and fuses multi-frame RGB-D data into a three-dimensional model using a Truncated Signed Distance Field (TSDF) voxel method. This technique can achieve real-time interactive three-dimensional model reconstruction, feeding back the generated three-dimensional model in real time while scanning and reconstructing.
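The TSDF fusion mentioned here maintains, per voxel, a truncated signed distance and a weight, updated by a running weighted average over frames. A compact sketch under assumptions (a dense grid rather than the block/chunked layout used in practice; T_wc maps world points into the camera frame; grids are assumed C-contiguous):

```python
import numpy as np

def tsdf_integrate(tsdf, weight, depth, K, T_wc, origin, voxel_size, trunc=0.04):
    """Fuse one depth frame (meters) into (X, Y, Z) tsdf/weight grids in place."""
    X, Y, Z = tsdf.shape
    ix, iy, iz = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z),
                             indexing="ij")
    pts_w = origin + voxel_size * np.stack([ix, iy, iz], -1).reshape(-1, 3)
    pts_c = (T_wc[:3, :3] @ pts_w.T).T + T_wc[:3, 3]     # world -> camera
    z = np.maximum(pts_c[:, 2], 1e-6)                    # guard the divide
    u = np.round(K[0, 0] * pts_c[:, 0] / z + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * pts_c[:, 1] / z + K[1, 2]).astype(int)
    H, W = depth.shape
    ok = (pts_c[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d = np.zeros(len(pts_w))
    d[ok] = depth[v[ok], u[ok]]
    ok &= d > 0                                  # pixels with a valid reading
    sdf = d - pts_c[:, 2]                        # signed distance along the ray
    ok &= sdf > -trunc                           # keep the truncation band
    new_val = np.clip(sdf / trunc, -1.0, 1.0)
    ft, fw = tsdf.reshape(-1), weight.reshape(-1)
    ft[ok] = (ft[ok] * fw[ok] + new_val[ok]) / (fw[ok] + 1.0)
    fw[ok] += 1.0
    return tsdf, weight
```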
The RGB-D based three-dimensional reconstruction technique requires a high performance Graphics Processing Unit (GPU) computing platform and occupies a large amount of video memory and memory resources. The related technology solves the problem of invalid occupation of the GPU video memory, so that the reconstruction range is not limited by hardware, but the method still cannot achieve interactive application on a portable equipment platform with limited computing capacity. Some attempts to use RGB-D based three-dimensional reconstruction methods on portable device platforms tend to sacrifice accuracy, quality and range of the reconstructed model. The limitation of computing power seriously hinders the market popularization of the scanning type three-dimensional reconstruction technology at an application end, and how to achieve the dual requirements of real-time interactive feedback and reconstruction model quality under the condition of limited computing resources is a problem to be mainly solved by the embodiment of the application.
Based on this, an exemplary application of the embodiment of the present application in a practical application scenario will be described below.
In the embodiment of the application, the RGB-D-based three-dimensional reconstruction technology designed for high-performance computing platforms is functionally decomposed: the reconstruction process is analyzed to determine which functions serve interactive feedback and which contribute to reconstruction quality. Efficient algorithm functions are selected to quickly obtain a rendering display result, so that real-time interaction capability can be achieved on a low-performance computing platform. Meanwhile, reliable and robust optimization techniques are selected to ensure the generation quality of the three-dimensional model in the offline process. The online and offline processes cooperate to achieve an effective balance among computing resources, computing speed, and computing quality.
In the embodiment of the application, the ultimate aim is to quickly realize three-dimensional reconstruction of an indoor scene on a portable device with limited computing capacity, ensure the quality of the three-dimensional reconstruction model, support real-time interactive feedback, and guide the user in operating the scanned area. The computing resources, computing speed, and computing quality of the RGB-D-based three-dimensional reconstruction technology can thus meet practical requirements, enabling this technology to land on portable devices in engineering practice.
In the embodiment of the application, the main idea is to split the RGB-D based three-dimensional reconstruction technology that needs to be implemented on a high-performance computing platform into a real-time reconstruction module for interactive feedback and an offline reconstruction module for ensuring the quality of a reconstruction model.
The real-time reconstruction module is not really used for reconstructing the three-dimensional model, and the main purpose of the real-time reconstruction module is to timely feed back the quality of image data used for reconstructing the model for a user so as to guide the user to continue scanning other areas or repeatedly scan the areas to complement the image data with poor quality before. The module requires that the model used for rendering display can represent the quality of the model generated finally, and also requires that the model generation be timely enough to achieve the purpose of real-time interactive application.
The offline reconstruction module is mainly used for ensuring the quality of the reconstructed model as far as possible under the condition of limited computing resources; it does not require real-time reconstruction and thus has a lower computing-speed requirement than the real-time reconstruction module. Since a fast RGB-D-based three-dimensional reconstruction technique is still adopted, the speed of offline reconstruction can be kept within seconds; it merely does not require feedback, which allows the technology to be engineered on a portable device.
In the real-time reconstruction process, as shown in fig. 6, the following steps 601 to 607 may be included:
step 601, extracting feature descriptors of feature blocks of a current key frame;
step 602, establishing a dictionary index hash table according to the feature descriptors;
step 603, according to the dictionary index hash table, searching neighboring key frames having spatial correlation with the current key frame, that is, the target associated frame in the above embodiment; thus, the camera pose of the current key frame is determined according to the spatial correlation between the current key frame and the adjacent key frames;
step 604, screening repeated history frames;
step 605, determining a feature point relationship chain, that is, the feature point matching relationship chain in the foregoing embodiment, according to the filtered neighboring keyframes and the current keyframe;
step 606, locally minimizing a reprojection error according to the feature point relationship chain, so as to obtain an optimized camera pose of each key frame;
and 607, rendering and displaying each corresponding key frame according to the optimized camera pose.
For example, the spatial point cloud model generated from the 10 most recent neighboring key frames is rendered and displayed, to feed back the data quality of the latest scanned area in real time. This not only realizes real-time interaction on a platform with limited computing resources, but also keeps the collected frame data and pose data efficiently updated. The difficulty of this technique lies in how to obtain the neighboring key frames quickly: since the scanning path of a portable device is irregular and unpredictable, it is very easy to scan the same area repeatedly, so the timing of neighboring key frames becomes complicated and varied.
The method first extracts feature descriptors and establishes a dictionary-index hash table from them, which ensures that visual relocalization can be achieved quickly when a user scans a repeated region, and that the key frame data stored for previously scanned regions can be recalled to obtain the feature point pair relationships between neighboring key frames. With neighboring key frame data that are spatially related, feature point matching pair relation chains can be established from the feature descriptors of the feature points in each key frame; these relation chains are used to compute the minimized reprojection error of the local space (Local Bundle Adjustment, LBA), so as to obtain more accurate pose data. The pose data are then used to build the local space model for rendering and display, so as to feed back the data acquisition quality of the current space to the user.
When a user repeatedly scans a certain area, it is usually because the previously generated model quality is unsatisfactory and the user intends to update the scanned area. Therefore, timestamp linked-list management is added to the key frame data: when the current key frame is close in pose to a certain historical key frame and a sufficiently large overlapping area exists, the historical key frame data of the repeated area is deleted. This both achieves the purpose of interactive application and reduces the computational burden of the offline reconstruction step.
In the off-line reconstruction process, as shown in fig. 7, the following steps 701 to 704 may be included:
Step 701, performing pose graph optimization;
Step 702, performing iterative closest point registration;
Step 703, performing block voxel reconstruction;
Step 704, performing ray-tracing projection.
in the off-line reconstruction process, a block directed voxel reconstruction method (Chunked TSDF) is used to generate the model quickly. In order to ensure that the pose participating in the off-line three-dimensional reconstruction is accurate enough, the pose needs to be further optimized, and two classical optimization strategies are adopted. The first one, step 701, is to find a global large loop link through the association relationship of the keyframes determined in the real-time reconstruction process, so as to perform Pose relationship optimization (PGO). And secondly, in steps 702 to 704, establishing an iterative closest point registration optimization of a model to Frame (ModelTo Frame) with the keyframe to be reconstructed by using the reconstructed spatial voxel ray tracing projection. The double insurance method is used for an off-line three-dimensional reconstruction process, and ensures the generation quality of a three-dimensional reconstruction model.
The chunked voxel reconstruction method is a fast three-dimensional reconstruction technique whose computation completes within a few seconds on a portable device platform, striking a balance among computing resources, computing speed, and computing quality so that fast three-dimensional reconstruction can be deployed on portable devices in practice.
In the embodiments of the present application, a real-time reconstruction process for display feedback and an off-line reconstruction process for guaranteeing reconstruction quality are provided, achieving a balance among computing resources, computing speed, and computing quality, so that the RGB-D-based three-dimensional reconstruction technology can be deployed on portable device platforms with limited computing resources. For the same reconstruction model quality, the reconstruction speed can be increased from less than 5 frames per second (fps) to more than 30 fps, meeting the performance requirement of real-time interaction.
It should be noted that both the truncated signed distance function (TSDF) voxel reconstruction method and the iterative closest point registration optimization algorithm in the off-line reconstruction module are well suited to parallel accelerated computation on a GPU. Because the off-line reconstruction process does not need to render an interactive display, the GPU has spare computing power available for such parallel acceleration.
In some embodiments, a texture synthesis function may be added to the off-line reconstruction module. Synthesizing textures from multiple key frames requires computing the view-angle contribution of each texture block, which often consumes substantial computing resources; the staged design of the reconstruction process allows this texture synthesis technique to be integrated into the RGB-D fast three-dimensional reconstruction pipeline, so that textures improve the visualization of the finally generated model.
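As a hedged illustration of a view-angle contribution weight (the embodiment does not specify its exact form; the frontal-and-near weighting below is one common choice, not the claimed computation):

```python
# Views that observe a texture block head-on and from nearby contribute more.
import numpy as np

def view_contribution(patch_center: np.ndarray, patch_normal: np.ndarray,
                      cam_center: np.ndarray) -> float:
    view_dir = cam_center - patch_center
    dist = float(np.linalg.norm(view_dir))
    if dist == 0.0:
        return 0.0
    cos_angle = float(patch_normal @ (view_dir / dist))
    return max(cos_angle, 0.0) / dist ** 2   # frontal and close views win
```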
In addition, with the rise of fifth-generation mobile communication (5G) and edge computing, the off-line reconstruction process may in the future be migrated to edge devices with high-performance computing platforms, reducing the computing load and power consumption of the portable device. The scan data collected by multiple portable devices can also be crowdsourced for three-dimensional reconstruction, broadening the possible applications of the technology.
Based on the foregoing embodiments, the present application provides a data acquisition apparatus, which includes modules and units included in the modules, and can be implemented by a processor in an electronic device; of course, the implementation can also be realized through a specific logic circuit; in implementation, the processor may be a Central Processing Unit (CPU), a Microprocessor (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 8A is a schematic structural diagram of a data acquisition apparatus according to an embodiment of the present application, and as shown in fig. 8A, the apparatus 800 includes a local model building module 801, a display module 802, and a data acquisition module 803, where:
a local model building module 801, configured to build a scene model corresponding to a local space according to a current key frame acquired by a camera and a plurality of target associated frames of the current key frame; the target associated frame is a key frame that matches the current key frame in image features;
a display module 802, configured to display a scene model of the local space to feed back to a user the data acquisition quality of the camera in the local space;
a data acquisition module 803, configured to control the camera to perform data acquisition on a specific region of the local space again, so as to improve the quality of data acquisition in the local space, and to display the reconstructed scene model of the local space through the display module 802; or to control the camera to acquire data of the next local space.
In some embodiments, as shown in fig. 8B, the data acquisition apparatus further includes a feature extraction module 804, a hash processing module 805, a query module 806, and a first obtaining module 807; the feature extraction module 804 is configured to extract feature information of a feature block in the current key frame; a hash processing module 805, configured to process the feature information by using a hash function to obtain a query hash value; a query module 806, configured to query, from a currently constructed hash index table, a target index entry matching the query hash value; a first obtaining module 807, configured to obtain the multiple target associated frames according to a linked list where the target index items are located when the target index items are queried.
In some embodiments, a first obtaining module 807 is configured to: acquiring a camera pose of a candidate key frame pointed by each index item in the linked list; determining the camera pose of the current key frame according to the feature point matching relation between the current key frame and any candidate key frame; determining a difference between the camera pose of the current keyframe and the camera pose of each of the candidate keyframes; and determining the candidate key frame corresponding to the difference value larger than a specific threshold value as the target associated frame.
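A minimal sketch of this selection rule follows; the candidate layout and the use of translation distance as the pose difference are illustrative assumptions:

```python
# Keep candidates whose pose differs from the current pose by more than the
# threshold as target associated frames; the rest are pruned from the list.
import numpy as np

def select_target_frames(current_pose: np.ndarray, candidates: list,
                         threshold: float) -> list:
    targets = []
    for cand in candidates:
        diff = float(np.linalg.norm(current_pose[:3, 3] - cand["pose"][:3, 3]))
        if diff > threshold:
            targets.append(cand)
    return targets
```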
In some embodiments, as shown in fig. 8B, the data collection device 800 further comprises an update module 808 for: and deleting the candidate key frames corresponding to the difference value smaller than or equal to the specific threshold and the hash index items in the linked list so as to update the linked list.
In some embodiments, the update module 808 is further configured to: under the condition that the target hash value is inquired, updating the linked list according to the inquiry hash value of the current key frame; and under the condition that the target hash value is not inquired, generating a new linked list according to the inquiry hash value of the current key frame so as to update the hash index table.
In some embodiments, a local model building module 801 is configured to: establishing a feature point matching pair relation chain between the current key frame and the plurality of target association frames according to the feature descriptors of the feature blocks of the current key frame and the feature descriptors of the feature blocks of each target association frame; constructing a reprojection error function taking the camera pose of the current key frame and each key frame in the plurality of target associated frames as an independent variable according to the feature point matching pair relation chain; performing iterative optimization processing on the re-projection error function to enable the re-projection error to meet specific conditions, so as to obtain the optimized camera pose of each key frame; and constructing a point cloud model corresponding to a local space according to the optimized camera pose of each key frame and the corresponding depth image to serve as the scene model.
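For the final step above, the following sketch lifts a depth image into a local-space point cloud using an optimized camera pose; the pinhole intrinsics (fx, fy, cx, cy) are assumed known and invalid depths are zero:

```python
# Back-project a depth image into the local-space frame with a 4x4 pose.
import numpy as np

def depth_to_points(depth: np.ndarray, pose: np.ndarray,
                    fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    valid = z > 0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts_cam = np.stack([x[valid], y[valid], z[valid]], axis=1)
    return (pose[:3, :3] @ pts_cam.T).T + pose[:3, 3]  # camera -> local space
```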
In some embodiments, a local model building module 801 is configured to: acquiring a feature point matching pair set between the current key frame and each two key frames in the target associated frames according to the feature point matching pair relation chain; constructing a corresponding re-projection error sub-function by taking the camera pose of one key frame of the two corresponding key frames as an independent variable according to each feature point matching pair set; determining the weight of the corresponding re-projection error subfunction according to the acquisition time of every two key frames; and constructing the reprojection error function according to each weight and the corresponding reprojection error sub-function.
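A hedged sketch of assembling such a weighted reprojection error is shown below; the inverse-time weight is one plausible reading of "determined according to the acquisition time", not the specified formula:

```python
# Sum weighted squared pixel residuals over the feature point matching pair
# sets; each set contributes one re-projection error sub-function.
import numpy as np

def project(pose: np.ndarray, X: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project world points X (Nx3) into pixels with a world->camera pose."""
    Xc = (pose[:3, :3] @ X.T).T + pose[:3, 3]
    uv = (K @ Xc.T).T
    return uv[:, :2] / uv[:, 2:3]

def weighted_reprojection_error(pair_sets, K: np.ndarray) -> float:
    """pair_sets: iterable of (pose, points3d, observed_uv, dt) tuples,
    where dt is the acquisition-time gap between the two key frames."""
    total = 0.0
    for pose, X, uv_obs, dt in pair_sets:
        w = 1.0 / (1.0 + dt)                     # assumed time-based weight
        residual = project(pose, X, K) - uv_obs
        total += w * float((residual ** 2).sum())
    return total
```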
In some embodiments, a display module 802 to: performing voxel filtering on the point cloud model; and rendering and displaying the voxel filtered point cloud model.
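A minimal voxel-filter sketch matching this description: points are bucketed into a regular grid and each occupied voxel is replaced by the centroid of its points, thinning the cloud before rendering.

```python
# Voxel-grid downsampling: one centroid per occupied voxel.
import numpy as np

def voxel_filter(points: np.ndarray, voxel_size: float) -> np.ndarray:
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    counts = np.bincount(inverse).astype(np.float64)
    out = np.zeros((counts.size, 3))
    for dim in range(3):
        out[:, dim] = np.bincount(inverse, weights=points[:, dim]) / counts
    return out
```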
In some embodiments, the data acquisition module 803 is configured to: receiving a scan instruction for instructing the camera to rescan a specific region in the local space or scan a next local space; controlling the camera to rescan the specific area according to the scanning instruction if the scanning instruction instructs to rescan the specific area.
In some embodiments, as shown in fig. 8C, the data acquisition apparatus 800 further comprises a second acquisition module 809, an optimization module 810, a point cloud generation module 811, and a fusion module 812; the second obtaining module 809 is configured to obtain, after data acquisition of each local space is completed, a feature point matching relationship chain between target key frames indicated by each linked list in the hash index table currently constructed; an optimizing module 810, configured to optimize, according to each feature point matching pair relation chain, a camera pose of each target key frame pointed in the corresponding chain table, so as to obtain a target camera pose of each target key frame in a global space; the point cloud generating module 811 is used for generating a point cloud model corresponding to the target key frame according to the target camera pose of each target key frame and the corresponding depth image; a fusion module 812, configured to fuse the point cloud model of the target key frame into a currently-constructed geometric model of the global space to perfect the geometric model of the global space until the point cloud model of each target key frame is fused into the currently-constructed geometric model of the global space, so as to obtain the target geometric model of the global space.
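As an illustrative sketch of the fusion loop (the voxel-keyed running-average model below is an assumption standing in for the geometric model of the global space):

```python
# Merge each target key frame's point cloud into the evolving global model so
# that repeated observations of the same region are averaged, not duplicated.
import numpy as np

def fuse(global_model: dict, cloud: np.ndarray, voxel_size: float = 0.01) -> dict:
    for p in cloud:
        key = tuple(np.floor(p / voxel_size).astype(int))
        n, mean = global_model.get(key, (0, np.zeros(3)))
        global_model[key] = (n + 1, (mean * n + p) / (n + 1))  # running mean
    return global_model

# Usage: iterate over the point clouds produced for all target key frames.
# model = {}
# for cloud in keyframe_clouds:
#     model = fuse(model, cloud)
```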
The above description of the apparatus embodiments is similar to that of the method embodiments and has similar beneficial effects. For technical details not disclosed in the apparatus embodiments of the present application, refer to the description of the method embodiments of the present application for understanding.
It should be noted that, in the embodiments of the present application, if the data acquisition method is implemented in the form of a software functional module and sold or used as a standalone product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for enabling an electronic device (which may be a mobile phone, a tablet computer, an e-reader, a notebook computer, a desktop computer, a robot, a drone, a server, an augmented reality helmet, etc.) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, an embodiment of the present application provides an electronic device. Fig. 9 is a schematic diagram of a hardware entity of the electronic device according to the embodiment of the present application. As shown in fig. 9, the hardware entity of the electronic device 900 includes a memory 901 and a processor 902; the memory 901 stores a computer program operable on the processor 902, and the processor 902 implements the steps in the data acquisition method provided in the above embodiments when executing the program.
The memory 901 is configured to store instructions and applications executable by the processor 902, and may also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 902 and modules in the electronic device 900, and may be implemented by a flash memory (FLASH) or a Random Access Memory (RAM).
Correspondingly, the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps in the data acquisition method provided in the above embodiments.
Here, it should be noted that: the above description of the storage medium and device embodiments is similar to the description of the method embodiments above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application. The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated units described above in the present application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for enabling an electronic device (which may be a computer, a tablet computer, an e-reader, a notebook computer, a desktop computer, a robot, a drone, a server, an augmented reality helmet, etc.) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The above description is only for the embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A method of data acquisition, the method comprising:
constructing a scene model corresponding to a local space according to a current key frame acquired by a camera and a plurality of target associated frames of the current key frame; the target associated frame is a key frame that matches the current key frame in image features;
displaying a scene model of the local space to feed back to a user the data acquisition quality of the camera in the local space;
controlling the camera to perform data acquisition on a specific area of the local space again so as to improve the data acquisition quality in the local space and display a reconstructed scene model of the local space; or controlling the camera to acquire data of the next local space.
2. The method of claim 1, further comprising:
extracting feature information of a feature block in the current key frame;
processing the characteristic information by utilizing a hash function to obtain a query hash value;
inquiring a target index item matched with the inquiry hash value from the currently constructed hash index table;
and under the condition that the target index item is inquired, acquiring the plurality of target associated frames according to the linked list where the target index item is located.
3. The method according to claim 2, wherein the obtaining the plurality of target associated frames according to the linked list where the target index item is located comprises:
acquiring a camera pose of a candidate key frame pointed by each index item in the linked list;
determining the camera pose of the current key frame according to the feature point matching relation between the current key frame and any candidate key frame;
determining a difference between the camera pose of the current keyframe and the camera pose of each of the candidate keyframes;
and determining the candidate key frame corresponding to the difference value larger than a specific threshold value as the target associated frame.
4. The method of claim 3, further comprising:
and deleting the candidate key frames corresponding to the difference value smaller than or equal to the specific threshold and the hash index items in the linked list so as to update the linked list.
5. The method of claim 3, further comprising:
under the condition that the target hash value is inquired, updating the linked list according to the inquiry hash value of the current key frame;
and under the condition that the target hash value is not inquired, generating a new linked list according to the inquiry hash value of the current key frame so as to update the hash index table.
6. The method according to any one of claims 1 to 5, wherein the constructing a scene model corresponding to a local space according to a current key frame acquired by a camera and a plurality of target associated frames of the current key frame comprises:
establishing a feature point matching pair relation chain between the current key frame and the plurality of target association frames according to the feature descriptors of the feature blocks of the current key frame and the feature descriptors of the feature blocks of each target association frame;
constructing a reprojection error function taking the camera pose of the current key frame and each key frame in the plurality of target associated frames as an independent variable according to the feature point matching pair relation chain;
performing iterative optimization processing on the re-projection error function to enable the re-projection error to meet specific conditions, so as to obtain the optimized camera pose of each key frame;
and constructing a point cloud model corresponding to a local space according to the optimized camera pose of each key frame and the corresponding depth image to serve as the scene model.
7. The method of claim 6, wherein constructing a reprojection error function with a camera pose of the current keyframe and each of the plurality of target-related frames as an argument from the chain of feature-point-matching pairs comprises:
acquiring a feature point matching pair set between the current key frame and each two key frames in the target associated frames according to the feature point matching pair relation chain;
constructing a corresponding re-projection error sub-function by taking the camera pose of one key frame of the two corresponding key frames as an independent variable according to each feature point matching pair set;
determining the weight of the corresponding re-projection error subfunction according to the acquisition time of every two key frames;
and constructing the reprojection error function according to each weight and the corresponding reprojection error sub-function.
8. The method of claim 6, wherein displaying the scene model of the local space comprises:
performing voxel filtering on the point cloud model;
and rendering and displaying the voxel filtered point cloud model.
9. The method according to any one of claims 1 to 5, wherein the controlling the camera to re-acquire data of a specific region of the local space comprises:
receiving a scan instruction for instructing the camera to rescan a specific region in the local space or scan a next local space;
controlling the camera to rescan the specific area according to the scanning instruction if the scanning instruction instructs to rescan the specific area.
10. The method according to any one of claims 2 to 5, further comprising:
after data acquisition of each local space is finished, acquiring a feature point matching pair relation chain between target key frames indicated by each linked list in a currently constructed hash index table;
matching a pair of relation chains according to each feature point, and optimizing the camera pose of each target key frame pointed in the corresponding relation chains so as to obtain the target camera pose of each target key frame in the global space;
generating a point cloud model corresponding to the target key frame according to the target camera pose of each target key frame and the corresponding depth image;
and fusing the point cloud models of the target key frames into the geometric model of the currently constructed global space to perfect the geometric model of the global space until the point cloud model of each target key frame is fused into the geometric model of the currently constructed global space, thereby obtaining the target geometric model of the global space.
11. A data acquisition device, comprising:
the local model building module is used for building a scene model corresponding to a local space according to a current key frame acquired by a camera and a plurality of target associated frames of the current key frame; the target associated frame is a key frame that matches the current key frame in image features;
the display module is used for displaying the scene model of the local space so as to feed back the data acquisition quality of the camera in the local space to a user;
the data acquisition module is used for controlling the camera to perform data acquisition on a specific area of the local space again so as to improve the data acquisition quality in the local space, and displaying the reconstructed scene model of the local space through the display module; or controlling the camera to acquire data of the next local space.
12. An electronic device comprising a memory and a processor, the memory storing a computer program operable on the processor, wherein the processor implements the steps of the data acquisition method of any one of claims 1 to 10 when executing the program.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the data acquisition method according to any one of claims 1 to 10.
CN202010299543.3A 2020-04-16 2020-04-16 Data acquisition method and device, equipment and storage medium Active CN111402412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010299543.3A CN111402412B (en) 2020-04-16 2020-04-16 Data acquisition method and device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111402412A true CN111402412A (en) 2020-07-10
CN111402412B CN111402412B (en) 2023-06-09

Family

ID=71429574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010299543.3A Active CN111402412B (en) 2020-04-16 2020-04-16 Data acquisition method and device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111402412B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106815824A (en) * 2016-12-08 2017-06-09 Huazhong University of Science and Technology Image neighbor optimization method for improving the efficiency of large-scale three-dimensional reconstruction
US20190130216A1 (en) * 2017-11-02 2019-05-02 Canon Kabushiki Kaisha Information processing apparatus, method for controlling information processing apparatus, and storage medium
CN108615247A (en) * 2018-04-27 2018-10-02 Shenzhen Tencent Computer Systems Co., Ltd. Relocation method, device, equipment and storage medium for camera pose tracking
CN108898630A (en) * 2018-06-27 2018-11-27 Tsinghua-Berkeley Shenzhen Institute Preparatory Office Three-dimensional reconstruction method, device, equipment and storage medium
CN109978931A (en) * 2019-04-04 2019-07-05 Beijing Xijian Technology Co., Ltd. Three-dimensional scene reconstruction method, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liu Liman; Sun Kun; Xu Haiyang; Hu Huaifei: "Fast Matching Algorithm for Large-Scale Images Based on Hash Features" *
Xie Lixiang; Wan Gang; Cao Xuefeng; Wang Qinghe: "Design and Implementation of a Fast 3D Reconstruction System for UAV Image Sequences" *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111986347A (en) * 2020-07-20 2020-11-24 Hanhai Information Technology (Shanghai) Co., Ltd. Device management method, device, electronic device and storage medium
CN111986347B (en) * 2020-07-20 2022-07-22 Hanhai Information Technology (Shanghai) Co., Ltd. Device management method, device, electronic device and storage medium
CN112091991A (en) * 2020-09-16 2020-12-18 Harbin Institute of Technology ROS-based control system for grinding mechanical arm
CN112270754A (en) * 2020-11-12 2021-01-26 Guangdong Oppo Mobile Telecommunications Corp Ltd Local grid map construction method and device, readable medium and electronic equipment
CN112270702A (en) * 2020-11-12 2021-01-26 Guangdong Oppo Mobile Telecommunications Corp Ltd Volume measurement method and device, computer readable medium and electronic equipment
CN114373041A (en) * 2021-12-15 2022-04-19 Juhaokan Technology Co., Ltd. Three-dimensional reconstruction method and equipment
CN114373041B (en) * 2021-12-15 2024-04-02 Juhaokan Technology Co., Ltd. Three-dimensional reconstruction method and device

Also Published As

Publication number Publication date
CN111402412B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN111402412B (en) Data acquisition method and device, equipment and storage medium
CN108537876B (en) Three-dimensional reconstruction method, device, equipment and storage medium
Kawai et al. Diminished reality based on image inpainting considering background geometry
WO2020001168A1 (en) Three-dimensional reconstruction method, apparatus, and device, and storage medium
CN111968129A (en) Instant positioning and map construction system and method with semantic perception
CN109658445A (en) Network training method, incremental mapping method, localization method, device and equipment
CN112270736B (en) Augmented reality processing method and device, storage medium and electronic equipment
CN115205489A (en) Three-dimensional reconstruction method, system and device in large scene
CN108537844B (en) Visual SLAM loop detection method fusing geometric information
CN113706699B (en) Data processing method and device, electronic equipment and computer readable storage medium
KR20140108828A (en) Apparatus and method of camera tracking
CN108648264B (en) Underwater scene reconstruction method based on motion recovery and storage medium
US10726612B2 (en) Method and apparatus for reconstructing three-dimensional model of object
CN110580720B (en) Panorama-based camera pose estimation method
GB2573170A (en) 3D Skeleton reconstruction from images using matching 2D skeletons
CN112207821B (en) Target searching method of visual robot and robot
KR102464271B1 (en) Pose acquisition method, apparatus, electronic device, storage medium and program
CN112308977A (en) Video processing method, video processing apparatus, and storage medium
CN114881841A (en) Image generation method and device
Fu et al. Image Stitching Techniques Applied to Plane or 3D Models: A Review
GB2571307A (en) 3D skeleton reconstruction from images using volumic probability data
CN115239763A (en) Planar target tracking method based on central point detection and graph matching
CN114419189A (en) Map construction method and device, electronic equipment and storage medium
JP6341540B2 (en) Information terminal device, method and program
CN109379577B (en) Video generation method, device and equipment of virtual viewpoint

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant