CN108898661A - Method and apparatus for three-dimensional image construction, and device with storage function - Google Patents

Method and apparatus for three-dimensional image construction, and device with storage function

Info

Publication number
CN108898661A
CN108898661A
Authority
CN
China
Prior art keywords
plane
image
points
class
preset
Prior art date
Legal status
Granted
Application number
CN201810553584.3A
Other languages
Chinese (zh)
Other versions
CN108898661B (en)
Inventor
欧勇盛
熊荣
江国来
王志扬
冯伟
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201810553584.3A priority Critical patent/CN108898661B/en
Publication of CN108898661A publication Critical patent/CN108898661A/en
Application granted granted Critical
Publication of CN108898661B publication Critical patent/CN108898661B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T2210/00: Indexing scheme for image generation or computer graphics
    • G06T2210/04: Architectural design, interior design

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present application discloses a three-dimensional image construction method. The method includes: acquiring multiple frames of depth images and displaying each frame of depth image; detecting whether a user selects a point in a displayed depth image, determining the depth images in which a selection is detected as key frame images, and determining the remaining depth images as common frame images; extracting planes in each frame of depth image and acquiring feature information of the extracted planes; registering at least part of the adjacent frames among the multiple frames of depth images using the extracted plane feature information; and fusing the registration results to construct a three-dimensional image. The above method enables stable extraction of the dominant plane structure and thus construction of a more accurate indoor three-dimensional image. The present application also provides a three-dimensional image construction apparatus and a device with a storage function.

Description

Three-dimensional image construction method and device with storage function
Technical Field
The present application relates to the field of image construction, and in particular, to a method and an apparatus for constructing a three-dimensional image, and a device having a storage function.
Background
Three-dimensional image construction and reconstruction is a challenging research subject that draws on theories and techniques from multiple fields, including computer vision, computer graphics, pattern recognition, and optimization. The traditional approach is to use ranging sensors such as laser or radar, or structured-light techniques, to acquire structural information of a scene or object surface and construct a three-dimensional image, but most of these instruments are expensive and not easy to carry, so their application scenarios are limited.
In recent years, with the development of computer vision technology, more and more researchers have begun to construct three-dimensional images with purely visual methods. In the prior art, however, all points of the whole image are matched, which is computationally expensive, and the lack of plane constraints and optimization makes the final reconstructed data prone to error. Moreover, because reconstruction scenes are complex, matching methods based only on point features easily lose important features, so accurate three-dimensional image data cannot be obtained.
Disclosure of Invention
The technical problem mainly solved by the application is to provide a method and equipment for constructing a three-dimensional image and a device with a storage function, which can stably extract a main plane structure, and further realize construction of a more accurate indoor three-dimensional image.
In order to solve the technical problem, the application adopts a technical scheme that: there is provided a method of three-dimensional image construction, the method comprising:
acquiring multiple frames of depth images, and displaying each frame of depth image;
detecting whether a user selects a point in the displayed depth image, determining the detected selected depth image as a key frame image, and determining the rest depth images as common frame images;
extracting a plane in each frame of depth image, and acquiring feature information of the extracted plane;
for the key frame image, extracting a first-class plane based on the points selected by the user and acquiring feature information of the first-class plane, and extracting second-class planes based on a plurality of points in the key frame image outside the first-class plane and acquiring feature information of the second-class planes;
for the common frame image, extracting second-class planes based on a plurality of points in the common frame image, and acquiring feature information of the second-class planes;
registering at least part of adjacent frame depth images of the multiple frames of depth images by using the extracted characteristic information of the plane;
and fusing the registration result to construct a three-dimensional image.
In order to solve the above technical problem, another technical solution adopted by the present application is to provide an apparatus for three-dimensional image construction, the apparatus including: a processor, and a memory coupled to the processor, the apparatus further comprising a human interface component or a human interaction component; the memory stores program data for executing the method of three-dimensional image construction as described above when the processor runs the program data; the human-computer interface component is used for outputting the depth image to an external display device and inputting information of a point in the depth image selected by a user and generated by an external input device; the human-computer interaction component is used for displaying the depth image and generating information of points in the depth image selected by a user.
In order to solve the above technical problem, the present application adopts a further technical solution of providing a device having a storage function, wherein the device stores program data, and the program data realizes the three-dimensional image constructing method as described above when executed.
According to the above scheme, whether the user selects a point in the displayed depth images is detected, the depth images selected by the user are determined as key frame images, and the remaining depth images are determined as common frame images. A first-class plane and its feature information are then extracted based on the points selected by the user, and second-class planes are extracted, and their feature information acquired, based on a plurality of points in the common frame images. The acquired plane feature information is used to register at least part of the adjacent frames among the acquired multiple frames of depth images to obtain registration results, and a three-dimensional image is finally constructed based on the registration results. In this technical scheme, the acquired depth images are displayed and the depth images in which the user selects points are used as key frame images, so the main planes are extracted stably and accurately and a more accurate three-dimensional image is constructed.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a method for three-dimensional image construction according to the present application;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of a method for three-dimensional image construction according to the present application;
FIG. 3 is a schematic flow chart diagram illustrating a method for three-dimensional image construction according to another embodiment of the present application;
FIG. 4 is a schematic flow chart diagram illustrating a method for three-dimensional image construction according to still another embodiment of the present application;
FIG. 5 is a schematic flow chart diagram illustrating a method for three-dimensional image construction according to still another embodiment of the present application;
FIG. 6 is a schematic flow chart diagram illustrating a method for three-dimensional image construction according to still another embodiment of the present application;
FIG. 7 is a schematic flow chart diagram illustrating a method for three-dimensional image construction according to still another embodiment of the present application;
FIG. 8 is a schematic flow chart diagram illustrating an embodiment of an apparatus for three-dimensional image construction according to the present application;
FIG. 9 is a schematic flow chart diagram of another embodiment of the apparatus for three-dimensional image construction according to the present application;
FIG. 10 is a schematic flow chart diagram illustrating an apparatus for three-dimensional image construction according to yet another embodiment of the present application;
FIG. 11 is a schematic flow chart diagram of an apparatus for three-dimensional image construction according to another embodiment of the present application;
fig. 12 is a schematic structural diagram of an embodiment of a device with a storage function according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or an implicit indication of the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. For example, a first plane of type may be referred to as a second plane of type, and similarly, a second plane of type may be referred to as a first plane of type, without departing from the scope of the present application. The first plane-like surface and the second plane-like surface are both planar surfaces, but they are not the same plane-like surface. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
First, the images used in the technical solutions described below are explained. The technical solutions provided in the present application are described on the basis of depth images; however, since a depth image can be obtained by processing an ordinary acquired image, the technical solutions provided in the present application are not limited to depth images alone.
Please refer to fig. 1, which is a flow chart illustrating a three-dimensional image constructing method according to an embodiment of the present application. In this embodiment, the method for constructing a three-dimensional image specifically includes the following steps:
S110: acquiring multiple frames of depth images, and displaying each frame of depth image.
Since at least part of the depth images of the adjacent frames need to be registered in the method for constructing a three-dimensional image provided by the present application, the depth image obtained in step S110 is at least two frames. It is understood that the upper limit of the number of acquired depth images is not limited, and is specifically set according to the actual need of three-dimensional image construction and the computing power of the three-dimensional image construction device.
Furthermore, the acquired multiple frames of depth images may be captured in real time by a depth camera, or may be obtained by processing ordinary images captured by an ordinary camera; the manner of obtaining the depth images is not limited herein. It is to be understood that the camera described herein may be any device capable of capturing images, and specifically may be a stand-alone capturing device or a capturing component (such as a camera module) disposed on a device, which is not limited herein.
In one embodiment, the acquired depth images are displayed frame by frame in sequence. For example, when the depth camera acquires the depth images in real time, or when the number of depth images to be displayed is large, the depth images acquired by the depth camera can be set to be displayed in sequence. The time interval between displayed images is not limited, so long as each depth image can be clearly shown to the user and the user is left enough time to judge whether there is a point to be selected in the displayed depth image.
In another embodiment, the acquired depth images are displayed simultaneously. As described above, when the acquired images are depth images obtained by preprocessing, or when the number of depth images currently to be displayed is small, the acquired depth images can be set to be displayed at the same time. In this case, the display interval may be set longer, or the user may confirm, after selecting points in all the depth images, that the selection of points has been completed before the processing of the following steps is performed; this is set according to actual needs.
S120: whether a user selects a point in the displayed depth image is detected, the depth image detected to be selected is determined as a key frame image, and the rest depth images are determined as common frame images.
In the present application, the key frame images are determined through user selection; specifically, a depth image in which the user selects a point is used as a key frame. Because the key frame images are mainly used for constructing three-dimensional images of buildings and indoor scenes, and wall surfaces, floors, or roofs form the main structural planes of a building, whether a frame is a key frame is judged by the user from the perspective of whether a new plane (including the floor, wall surfaces, roof, and other parts forming the main structure of the building) appears in the current frame image. A common frame image is a depth image in which the user does not select a point, i.e., no new plane appears in the common frame compared with the previous frame or frames. The three-dimensional image of the room, area, or building is finally constructed more accurately based on the key frames judged by the user and the common frames. It should be noted that, in the technical solution provided by the present application, the user may select a point in the depth image by mouse click, touch-screen tap, or other manners, so the way in which the user selects a point in the displayed depth image is not limited herein.
Step S110 shows that at least two frames of depth images are obtained in the technical solution provided by the present application, and the key frames are confirmed by the user by selecting points in the displayed depth images, so the number of key frame images among the obtained depth images is determined by the user; the numbers of key frame images and common frame images contained in the depth images are not limited herein. In one embodiment, the number of acquired depth images is small but a new plane or a new wall appears in every frame, so every frame among the currently acquired depth images is a key frame image; conversely, when the user does not select a point in any frame, every frame among the currently acquired depth images is a common frame image. It can be understood that when the acquired depth images are all key frame images or all common frame images, the technical solution provided by the present application is still applicable.
Further, among the acquired multiple frames of depth images, since there is no earlier image to compare the first frame against, the first frame may be a key frame image by default, or the first frame may be determined as a key frame image as long as part or all of the main structure of the building, such as a wall surface, roof, or floor, appears in it. For every frame after the first, the current frame of depth image is compared with the previous frame, and when a new wall surface, roof, or floor appears in the current frame, that frame is determined as a key frame image. It can be understood that in the technical solution provided in the present application, most key frame images are selected and judged by the user, so the user may also select desired parts to be shown in the constructed three-dimensional image, such as a bar counter or a computer desk.
Further, in an embodiment, when the acquired depth image is displayed for the user to determine whether it is a key frame image, the user may additionally be prompted, based on detection performed on the depth image, as to whether a newly appearing floor or wall surface exists in the current frame; combined with the user's own comparison of the preceding and following frames, this helps the user quickly determine whether the current frame is a key frame image.
S130: and extracting a plane in each frame of depth image, and acquiring the characteristic information of the extracted plane.
In the technical solution provided by the present application, the obtained depth images are divided into key frame images and common frame images based on whether the user selects points in them. Accordingly, extracting a plane in each frame of depth image and acquiring the feature information of the extracted plane in step S130 specifically covers the following two cases:
1) For a key frame image, a first-class plane is extracted based on the points selected by the user and its feature information is acquired, and second-class planes are extracted based on a plurality of points in the key frame image outside the first-class plane and their feature information is acquired. The points selected by the user are points in a floor or wall area that newly appears compared with the previous frame. For how the first-class plane is extracted based on the points selected by the user, please refer to the embodiment corresponding to fig. 3 below. The feature information of a first-class or second-class plane at least comprises: the position information of the plane in the camera coordinate system of the depth image in which it is located, the number of points contained in the plane, the distance from the plane to the camera, the center coordinates of the plane, and the like.
2) For a common frame image among the depth images, second-class planes are extracted based on a plurality of points in the common frame image, and their feature information is acquired. It should be noted that the number of points required for extracting the second-class planes is not limited; it may be the total number of points in the common frame, or a certain proportion of that total set by the user. The feature information of a second-class plane at least comprises: the position information of the second-class plane in the camera coordinate system of the depth image in which it is located.
The extracted planes comprise a first class plane and a second class plane, and when the planes contained in the depth image are extracted, the feature information of the extracted planes is further acquired. The characteristic information of the plane at least includes: position information of the plane in the camera coordinate system in the depth image in which it is located. Of course, in other embodiments, the feature information of the plane further includes other information: such as a label added by the user, etc., see below.
S140: and utilizing the extracted characteristic information of the plane to register at least part of the depth images of the adjacent frames of the multi-frame depth images.
Registering at least part of the adjacent frames among the multiple frames of depth images means that, based on two adjacent frames of depth images and the planes extracted within them, plane pairs whose extracted features differ little are found between the two adjacent frames by comparing plane features, and then a rotation matrix and a translation vector between the two frames are calculated using a plane registration method. It should be noted that in the technical solution provided by the present application, whenever the rotation matrix and translation vector between two adjacent frames are obtained they are stored, and finally the camera poses in a unified world coordinate system are obtained on the basis of the rotation matrices and translation vectors between adjacent frames, thereby obtaining the camera pose sequence under which the multiple frames of depth images were acquired.
In one embodiment, the unified world coordinate system in the present application takes the camera position of the first frame among the multiple frames of images as the coordinate origin, and takes the normal direction of a manually calibrated key plane in the first frame as a principal axis of the coordinate system; alternatively, the principal axis direction of the coordinate system may be defined by the user. Whether the principal axis direction is defined by the user or taken from the normal of the key plane, the wall surfaces or the floor should be parallel to a principal axis of the coordinate system. In another embodiment, the camera pose corresponding to a certain frame calibrated by the user is used as the coordinate origin, and positive and negative coordinates are established toward the preceding and following frames of that position; it should be noted that the positive and negative signs in this coordinate system only represent direction, not magnitude.
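As an illustration of how the stored per-pair rotation matrices and translation vectors could be chained into camera poses in such a unified world coordinate system, the following sketch composes them into 4 × 4 pose matrices. The function name, the (R, t) convention, and the use of NumPy are illustrative assumptions, not part of the patent.

```python
import numpy as np

def accumulate_poses(pairwise_transforms):
    """Chain per-adjacent-frame rigid transforms (R, t) into global camera poses.

    pairwise_transforms: list of (R, t) where R is 3x3 and t has length 3,
    assumed here to map points of frame i into the coordinates of frame i+1.
    Returns a list of 4x4 pose matrices (camera-to-world), one per frame,
    with frame 0 defining the world frame (the first camera at the origin).
    """
    poses = [np.eye(4)]                               # frame 0 = world frame
    for R, t in pairwise_transforms:
        T = np.eye(4)
        T[:3, :3] = np.asarray(R)
        T[:3, 3] = np.asarray(t)
        # Points of frame i+1 are brought back to frame i by T^-1,
        # then to the world frame by the previous pose.
        poses.append(poses[-1] @ np.linalg.inv(T))
    return poses
```

Under this convention the first frame's camera position is the coordinate origin, matching the first embodiment described above.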
S150: and fusing the registration result to construct a three-dimensional image.
Since the depth images acquired in the present application span multiple frames, the registration in step S140 produces registration results for multiple pairs of adjacent frames, and these results are further fused to obtain the three-dimensional image to be constructed. The essence of fusing the registration results is to "splice" the same extracted plane across the multiple frames of depth images according to the plane feature information, forming a complete plane. For example, if the 5th through 15th frames all contain the same wall, fusing the registration results means first obtaining, from the rotation matrices and translation vectors between adjacent frames produced by registration, the position of the plane corresponding to that wall in the world coordinate system for each of the 5th through 15th frames, and then splicing those planes into a relatively complete plane based on the obtained position information and the other feature information of the plane. It should further be noted that, in the current embodiment, only the first-class planes in the acquired images need to be extracted to obtain the three-dimensional map of the building or area.
Further, please refer to fig. 2, which is a partial schematic flow chart of another embodiment of the method for constructing a three-dimensional image according to the present application. The embodiment shown in fig. 2 is to perform further optimization processing on the three-dimensional image obtained in step S150 to obtain a more accurate three-dimensional image.
S201: and calling a preset algorithm to optimize the camera pose sequence and the extracted planar characteristic information when acquiring the multi-frame depth images.
In one embodiment, the preset algorithm at least includes bundle adjustment. When the camera shoots continuously and the adjacent frames of the multiple frames of depth images are registered, the extracted planes (including the first-class and second-class planes) and the positions, in the unified world coordinate system, of the points forming those planes can be obtained. However, since the same first-class or second-class plane is detected repeatedly across many images and the registration errors between adjacent frames accumulate, the camera pose sequence and the extracted plane feature information need to be optimized to obtain a more accurate three-dimensional image.
S202: and adjusting the constructed global three-dimensional image by using the optimized camera pose sequence to obtain an adjusted three-dimensional image.
In an embodiment, when the optimized camera pose sequence and plane feature information differ little from those before optimization, the three-dimensional image can be adjusted based on the differences before and after optimization to obtain the adjusted three-dimensional image.
In another embodiment, when the optimized camera pose sequence and plane feature information differ considerably from those before optimization, a new three-dimensional image can be obtained by re-fusing directly according to the optimized camera poses, the extracted plane feature information, and the position information of the points forming the extracted planes.
Based on the embodiment shown in fig. 2, further optimizing the camera pose sequence and the extracted planar feature information when acquiring a multi-frame depth image can better avoid acquiring a relatively inaccurate three-dimensional image due to accumulated registration errors or other accidental errors. The embodiment shown in fig. 1 displays the acquired depth image for the user to determine whether the frame image is a key frame image or a normal frame image based on whether a new ground, a wall surface, a roof, or the like appears in the image, further extracts a plane included in each frame of depth image based on the key frame and the normal frame, and extracts feature information of the extracted plane while extracting the plane. And finally, registering at least a part of adjacent frame depth images of the acquired multi-frame depth images based on the characteristic information of the plane, fusing the obtained registration results, and constructing to obtain a three-dimensional image. A man-machine interaction mode is defined in the construction of the three-dimensional image, and a user participates in the judgment of the key frame image, so that the accurate key frame and the plane information contained in the key frame can be well and quickly obtained, and the construction of the more accurate three-dimensional image can be realized.
Please refer to fig. 3, which is a partial schematic flow chart of a three-dimensional image construction method according to another embodiment of the present application. The embodiment shown in fig. 3 further illustrates the extraction of the first-class plane in a key frame depth image. The step of extracting the first-class plane in the key frame image based on the points selected by the user further comprises steps S301 to S302.
S301: in the key frame image, a first preset number of points in a first preset neighborhood of each point selected by the user are extracted to obtain a first point set.
The size of the first preset neighborhood and the first preset number can be adjusted and set based on actual needs. In one embodiment, the first preset neighborhood of each point selected by the user is measured by area: for example, a first preset number m of points are taken within a circle of radius r centered on the point selected by the user, so if the user selects k key points in the current key frame image, the first point set contains k × m points. In another embodiment, the first preset neighborhood of a selected point may instead be measured by count, i.e., a first preset number of points is taken within an r × r window (r here denotes a number of points) around the selected point; the first preset number may be all points within the r × r window. If the user selects k key points in a certain key frame, the first point set then contains k × r × r points.
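A minimal sketch of gathering the first point set under the count-based variant described above (an r × r pixel window around each selected point); the array layout, function name, and window size are illustrative assumptions.

```python
import numpy as np

def first_point_set(depth_points, selected_pixels, r=5):
    """Gather the first point set: for each user-selected pixel, take the points
    inside an r x r pixel window centred on it.

    depth_points    : (H, W, 3) array of camera-frame coordinates per pixel
    selected_pixels : list of (row, col) positions clicked by the user
    r = 5 is an illustrative window size, not a value taken from the patent.
    """
    H, W, _ = depth_points.shape
    half = r // 2
    gathered = []
    for row, col in selected_pixels:
        r0, r1 = max(0, row - half), min(H, row + half + 1)
        c0, c1 = max(0, col - half), min(W, col + half + 1)
        gathered.append(depth_points[r0:r1, c0:c1].reshape(-1, 3))
    return np.concatenate(gathered, axis=0)
```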
S302: and fitting to obtain a first class plane based on the position information of the points in the first point set in the camera coordinate system corresponding to the key frame image.
Since the images acquired in step S110 are depth images that contain at least the position information of the points forming the image, the first-class plane can be obtained by fitting directly based on the positions, in the camera coordinate system corresponding to the key frame image, of the points in the first point set acquired in step S301. The algorithm invoked for fitting at least includes the least squares method. The first-class plane is the plane fitted through steps S301 to S302 from the points selected by the user in the key frame image, and it is a plane, or part of a plane, forming the main structure of the building. It can be understood that when the first-class plane is obtained by fitting, its feature information is also acquired. As described above, the feature information of the first-class plane at least comprises: the position information of the first-class plane in the camera coordinate system of the depth image, the number of points contained in the first-class plane, the distance from the first-class plane to the camera, the center coordinates of the first-class plane, and the like.
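The following sketch shows one common least-squares way to fit such a plane and collect the feature information named above (number of points, distance to the camera, center coordinates). It is an assumed implementation using an SVD-based total least-squares fit, not the patent's own code.

```python
import numpy as np

def fit_plane_least_squares(points):
    """Fit a plane to an (N, 3) array of 3-D points in the camera frame.

    Returns (normal, d) such that the plane satisfies normal . p + d = 0,
    with the unit normal chosen by a (total) least-squares fit via SVD.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # Singular vector of the smallest singular value = direction of least variance
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    d = -normal @ centroid
    return normal, d

def plane_features(points, normal, d):
    """Collect the plane feature information mentioned in the text:
    number of points, distance of the plane to the camera (origin), and center."""
    return {
        "num_points": len(points),
        "distance_to_camera": abs(d),          # |n . 0 + d| with a unit normal
        "center": np.asarray(points).mean(axis=0),
        "normal": normal,
    }
```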
Further, please refer to fig. 4, which is a partial schematic flow chart of a three-dimensional image construction method according to another embodiment of the present application. In the embodiment shown in fig. 4, the first-class plane obtained by fitting in steps S401 and S402 is further optimized (steps S401 and S402 in the current embodiment are the same as steps S301 and S302 in fig. 3; both extract the first-class plane). Please refer to the explanation of steps S301 and S302 in fig. 3 for steps S401 and S402, which is not repeated here.
S401: in the key frame image, a first preset number of points in a first preset neighborhood of each point selected by a user are extracted to obtain a first point set.
S402: and fitting to obtain a first class plane based on the position information of the points in the first point set in the camera coordinate system corresponding to the key frame image.
S403: taking the first-class plane as a reference plane, and finding, in the key frame image, the points that belong to the reference plane or whose distance from the reference plane is less than or equal to a preset threshold, to obtain a second point set.
It is to be understood that the setting and definition of the second point set differ across embodiments. In one embodiment, the second point set is formed by all points in the key frame image that belong to the reference plane. In another embodiment, all points belonging to the reference plane and all points whose distance from the reference plane is less than or equal to a preset threshold together form the second point set.
Finding the points belonging to the reference plane can be understood as finding all points in the current key frame image whose position information satisfies the equation of the reference plane (i.e., the first-class plane obtained by fitting in the step above). For example, if the first-class plane fitted in step S402 is Ax + By + Cz + n = 0, then finding the points belonging to the reference plane means finding all points in the current key frame whose position information satisfies Ax + By + Cz + n = 0.
Searching for all points whose distance from the reference plane is less than or equal to a preset threshold means that such points may lie either above or below the reference plane, as long as their distance from it is no more than the preset threshold. The preset threshold is set based on empirical values and may be adjusted according to actual needs; when the accuracy requirement is strict, the preset threshold may be set somewhat smaller. It is set and adjusted according to actual needs and is not limited herein.
S404: re-fitting the first-class plane based on the position information, in the camera coordinate system corresponding to the key frame image, of the points in the second point set.
In step S403, the second point set formed by all points in the current frame image that belong to the fitted first-class plane is obtained, together with the positions of those points in the camera coordinate system corresponding to the key frame image; in step S404 the first-class plane is re-fitted based on the second point set, which can also be understood as optimizing and adjusting the first-class plane fitted in step S402.
Further, in an embodiment, when the new first-class plane obtained by fitting in step S404 does not meet the preset requirement, steps S403 and S404 are repeated to obtain a first-class plane that better meets the preset requirement. The preset requirement includes: the number of repetitions reaches a preset number, and/or the degree of difference between the first-class plane and the reference plane is lower than a preset degree value, the preset number being greater than or equal to zero.
In one embodiment, the predetermined requirement is that the number of repetitions reaches a predetermined number. The repeating number reaching the preset number refers to setting the number of times of repeating steps S403 and S404 to fit a more accurate first-class plane. Taking the first class plane obtained by fitting in step S402 or S404 as a reference plane, finding out a point belonging to the reference plane or having a distance difference from the reference plane smaller than or equal to a preset threshold value from the keyframe image, obtaining a second point set again, re-fitting a new first class plane based on the second point set, determining whether steps S403 and S404 are repeated for a preset number of times, and further determining whether the obtained first class plane is valid when the steps are repeated for the preset number of times (details will be described later); and when the repetition is less than the preset number, judging that the new first-class plane obtained by fitting does not meet the preset requirement, and continuously repeating the steps S403 and S404. It is understood that the preset number may be any number greater than or equal to zero, and if the preset number is zero, the steps S403 and S404 do not need to be repeatedly executed.
In another embodiment, the preset requirement is that the degree of difference between the first-class plane and the reference plane is lower than a preset degree value. After a new first-class plane is obtained by fitting in step S404, it is compared with its reference plane to determine whether the difference between the two is lower than the preset degree value. It can be understood that the degree of difference between planes can be compared based on the fitted plane equations: when the differences between the corresponding coefficients of the two planes are all less than or equal to a preset difference value, the degree of difference is judged to be lower than the preset degree value, the first-class plane is judged to meet the preset requirement, and its validity can then be judged (see below); when a difference between the newly fitted first-class plane and the reference plane is greater than the preset difference value, the degree of difference is judged to be greater than the preset degree value, the fitted first-class plane is judged not to meet the preset requirement, and steps S403 and S404 are repeated.
Further, in an embodiment, after performing step S404 and determining that the preset requirement is met, the method further includes: the validity of the first class plane obtained by fitting in step S404 is determined. The method specifically comprises the following steps: and judging whether the first class plane is valid according to the number of points of the intersection of the first point set and the second point set, reserving the first class plane when the first class plane is valid, and rejecting the first class plane when the first class plane is invalid. The validity of the first class plane refers to judging whether the extracted first class plane is accurate or not. When the first class plane obtained by fitting is judged to be valid, the first class plane obtained by fitting is reserved, and meanwhile, the characteristic information corresponding to the first class plane is reserved; otherwise, when the first-class plane is judged to be invalid, the plane and the corresponding plane feature information thereof are further removed.
In an embodiment, the step of determining whether the first-class plane is valid according to the number of points in the intersection of the first point set and the second point set includes: acquiring the number of points in the intersection of the first point set and the second point set, and judging whether the ratio of that number to the number of points in the first point set reaches a preset ratio; if so, the first-class plane is valid, otherwise it is invalid. For example, let the number of points in the first point set be k × r × r, let the number of points in the intersection of the second point set and the first point set be m, and set the preset ratio to 97%; when m / (k × r × r) ≥ 97%, the fitted first-class plane is judged to be valid and is retained, otherwise the fitted first-class plane is rejected.
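A sketch of the refit-and-validate loop of steps S403, S404 and the validity check could look as follows, reusing fit_plane_least_squares from the sketch above; the distance threshold, ratio threshold, and iteration count are illustrative values, not values taken from the patent.

```python
import numpy as np

def refine_plane(points_xyz, first_set_idx, normal, d,
                 dist_thresh=0.01, ratio_thresh=0.97, max_iters=3):
    """Iteratively refit the first-class plane and check its validity.

    points_xyz    : (N, 3) positions of all points in the key frame (camera frame)
    first_set_idx : indices of the first point set (neighbourhoods of user clicks)
    normal, d     : initial fitted plane  normal . p + d = 0
    The thresholds and iteration count are illustrative, not from the patent.
    """
    for _ in range(max_iters):
        # S403: second point set = points on or near the current reference plane
        dist = np.abs(points_xyz @ normal + d)
        second_set_idx = np.where(dist <= dist_thresh)[0]
        # S404: refit the plane from the second point set
        normal, d = fit_plane_least_squares(points_xyz[second_set_idx])
    # Validity: ratio of first-set points that ended up in the second set
    overlap = np.intersect1d(first_set_idx, second_set_idx).size
    valid = overlap / len(first_set_idx) >= ratio_thresh
    return (normal, d), valid
```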
Further, please refer to fig. 5, which is a partial schematic flow chart of a three-dimensional image constructing method according to another embodiment of the present application. In the embodiment shown in fig. 5, the step of acquiring the feature information of the first plane further includes step S501 and step S502. Specifically, the method comprises the following steps:
S501: displaying the first-class plane to the user through a display interface, and acquiring the label information of the first-class plane input by the user.
The extracted first-class plane is output to a display interface and highlighted for the user, and the user is prompted to add label information for the output first-class plane. The label information may be a preset label with preset content (the preset labels include load-bearing wall, wall, floor, step, roof, and the like) for the user to select and add; of course, the specific content of the label information may also be entered freely by the user, without limitation.
Further, since a label has a constraining effect, in some embodiments, after the user adds a label, the accuracy of the first-class plane is checked again based on the label information. For example, when the label added by the user is "load-bearing wall", the load-bearing wall should be perpendicular to the floor, so it is further verified whether the extracted first-class plane is perpendicular to its corresponding floor plane; if it is, the first-class plane is judged to be accurate and valid; otherwise, the first-class plane is judged to be inaccurate or erroneous, and the user is prompted to decide whether to reprocess the frame image.
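A small sketch of such a label-based consistency check, assuming the label and an angle tolerance are given; both the label strings and the tolerance are illustrative assumptions.

```python
import numpy as np

def check_label_constraint(plane_normal, floor_normal, label, angle_tol_deg=5.0):
    """Illustrative check of a label constraint: a plane labelled as a
    (load-bearing) wall should be roughly perpendicular to the floor plane.
    angle_tol_deg is an assumed tolerance, not a value from the patent."""
    n1 = plane_normal / np.linalg.norm(plane_normal)
    n2 = floor_normal / np.linalg.norm(floor_normal)
    if label in ("load-bearing wall", "wall"):
        # Perpendicular planes have (nearly) orthogonal normals
        angle = np.degrees(np.arccos(np.clip(abs(n1 @ n2), 0.0, 1.0)))
        return abs(angle - 90.0) <= angle_tol_deg
    return True  # no constraint defined for other labels in this sketch
```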
S502: taking the label information and the plane feature information of the first-class plane together as the feature information of the first-class plane.
The label information added by the user is stored together with the plane feature information acquired when the first-class plane was extracted, and the two together serve as the feature information of the first-class plane, to be called upon during plane registration or when fusing the global map or part of it.
Referring to fig. 6, fig. 6 is a partial schematic flow chart of another embodiment of the present application. Specifically, the embodiment shown in fig. 6 further describes the extraction of the second-class planes in a common frame image, or in a key frame image; that is, it elaborates on the steps of extracting second-class planes based on a plurality of points in the key frame image outside the first-class plane, or based on a plurality of points in the common frame image.
S601: extracting a second preset number of points in a second preset neighborhood of each point in the common area to obtain a third point set corresponding to each point in the common area.
The common area is the area of the key frame image outside the first-class plane, or the whole area of a common frame image. In one embodiment, the second preset neighborhood is measured by area: a second preset number of points is extracted within a preset region centered on each point in the common area, the preset region being, for example, a circle of radius r centered on the point, and these points form a third point set; the second preset number can be adjusted and set according to actual needs. In another embodiment, the second preset neighborhood may instead be measured by count: a second preset number of points within a preset count range around each point in the common area is extracted to form a third point set, the second preset number again being adjusted and set according to actual needs.
It can be understood that, since a second preset number of points is extracted in the second preset neighborhood of every point in the common area, step S601 extracts at least a plurality of third point sets, which are used to fit the required second-class planes.
S602: performing plane fitting on each third point set using the least squares method, and, when the fitting succeeds, obtaining the normal vector of the fitted plane for the corresponding point in the common area.
A plane is fitted by the least squares method to the third point set corresponding to each point extracted in step S601, and for each successfully fitted plane, the corresponding normal vector is acquired. It can be understood that points whose third point sets fail to fit a plane are not used; only points whose third point sets are successfully fitted to a plane are used to further obtain the second-class planes.
S603: merging the points in the common area for which normal vectors were obtained to form the second-class planes.
The points in the common area for which normal vectors were obtained are the points corresponding to the third point sets successfully fitted to planes in step S602. Since a third point set consists of the second preset number of points in the preset neighborhood of a point in the common area, the normal vector of the plane fitted to the third point set can be regarded as the normal vector of that point. Therefore, the points with obtained normal vectors are further merged to obtain the second-class planes.
Further, in step S603, merging the points in the common area for which normal vectors were obtained to form the second-class planes further includes: selecting, among the points in the common area with obtained normal vectors, points that satisfy co-normality and coplanarity, and merging them into the same point set to obtain a second-class plane.
Co-normality is a measure describing the agreement of the fitted-plane normals at two points. It is defined as α = cos⁻¹(n₁ · n₂), the angle between the normal vectors n₁ and n₂ of the two points. The angle α is compared with a set threshold α_thresh; if α is less than the threshold, the two points are judged to be co-normal.
Coplanarity describes how close the fitted planes at two points are in distance, and is defined as follows:

d = max(|r₁₂ · n₁|, |r₁₂ · n₂|)

where r₁₂ is the vector between the two points, and |r₁₂ · n₁| and |r₁₂ · n₂| are the distances, along the respective normal directions, from one point to the plane of the other (each can also be understood as the smallest distance from one point to the plane of the other point); thus d measures the separation between the planes at the two points. A threshold d_thresh is set, and if d is less than the threshold, the two points are judged to satisfy coplanarity.
Points that simultaneously satisfy co-normality and coplanarity are merged into the same set, finally yielding new point sets. Because any two points in the same set satisfy the co-normality and coplanarity measures, the points in a set obtained by merging on this basis all lie on the same plane, while points in different sets belong to different planes. After the points satisfying co-normality and coplanarity have been merged into the same point set to obtain a second-class plane, the feature information of the second-class plane is further acquired. The acquired feature information of the second-class plane at least comprises: the number of points contained in the plane, the distance from the plane to the camera, and the center coordinates of the plane, for use in fusion and registration.
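The second-class plane extraction of steps S601 to S603 could be sketched as follows, reusing fit_plane_least_squares from the earlier sketch. The neighbourhood query, thresholds, and the simple greedy merging are illustrative assumptions; a practical implementation would more likely use region growing or union-find.

```python
import numpy as np

def extract_second_class_planes(points_xyz, neighbors,
                                alpha_thresh_deg=10.0, d_thresh=0.02):
    """Sketch of second-class plane extraction (steps S601-S603).

    points_xyz : (N, 3) positions of the points in the common area
    neighbors  : list of index arrays, neighbors[i] = indices of the third
                 point set around point i (precomputed neighbourhood query)
    Thresholds are illustrative, not values from the patent.
    Returns a list of index lists, one per merged (second-class) plane.
    """
    # S601/S602: fit a local plane around each point and keep its normal
    normals = {}
    for i, idx in enumerate(neighbors):
        if len(idx) >= 3:
            n, _ = fit_plane_least_squares(points_xyz[idx])
            normals[i] = n

    def co_normal(i, j):
        a = np.degrees(np.arccos(np.clip(abs(normals[i] @ normals[j]), 0, 1)))
        return a < alpha_thresh_deg

    def coplanar(i, j):
        r = points_xyz[j] - points_xyz[i]
        return max(abs(r @ normals[i]), abs(r @ normals[j])) < d_thresh

    # S603: greedily merge points that satisfy both measures
    planes = []
    for i in normals:
        for plane in planes:
            j = plane[0]  # compare against a representative member
            if co_normal(i, j) and coplanar(i, j):
                plane.append(i)
                break
        else:
            planes.append([i])
    return planes
```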
Because the camera captures only the part of the scene within its field of view at any one time, the scene content covered by the camera is changed by rotating and translating the camera in order to construct a relatively complete three-dimensional image of the scene. A complete scene map is thus obtained, and the amount of each rotation and translation is finally solved based on the registration between adjacent frames.
Further, please refer to fig. 7, which is a partial schematic flow chart of a three-dimensional image construction method according to another embodiment of the present application. The embodiment shown in fig. 7 further describes the registration of adjacent frames in step S140 of the embodiment shown in fig. 1; that is, registering at least part of the adjacent frames among the multiple frames of depth images using the extracted plane feature information in step S140 further includes steps S701 to S704.
S701: matching the planes extracted from adjacent frames of depth images based on at least two preset feature pairing strategies to obtain multiple groups of plane pairs whose feature information matches. The at least two preset feature pairing strategies comprise at least two of the following:
and preferentially pairing planes with the difference between the characteristic information lower than the set difference value. The differences between how the characteristic information is found will be described in detail below.
First-class planes are preferentially paired. A first-class plane is obtained by collecting a preset number of points according to a set rule from the points selected by the user and then fitting, so its accuracy is relatively higher, and it also forms the main part of the three-dimensional image, so it should be paired first.
Planes containing a larger number of points are preferentially paired, because a plane with many points is less likely to be erroneous than a plane with few points.
Planes near the middle of the image are preferentially paired; such planes are more accurate, so they can be paired first. In one embodiment, since the camera pose changes little between two adjacent frames, the plane features extracted from the two adjacent frames also change little. Specifically, the planes in adjacent frame images are paired as follows, determining the matched planes in two consecutive frames A (the previous frame) and B (the next frame): the planes extracted in frame A comprise k key planes and m common planes in total; frame B is the next frame image, with its own set of extracted planes.
the difference of feature information (simply referred to as feature difference, and also referred to as registration error) between two planes in the a frame image and the B frame image is calculated by using the following formula:
wherein N isA,dA,PAIs the characteristic of a certain plane of the A frame; n is a radical ofB,dB,PBIs the characteristic of a certain plane of the B frame; dis (P)A,PB) To express PAAnd PBDistance of point coordinates.
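Since the exact formula is not reproduced above, the following sketch only illustrates one plausible way to combine the three named ingredients into a feature difference; the weighted-sum form and the weights are assumptions, not the patent's formula.

```python
import numpy as np

def plane_feature_difference(plane_a, plane_b, w_n=1.0, w_d=1.0, w_p=1.0):
    """Assumed form of the feature difference (registration error) between a
    plane of frame A and a plane of frame B. The text names the ingredients
    (normal N, camera distance d, center point P); the exact formula and the
    weights here are only a plausible sketch.
    Each plane is a dict with keys "normal", "distance_to_camera", "center".
    """
    n_term = np.linalg.norm(plane_a["normal"] - plane_b["normal"])
    d_term = abs(plane_a["distance_to_camera"] - plane_b["distance_to_camera"])
    p_term = np.linalg.norm(plane_a["center"] - plane_b["center"])  # Dis(P_A, P_B)
    return w_n * n_term + w_d * d_term + w_p * p_term
```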
S702: and calculating to obtain a rotation matrix and a translation vector between the depth images of the adjacent frames by using a third preset number of plane pairs in the plurality of groups of plane pairs.
In one embodiment, the rotation matrix between adjacent frames of depth images is solved as follows. Each extracted plane can be described by (n, m), where n is the normal vector of the plane and m is the center of gravity of the set of points belonging to the plane. A third preset number of plane pairs is selected; denote the third preset number by k (empirically, k is set to 3). Let the normal direction and center of gravity of the i-th plane selected from the first point cloud (the first point cloud refers to the previous of two consecutive frames) be n_i and m_i, and let those of the corresponding i-th plane in the second point cloud (the second point cloud refers to the next of the two consecutive frames) be n_i' and m_i'. Meanwhile, a weight w_i with a value in [0, 1] is set for each pair of plane features to indicate how important that pair is in the match. The rigid transformation (R, t) between the two adjacent images, i.e., the rotation matrix R and the translation vector t, can then be calculated.
Each rigid transformation can be divided into two steps: a rotation followed by a translation. A direction vector such as the plane normal is unaffected by translation. Therefore, the calculation of the rigid transformation is split into two steps: first R is solved from the normal vectors of the corresponding plane features, and then t is solved with the help of the centers of gravity.
Ideally, for any i-th pair of corresponding plane features, there should be:

n_i' - R·n_i = 0

Therefore, R is solved by minimizing the following:

E = Σ_i w_i ||n_i' - R·n_i||^2

The solution is performed with the unit quaternion method. If the unit quaternion corresponding to R is q, then:

R·p = q ∘ p ∘ q*

wherein p is a three-dimensional vector (treated as a pure quaternion), q* is the conjugate of q, and ∘ represents the multiplication of two quaternions. Since multiplying by a unit quaternion does not change the norm, E can be rewritten as:

E = Σ_i w_i ||n_i' ∘ q - q ∘ n_i||^2 = Σ_i w_i ||A_i·q||^2

where A_i is a 4 × 4 antisymmetric matrix (quaternion multiplication by a fixed pure quaternion is a linear map, so the difference of the two products can be written as a matrix acting on q):

A_i =
[ 0            -(n_i' - n_i)^T ]
[ n_i' - n_i   [n_i' + n_i]_x  ]

and for a three-dimensional vector v = (x, y, z)^T, [v]_x denotes the skew-symmetric cross-product matrix

[v]_x =
[  0   -z    y ]
[  z    0   -x ]
[ -y    x    0 ]

Here, let

A = Σ_i w_i A_i^T A_i

Then A is a symmetric matrix, and the above expression finally becomes:

E = q^T A q
and q is a unit quaternion, so there is a constraint of qTq is 1. Using the lagrange multiplier method, q that minimizes E should satisfy:
namely, it is
Thus, there are:
Aq=λq
it can be seen that q is a feature vector of a, and the formula is substituted back to E ═ qTAq of
E=λ
Therefore, the minimum value that E can take is actually the minimum eigenvalue of a, and in this case, q should take the eigenvector corresponding to this minimum eigenvalue.
By calculating q, the corresponding rotation transformation matrix R can be conveniently obtained.
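To make the unit-quaternion solution concrete, a minimal numpy sketch is given below. The function names, the input format (lists of corresponding plane normals and weights), and the example data are assumptions for illustration; the sketch builds A = Σ_i w_i A_i^T A_i, takes the eigenvector of the smallest eigenvalue as q, and converts q back to R.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix v_x of a 3D vector v = (x, y, z)^T."""
    x, y, z = v
    return np.array([[0, -z,  y],
                     [z,  0, -x],
                     [-y, x,  0]], dtype=float)

def solve_rotation(normal_pairs, weights):
    """Solve R from pairs (n_i, n_i') of corresponding plane normals.

    Builds A = sum_i w_i * A_i^T A_i and takes the eigenvector of A with the
    smallest eigenvalue as the unit quaternion q = (q0, qx, qy, qz).
    """
    A = np.zeros((4, 4))
    for (n, n_prime), w in zip(normal_pairs, weights):
        d = n - n_prime                       # n_i - n_i'
        s = n + n_prime                       # n_i + n_i'
        Ai = np.zeros((4, 4))
        Ai[0, 1:] = -d
        Ai[1:, 0] = d
        Ai[1:, 1:] = -skew(s)                 # 4x4 antisymmetric A_i
        A += w * (Ai.T @ Ai)
    eigvals, eigvecs = np.linalg.eigh(A)      # A is symmetric; eigh sorts ascending
    q0, qx, qy, qz = eigvecs[:, 0]            # eigenvector of the smallest eigenvalue
    # Convert the unit quaternion back to a 3x3 rotation matrix.
    return np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - q0*qz),     2*(qx*qz + q0*qy)],
        [2*(qx*qy + q0*qz),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - q0*qx)],
        [2*(qx*qz - q0*qy),     2*(qy*qz + q0*qx),     1 - 2*(qx*qx + qy*qy)]])

# Example: three roughly corresponding, mutually non-parallel normals (k = 3).
n_a = [np.array([0., 0., 1.]), np.array([1., 0., 0.]), np.array([0., 1., 0.])]
n_b = [np.array([0., 0.0998, 0.9950]),      # normals observed in the next frame
       np.array([1., 0., 0.]),
       np.array([0., 0.9950, -0.0998])]
R = solve_rotation(list(zip(n_a, n_b)), weights=[1.0, 1.0, 1.0])
```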
Further, the translation vector t needs to be solved. The solution for t is simple: denoting the rotation matrix just calculated as R, t is found by minimizing

E_t = Σ_i w_i || m_i' - (R m_i + t) ||^2

Differentiating directly with respect to t and setting the derivative to 0 gives:

Σ_i w_i ( m_i' - R m_i - t ) = 0

Solving this linear system yields the translation vector t, and the rigid transformation (R, t) is thus obtained.
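Correspondingly, a short sketch of the translation solve under the same assumptions; the closed form follows directly from setting the derivative above to zero.

```python
import numpy as np

def solve_translation(centroid_pairs, weights, R):
    """Solve t from pairs (m_i, m_i') of plane centers of gravity, given R.

    Setting the derivative of sum_i w_i * ||m_i' - (R m_i + t)||^2 with
    respect to t to zero gives t = (sum_i w_i (m_i' - R m_i)) / sum_i w_i.
    """
    num = np.zeros(3)
    den = 0.0
    for (m, m_prime), w in zip(centroid_pairs, weights):
        num += w * (m_prime - R @ m)
        den += w
    return num / den
```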
S703: and registering the depth images of the adjacent frames based on the rotation matrix and the translation vector, and calculating a registration error of each group of plane pairs by referring to a registration result.
After the rigid transformation (R, t) between two adjacent frames is obtained, (R, t) is applied to the point cloud of the previous frame image, or to the point cloud position information of the planes extracted from the previous frame. The transformed point cloud coordinates of the previous frame are then registered against the point cloud of the current frame, or the transformed point cloud of each plane extracted from the previous frame is registered against the position information of the point cloud of the matching plane extracted from the current frame. The information of the points that are successfully registered is retained to form the three-dimensional image.
After the registration of the adjacent frame depth images based on the rotation matrix and the translation vector is completed, the registration result is used to calculate the registration error of each group of plane pairs. The registration error is the difference of plane feature information described above.
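A hedged sketch of this step is given below: the plane features of the previous frame are transformed with (R, t) and compared against the matched plane of the current frame. The dict-based plane representation and the unit-weight error combination mirror the earlier feature-difference sketch and are likewise assumptions.

```python
import numpy as np

def transform_plane(plane, R, t):
    """Apply the rigid transformation (R, t) to a plane's features.

    The normal rotates with R; the representative point (centroid) is rotated
    and translated; the plane distance is recomputed as n . p for a unit normal.
    """
    n = R @ plane['n']
    p = R @ plane['p'] + t
    return {'n': n, 'p': p, 'd': float(n @ p)}

def registration_error(prev_plane, curr_plane, R, t):
    """Registration error of one matched plane pair (previous frame plane,
    current frame plane) after transforming the previous-frame plane with (R, t)."""
    a = transform_plane(prev_plane, R, t)
    return (np.linalg.norm(a['n'] - curr_plane['n'])
            + abs(a['d'] - curr_plane['d'])
            + np.linalg.norm(a['p'] - curr_plane['p']))
```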
S704: and selecting the plane pair with the minimum registration error as a matched plane pair of the depth images of the adjacent frames in a plurality of groups of plane pairs comprising the same plane.
As described in step S701, the planes extracted from the depth images of adjacent frames are paired based on at least two preset feature pairing strategies, so the same plane may be successfully paired with different planes under different strategies. Therefore, among the plural groups of plane pairs that are obtained under different pairing strategies and include the same plane, a further comparison and selection is needed: the plane pair with the smallest registration error across the strategies is taken as the matched plane pair of the two adjacent frame depth images, which further ensures the accuracy of the constructed three-dimensional image.
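As a sketch, this selection can be expressed as keeping, for each plane, the candidate pair with the smallest registration error across the pairing strategies; the identifiers and data layout below are assumptions for illustration.

```python
def select_best_pairs(candidate_pairs, errors):
    """For each previous-frame plane that was paired differently by different
    pairing strategies, keep only the candidate pair with the smallest
    registration error.

    candidate_pairs: list of (prev_plane_id, curr_plane_id) tuples
    errors:          matching list of registration errors
    """
    best = {}
    for (prev_id, curr_id), err in zip(candidate_pairs, errors):
        if prev_id not in best or err < best[prev_id][1]:
            best[prev_id] = ((prev_id, curr_id), err)
    return [pair for pair, _ in best.values()]
```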
Further, in an embodiment, after step S704, the method of this embodiment further includes: if one plane in a matched plane pair is a first class plane and the other plane is a second class plane, the other plane is updated to a first class plane. For example, if a first class plane in the previous frame and a second class plane in the next frame form a matched pair, the second class plane in the next frame is updated to a first class plane.
It should be noted that, for the plane registration between adjacent frame images, it may be required that at least a certain number of mutually non-parallel planes exist in both frames to serve as features, so that the registration of the adjacent frames can be completed. If the number of planes extracted from a frame image in the preceding steps is insufficient, the registration may instead be completed based on the extraction and matching of feature points of the adjacent frame images; the details of feature point extraction and matching are not described here.
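The patent does not prescribe a particular feature point method. As an assumed example, a common choice such as ORB features with brute-force Hamming matching (OpenCV) could provide the fallback correspondences, given 8-bit grayscale renderings of the adjacent frames:

```python
import cv2

def match_feature_points(img_prev, img_curr, max_matches=100):
    """Fallback registration cue when too few planes are extracted: detect and
    match 2D feature points between adjacent frames.

    ORB + brute-force Hamming matching is an assumed, common choice; the
    patent does not specify a feature point method.
    """
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(img_prev, None)
    kp2, des2 = orb.detectAndCompute(img_curr, None)
    if des1 is None or des2 is None:
        return []
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    # Return pixel coordinates of the best matches in both frames.
    return [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in matches[:max_matches]]
```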
Please refer to fig. 8, which is a schematic structural diagram of an embodiment of an apparatus 800 for three-dimensional image construction according to the present application. The apparatus 800 for three-dimensional image construction includes: a processor 801, and a memory 802 coupled to the processor 801. The memory 802 stores program data, and when running the program data the processor 801 executes the method of three-dimensional image construction, so as to implement the embodiments corresponding to fig. 1 to 7.
Further, referring to fig. 8 and fig. 9, the apparatus for three-dimensional image construction further includes: a human interface component 803 or a human interaction component 901.
The human interface component 803 is used for outputting the depth image to an external display device and for receiving information, generated by an external input device, about the point in the depth image selected by the user.
The human-computer interaction component 901 is used for displaying the depth image and for generating the information about the points in the depth image selected by the user. In one embodiment, the human-computer interaction component 901 shown in fig. 9 is a touch display screen.
Further, please refer to fig. 10, which is a schematic structural diagram of another embodiment of the apparatus 1000 for three-dimensional image construction according to the present application, and it can be seen that in the present embodiment, the apparatus 1000 for three-dimensional image construction further includes: a communication interface 1001. The communication interface 1001 is used for coupling with a camera 1010 outside the device to obtain a depth image captured by the camera 1010.
Further, please refer to fig. 11, which is a schematic structural diagram of an embodiment of a three-dimensional image constructing apparatus according to the present application. In the present embodiment, the apparatus 1100 for three-dimensional image construction provided by the present application includes, in addition to the circuit shown in fig. 8, a shooting component 1101 for shooting a scene to obtain depth image data of the scene.
The shooting device and the shooting component may specifically be the camera described in the method embodiments.
Referring to fig. 12, the present application also provides a device 1200 with a storage function. The device stores program data that, when executed, implements the method of three-dimensional image construction described above and in the various embodiments. Specifically, the device 1200 with a storage function may be a memory, a personal computer, a server, a network device, or a USB disk.
The above description is only an embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (14)

1. A method of three-dimensional image construction, the method comprising:
acquiring multiple frames of depth images, and displaying each frame of depth image;
detecting whether a user selects a point in the displayed depth image, determining the detected selected depth image as a key frame image, and determining the rest depth images as common frame images;
extracting a plane in each frame of depth image, and acquiring feature information of the extracted plane; wherein,
for the key frame image, extracting a first class plane based on the points selected by the user and acquiring the characteristic information of the first class plane, and extracting a second class plane based on a plurality of points in the key frame image except the first class plane and acquiring the characteristic information of the second class plane;
for the common frame image, extracting a second class plane based on a plurality of points in the common frame image, and acquiring the characteristic information of the second class plane;
registering at least part of adjacent frame depth images of the multiple frames of depth images by using the extracted characteristic information of the plane;
and fusing the registration result to construct a three-dimensional image.
2. The method of three-dimensional image construction according to claim 1, wherein the step of extracting a first class plane in the key frame image based on the points selected by the user further comprises:
extracting a first preset number of points in a first preset neighborhood of each point selected by the user from the key frame image to obtain a first point set;
and fitting to obtain the first class plane based on the position information of the points in the first point set in the camera coordinate system corresponding to the key frame image.
3. The method of claim 2, wherein after the step of fitting to obtain the first class plane based on the position information of the points in the first point set in the camera coordinate system corresponding to the key frame image, the method further comprises:
taking the first class plane as a reference plane, and finding out points which belong to the reference plane or have a distance difference with the reference plane smaller than or equal to a preset threshold value in the key frame image to obtain a second point set;
based on the position information of the points in the second point set in the camera coordinate system corresponding to the key frame image, re-fitting to obtain the first class plane;
if the preset requirement is not currently met, repeating the above steps until the preset requirement is met, wherein the preset requirement is that the number of repetitions reaches a preset number and/or that the degree of difference between the first class plane and the reference plane is lower than a preset degree value, and the preset number is greater than or equal to zero.
4. The method of three-dimensional image construction according to claim 3, wherein after meeting the preset requirements, the method further comprises:
and judging whether the first class plane is effective or not according to the number of points of the intersection of the first point set and the second point set, reserving the first class plane when the first class plane is effective, and rejecting the first class plane when the first class plane is ineffective.
5. The method of claim 4, wherein the step of determining whether the first class plane is valid according to the number of points at the intersection of the first point set and the second point set comprises:
and acquiring the number of points of the intersection of the first point set and the second point set, and judging whether the ratio of the acquired number of points to the number of points of the first point set reaches a preset ratio, wherein if the ratio of the acquired number of points to the number of points of the first point set reaches the preset ratio, the first class plane is effective, and otherwise, the first class plane is invalid.
6. The method of three-dimensional image construction according to claim 1, wherein the step of obtaining the feature information of the first class plane includes:
displaying the first class plane to a user through a display interface, and acquiring label information of the first class plane input by the user;
and taking the label information and the plane feature information of the first class plane as the feature information of the first class plane.
7. The method of claim 1, wherein the step of extracting a second class plane based on a plurality of points in the key frame image other than the first class plane, or the step of extracting a second class plane based on a plurality of points in the common frame image, further comprises:
extracting a second preset number of points in a second preset neighborhood of each point in a common area to obtain a third point set corresponding to each point in the common area, wherein the common area is the area of the key frame image except the first class plane, or the whole area of the common frame image;
performing plane fitting on each third point set by adopting a least squares method, and obtaining a normal vector of the fitted plane at the corresponding point in the common area when the plane fitting is successful;
and merging the points in the common area for which normal vectors are obtained, to form the second class plane.
8. The method of three-dimensional image construction according to claim 7, wherein the step of merging the points in the common area for which normal vectors are obtained to form the second class plane further comprises:
and selecting, from the points in the common area for which normal vectors are obtained, the points that satisfy normal-direction consistency and coplanarity, and merging them into one point set to obtain the second class plane.
9. The method of three-dimensional image construction according to claim 1, wherein the step of registering at least some of the depth images of the neighboring frames of the plurality of frames of depth images using the feature information of the extracted plane further comprises:
matching the extracted planes in the depth images of the adjacent frames based on at least two preset feature matching strategies to obtain a plurality of groups of plane pairs whose feature information is matched;
calculating to obtain a rotation matrix and a translation vector between the adjacent frame depth images by using a third preset number of plane pairs in the plurality of groups of plane pairs;
registering the depth images of the adjacent frames based on the rotation matrix and the translation vector, and calculating a registration error of each group of the plane pairs by referring to the registration result;
selecting, among a plurality of sets of the plane pairs including the same plane, a plane pair having a smallest registration error as a matching plane pair of the adjacent frame depth images.
10. The method of three-dimensional image construction according to claim 9, wherein after the step of selecting a plane pair with a minimum registration error as a matching plane pair of the adjacent frame depth images among a plurality of sets of the plane pairs including the same plane, the method further comprises:
if one plane in the matched plane pair is a first class plane and the other plane is a second class plane, updating the other plane to be a first class plane;
and/or the at least two preset feature pairing strategies comprise at least two of:
preferentially pairing planes whose feature information difference is lower than a set difference value;
preferentially pairing first class planes;
preferentially pairing planes containing a larger number of points;
preferentially pairing planes near the middle of the image.
11. The method of three-dimensional image construction according to claim 1, wherein after the step of fusing the results of the registration to construct a three-dimensional image, the method further comprises:
calling a preset algorithm to optimize the camera pose sequence at which the multiple frames of depth images were collected and the extracted plane feature information;
and adjusting the constructed three-dimensional image by utilizing the optimized camera pose sequence to obtain the adjusted three-dimensional image.
12. An apparatus for three-dimensional image construction, the apparatus comprising: a processor, and a memory coupled to the processor, the apparatus further comprising a human interface component or a human interaction component; the memory stores program data which, when executed by the processor, is operable to perform the method of any of claims 1 to 11; the man-machine interface component is used for outputting the depth image to an external display device and inputting information of a point in the depth image selected by a user and generated by an external input device; the human-computer interaction component is used for displaying the depth image and generating information of points in the depth image selected by a user.
13. The apparatus for three-dimensional image construction according to claim 12, wherein the human-computer interaction component is a touch display screen;
the device further comprises:
the communication interface is used for being coupled with a shooting device so as to acquire depth image data shot by the shooting device; or a shooting component used for shooting a scene to obtain depth image data.
14. An apparatus having a storage function, wherein the apparatus stores program data which, when executed, implements the method of any one of claims 1 to 11.
CN201810553584.3A 2018-05-31 2018-05-31 Three-dimensional image construction method and device with storage function Active CN108898661B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810553584.3A CN108898661B (en) 2018-05-31 2018-05-31 Three-dimensional image construction method and device with storage function

Publications (2)

Publication Number Publication Date
CN108898661A true CN108898661A (en) 2018-11-27
CN108898661B CN108898661B (en) 2023-04-18

Family

ID=64343871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810553584.3A Active CN108898661B (en) 2018-05-31 2018-05-31 Three-dimensional image construction method and device with storage function

Country Status (1)

Country Link
CN (1) CN108898661B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2618875A1 (en) * 2007-01-26 2008-07-26 David A. Spooner Methodology for 3d scene reconstruction from 2d image sequences
US20150098645A1 (en) * 2013-10-04 2015-04-09 Canon Kabushiki Kaisha Method, apparatus and system for selecting a frame
CN103927742A (en) * 2014-03-21 2014-07-16 北京师范大学 Global automatic registering and modeling method based on depth images
CN106157367A (en) * 2015-03-23 2016-11-23 联想(北京)有限公司 Method for reconstructing three-dimensional scene and equipment
CN105551086A (en) * 2015-12-04 2016-05-04 华中科技大学 Customized foot modeling and shoe pad customization method on the basis of computer vision
CN105913489A (en) * 2016-04-19 2016-08-31 东北大学 Indoor three-dimensional scene reconstruction method employing plane characteristics
CN107016704A (en) * 2017-03-09 2017-08-04 杭州电子科技大学 A kind of virtual reality implementation method based on augmented reality
CN107292949A (en) * 2017-05-25 2017-10-24 深圳先进技术研究院 Three-dimensional rebuilding method, device and the terminal device of scene
CN113129249A (en) * 2019-12-26 2021-07-16 舜宇光学(浙江)研究院有限公司 Depth video-based space plane detection method and system and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HANI JAVAN HEMMAT等: "Real-time planar segmentation of depth images: from three-dimensional edges to segmented planes" *
杨哲: "基于单目摄像头的实时三维重建" *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767767A (en) * 2019-05-24 2020-10-13 北京京东尚科信息技术有限公司 Method and device for detecting indoor object, electronic equipment and storage medium
CN111242908A (en) * 2020-01-07 2020-06-05 青岛小鸟看看科技有限公司 Plane detection method and device and plane tracking method and device
WO2021139549A1 (en) * 2020-01-07 2021-07-15 青岛小鸟看看科技有限公司 Plane detection method and apparatus and plane tracking method and apparatus
CN111583317A (en) * 2020-04-29 2020-08-25 深圳市优必选科技股份有限公司 Image alignment method and device and terminal equipment
CN111583317B (en) * 2020-04-29 2024-02-09 深圳市优必选科技股份有限公司 Image alignment method and device and terminal equipment
CN111597288A (en) * 2020-05-20 2020-08-28 福建省伟志地理信息科学研究院 Three-dimensional geographic information visualization method and system based on video fusion
CN111597288B (en) * 2020-05-20 2022-06-21 福建省伟志地理信息科学研究院 Three-dimensional geographic information visualization method and system based on video fusion
CN112070782A (en) * 2020-08-31 2020-12-11 腾讯科技(深圳)有限公司 Method and device for identifying scene contour, computer readable medium and electronic equipment
CN112070782B (en) * 2020-08-31 2024-01-09 腾讯科技(深圳)有限公司 Method, device, computer readable medium and electronic equipment for identifying scene contour
CN114281285A (en) * 2021-07-14 2022-04-05 海信视像科技股份有限公司 Display device and display method for stably presenting depth data
CN114281285B (en) * 2021-07-14 2024-05-28 海信视像科技股份有限公司 Display device and display method for stably presenting depth data
CN115661371A (en) * 2022-12-14 2023-01-31 深圳思谋信息科技有限公司 Three-dimensional object modeling method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN108898661B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN108898661B (en) Three-dimensional image construction method and device with storage function
EP3698275B1 (en) Data processing method, apparatus, system and storage media
CN112767538B (en) Three-dimensional reconstruction and related interaction and measurement methods, related devices and equipment
CN108615244B (en) A kind of image depth estimation method and system based on CNN and depth filter
CN109523595A (en) A kind of architectural engineering straight line corner angle spacing vision measuring method
EP2715667B1 (en) Planar mapping and tracking for mobile devices
CN104463899B (en) A kind of destination object detection, monitoring method and its device
CN106462943A (en) Aligning panoramic imagery and aerial imagery
RU2572637C2 (en) Parallel or serial reconstructions in online and offline modes for 3d measurements of rooms
EP3695381B1 (en) Floor detection in virtual and augmented reality devices using stereo images
CN110243390B (en) Pose determination method and device and odometer
WO2022237026A1 (en) Plane information detection method and system
CN112927363A (en) Voxel map construction method and device, computer readable medium and electronic equipment
CN108399631B (en) Scale invariance oblique image multi-view dense matching method
WO2021244161A1 (en) Model generation method and apparatus based on multi-view panoramic image
CN109977827B (en) Multi-person three-dimensional attitude estimation method using multi-view matching method
CN112733641A (en) Object size measuring method, device, equipment and storage medium
CN107330930B (en) Three-dimensional image depth information extraction method
US8837813B2 (en) Mobile three dimensional imaging system
JP7341736B2 (en) Information processing device, information processing method and program
Coorg Pose imagery and automated three-dimensional modeling of urban environments
CN113228117B (en) Authoring apparatus, authoring method, and recording medium having an authoring program recorded thereon
CN116612253A (en) Point cloud fusion method, device, computer equipment and storage medium
CN114119891A (en) Three-dimensional reconstruction method and reconstruction system for robot monocular semi-dense map
CN114882194A (en) Method and device for processing room point cloud data, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant