CN113034675A - Scene model construction method, intelligent terminal and computer readable storage medium - Google Patents
Scene model construction method, intelligent terminal and computer readable storage medium
- Publication number
- CN113034675A (application number CN202110325406.7A)
- Authority
- CN
- China
- Prior art keywords
- voxel
- scene
- scene model
- nth
- neighborhood
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
Abstract
The invention discloses a scene model construction method, an intelligent terminal and a computer-readable storage medium. The method comprises the following steps: when the Nth original depth image of the same scene is obtained, performing scene fusion on the Nth original scene model according to the Nth original depth image to obtain an Nth intermediate scene model; extracting neighborhood features of each voxel in the Nth intermediate scene model according to a preset extraction rule; calculating a voxel prediction value for each voxel according to its neighborhood features; and, for each voxel whose value in the Nth original scene model is not an observed voxel value, updating the Nth original scene model according to the voxel prediction value corresponding to the voxel to obtain the (N+1)th original scene model. By predicting voxel values from the neighborhood features of voxels, the method reduces holes in the subsequent model and improves the completeness of the model.
Description
Technical Field
The present invention relates to the field of scene model construction, and in particular, to a scene model construction method, an intelligent terminal, and a computer-readable storage medium.
Background
RGB is a color model, and an RGB image is an image with three color channels: red, green, and blue. In 3D computer graphics, a depth map (Depth Map) is an image or image channel containing information about the distance between the surfaces of scene objects and a viewpoint; each pixel value in the depth map represents the distance from the camera to an object. The RGB image and the depth map are usually registered to each other, so a one-to-one correspondence exists between their pixels. Because the depth map encodes the distance between each object and the camera, a scene can be reconstructed by combining the RGB image with the depth map.
Current scene reconstruction methods mainly move a depth camera around a scene to acquire depth maps from different viewpoints. Each depth map is then converted into a three-dimensional point cloud using the camera intrinsic parameters and the pixel depth values, and a normal vector is computed for each point. An Iterative Closest Point (ICP) algorithm, for example, iteratively minimizes the point-to-plane distance to compute the pose transformation between two frames, thereby solving for the camera pose of the current frame. The point cloud is fused into a Truncated Signed Distance Function (TSDF) model according to the current camera pose; the TSDF model surface, point cloud, and normal vectors under the current camera view angle are obtained by a reprojection algorithm, and ICP iteration against the next frame of data then solves for the next pose.
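The point-to-plane error that the ICP step minimizes can be written as follows; this is the standard formulation of point-to-plane ICP, stated here for clarity rather than quoted from the patent:

```latex
E(\mathbf{R}, \mathbf{t}) = \sum_{i} \big( (\mathbf{R}\,\mathbf{p}_i + \mathbf{t} - \mathbf{q}_i) \cdot \mathbf{n}_i \big)^2
```

where $\mathbf{p}_i$ are points of the current frame, $\mathbf{q}_i$ are their closest points on the target surface, $\mathbf{n}_i$ are the normals at those points, and $(\mathbf{R}, \mathbf{t})$ is the pose transformation being solved for.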
However, current depth cameras mainly measure object distances using Time of Flight (TOF), binocular stereo, or structured light techniques, so occluded areas often cannot be reconstructed because no depth data can be collected for them. Objects in a shooting scene frequently occlude one another, so in practice it is very difficult to guarantee that a depth camera scans every point on the indoor objects, and models reconstructed by current three-dimensional reconstruction methods therefore contain many holes. Taking an augmented reality system as an example, three-dimensional reconstruction for augmented reality usually scans a local region of the scene quickly, obtains a local three-dimensional model, and then uses that model for virtual-real rendering. Because the resulting model is usually incomplete, regions that were never reconstructed are inevitably observed as the camera moves, and at that point the augmented reality application cannot render a correct virtual-real occlusion relationship.
Disclosure of Invention
The invention mainly aims to provide a scene model construction method, an intelligent terminal and a computer-readable storage medium, so as to solve the problem that models produced by existing three-dimensional reconstruction techniques are prone to holes and therefore cannot accurately reflect the relationships between objects.
In order to achieve the above object, the present invention provides a scene model construction method, including the steps of:
when the Nth original depth image of the same scene is obtained, performing scene fusion on the Nth original scene model according to the Nth original depth image to obtain an Nth intermediate scene model, wherein N is a natural number less than or equal to the total number of original depth images, and, when N is equal to 1, the first original scene model is a preset blank scene model;
extracting neighborhood characteristics of each voxel in the Nth intermediate scene model according to a preset extraction rule;
calculating a voxel prediction value corresponding to each voxel according to the neighborhood characteristics;
and, for each voxel, when the voxel value corresponding to the voxel in the Nth original scene model is not an observed voxel value, updating the Nth original scene model according to the voxel prediction value corresponding to the voxel to obtain the (N+1)th original scene model.
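The four claimed steps can be sketched as a single per-frame update loop. The function names and the dictionary-based voxel representation below are illustrative assumptions, not part of the patent; the fusion, feature-extraction, and prediction steps are passed in as callables:

```python
def update_scene_model(model, depth_image, fuse, extract_features, predict):
    """One iteration of the claimed method. `model` maps voxel coordinates to
    voxel values (None = not yet observed); `fuse`, `extract_features`, and
    `predict` stand in for steps 1-3 of the claim."""
    # Step 1: fuse the Nth depth image into the Nth original scene model.
    intermediate = fuse(model, depth_image)
    # Steps 2-4: for voxels without an observed value, predict a value
    # from the voxel's neighborhood features and write it back.
    updated = dict(intermediate)
    for voxel, value in intermediate.items():
        if value is None:  # non-observed voxel value
            features = extract_features(intermediate, voxel)
            updated[voxel] = predict(features)
    return updated
```

The returned dictionary plays the role of the (N+1)th original scene model, ready for fusion with the next depth image.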
Optionally, in the scene model construction method, the step of, when the Nth original depth image of the same scene is obtained, performing scene fusion on the Nth original scene model according to the Nth original depth image to obtain the Nth intermediate scene model specifically includes:
when the Nth original depth image of the same scene is obtained, filtering the Nth original depth image to generate an Nth noise-reduced depth image;
calculating point clouds corresponding to all pixel points in the Nth noise reduction depth image according to camera internal parameters corresponding to the Nth original depth image to obtain a plurality of point clouds;
for each pixel in the Nth noise-reduced depth image, determining the normal vector of the point cloud point corresponding to the pixel according to the neighborhood point cloud corresponding to the pixel, wherein the neighborhood point cloud corresponds to the neighborhood pixels of the pixel;
and carrying out scene fusion on each voxel in the Nth scene model according to the normal vector of each point cloud to obtain an Nth intermediate scene model.
Optionally, in the method for constructing a scene model, for each voxel, a neighborhood characteristic corresponding to the voxel includes a scene characteristic of a neighborhood voxel of the voxel, where the neighborhood voxel is a voxel corresponding to a neighborhood pixel of a pixel point corresponding to the voxel.
Optionally, the scene model construction method, wherein the extracting, according to a preset extraction rule, neighborhood features of each voxel in the nth intermediate scene model specifically includes:
extracting scene features of each voxel in the Nth intermediate scene model according to a preset extraction rule;
for each voxel, screening the neighborhood voxels according to a preset screening rule and the coordinates of the neighborhood voxels corresponding to the voxel to obtain a target voxel;
and taking the scene characteristic corresponding to the target voxel as the neighborhood characteristic corresponding to the voxel.
Optionally, the method for constructing a scene model, where the calculating a voxel prediction value corresponding to each voxel according to the neighborhood characteristics specifically includes:
for each voxel, inputting the neighborhood characteristics corresponding to the voxel into a trained structured random forest model, and predicting the voxel value through the structured random forest model according to the input neighborhood characteristics to obtain a plurality of initial predicted values corresponding to the voxel;
calculating an error value corresponding to each initial predicted value according to a preset error loss function;
determining several intermediate predicted values from the initial predicted values according to the error values;
and calculating a voxel prediction value corresponding to the voxel according to the intermediate prediction value.
Optionally, the scene model construction method, wherein the training process of the structured random forest model includes:
acquiring training depth images of different scenes;
according to a preset sampling rule, screening pixel points in the training depth image to obtain training pixel points;
for each training pixel point, taking the voxel value corresponding to the pixel point as label data and the neighborhood voxel values corresponding to the pixel point as training data;
inputting the training data into a preset structured random forest model, and calculating a corresponding training predicted value according to the training data through the structured random forest model;
and according to the training predicted value and the label data, carrying out parameter adjustment on the structured random forest model until the structured random forest model is converged.
Optionally, in the method for constructing a scene model, the structured random forest model includes a plurality of decision trees; the method for calculating the training prediction value comprises the following steps of inputting the training data into a preset structured random forest model, and calculating the corresponding training prediction value according to the training data through the structured random forest model, wherein the method specifically comprises the following steps:
generating a plurality of training subsets according to the label data and the training data, wherein the number of the training subsets is the same as that of the decision trees;
for each decision tree, inputting the training data in its training subset into the decision tree, each node of the decision tree performing dimensionality reduction and clustering on the training data to obtain principal component values corresponding to the training data;
and determining the child node corresponding to the training data according to the principal component values until a leaf node is reached, thereby obtaining the prediction training value corresponding to the decision tree.
Optionally, the method for constructing a scene model, where the calculating a voxel prediction value corresponding to the voxel according to the intermediate prediction value specifically includes:
for each intermediate predicted value, determining a weight value corresponding to the intermediate predicted value according to the error value corresponding to it;
and calculating the voxel predicted value corresponding to the voxel according to the weight value corresponding to each intermediate predicted value.
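As an illustration of this weighting step, a minimal sketch using inverse-error weights — the patent does not fix the exact weight formula, so this is one plausible choice, with `eps` added only to avoid division by zero:

```python
def fuse_predictions(intermediate_values, error_values, eps=1e-9):
    """Combine per-tree intermediate predictions into one voxel value,
    weighting each prediction by the inverse of its error value."""
    weights = [1.0 / (e + eps) for e in error_values]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, intermediate_values)) / total
```

Predictions with small error values dominate the result, while equally erroneous predictions are simply averaged.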
In addition, to achieve the above object, the present invention further provides an intelligent terminal, wherein the intelligent terminal includes: a memory, a processor and a scene model builder stored on the memory and executable on the processor, the scene model builder when executed by the processor implementing the steps of the scene model building method as described above.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium, wherein the computer-readable storage medium stores a scene model construction program, and the scene model construction program, when executed by a processor, implements the steps of the scene model construction method as described above.
The invention provides a scene model construction method, an intelligent terminal and a computer-readable storage medium. A voxel prediction value is calculated for each voxel based on its neighborhood features; the voxel prediction values are then fused with the Nth original scene model to obtain the (N+1)th original scene model. The original scene model is continuously updated and optimized during scanning, and when the voxel value of a voxel has not been directly observed, it is predicted from the voxel's neighborhood features, thereby filling hole areas in the initial model, reducing the number of holes, and increasing the completeness of the model.
Drawings
FIG. 1 is a flow chart of a preferred embodiment provided by the scene model construction method of the present invention;
FIG. 2 is a schematic diagram of a TSDF model according to a preferred embodiment of the present invention;
FIG. 3 is a view showing a scene model obtained by voxel prediction according to a preferred embodiment of the present invention;
fig. 4 is a schematic operating environment diagram of an intelligent terminal according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
The method for constructing the scene model according to the preferred embodiment of the invention can be executed by an intelligent terminal, and the intelligent terminal comprises a smart phone, a virtual reality technology terminal and other terminals. In this embodiment, a smart phone is taken as an example to describe a scene model building process. As shown in fig. 1, the scene model construction method includes the following steps:
step S100, when an Nth original depth image aiming at the same scene is obtained, scene fusion is carried out on an Nth original scene model according to the Nth original depth image, and an Nth intermediate scene model is obtained.
Specifically, the depth image may be captured by a depth camera, a binocular camera, or the like. Taking a depth camera as an example, the camera obtains the distance between objects in the environment and itself through time-of-flight technology, laser scanning technology, and the like, thereby producing a depth image. In operation, the depth camera usually shoots the same scene from multiple angles to obtain multiple depth images.
In this embodiment, taking a certain indoor scene as an example, when a plurality of depth images are captured in the indoor scene, the depth images are taken as original depth images, and the original depth images are sequentially acquired. And when the Nth original depth image is obtained, carrying out scene fusion on the corresponding Nth original scene model according to the Nth original depth image to obtain an Nth intermediate scene model.
In this embodiment, N is a natural number less than or equal to the total number of original depth images, and this embodiment will be described by taking the case where N is equal to 1 as an example. When N is equal to 1, the first original scene model is a preset blank scene model. The scene model adopted in this embodiment is a TSDF model, each voxel in the TSDF model corresponds to a pixel point of the original depth image, and the scene model corresponding to the original depth image is completed by calculating a voxel value of each voxel. When N is equal to 1, the TSDF model corresponding to the first original depth image is a scene model in which a voxel value of each voxel is empty. As shown in fig. 2, when N is greater than one, the nth original scene model may also be a scene model in which voxel values of partial voxels are already known.
Further, in the process of fusing the first original depth image with the first original scene model, the fusion may be implemented by the KinectFusion algorithm, the Kintinuous algorithm, the ElasticReconstruction offline scene model construction algorithm, and the like. The fusion process comprises the following steps:
filtering the Nth original depth image to generate an Nth noise reduction depth image;
calculating point clouds corresponding to all pixel points in the Nth noise reduction depth image according to camera internal parameters corresponding to the Nth original depth image to obtain a plurality of point clouds;
for each pixel in the Nth noise-reduced depth image, determining the normal vector of the point cloud point corresponding to the pixel according to its neighborhood point cloud;
and carrying out scene fusion on each voxel in the Nth scene model according to the normal vector of each point cloud to obtain an Nth intermediate scene model.
Specifically, each pixel in the first original depth image is filtered; the chosen filtering method may be bilateral filtering, smoothing filtering, and the like. Bilateral filtering is a non-linear filter that combines two Gaussian kernels: a kernel over the spatial Euclidean distance between pixels and a kernel over the difference of pixel depth values. When a pixel lies in a flat interior region where depth values change little, the depth difference is close to 0 and the weight of the depth-difference kernel approaches 1; the Euclidean distance kernel then dominates the bilateral filter, i.e., the original image is Gaussian-blurred. When a pixel lies in an edge region where depth values change sharply, the weight of the depth-difference kernel grows, ensuring that the geometric edge regions of objects in the image are not blurred even though the spatial Euclidean distance kernel weight is small. This embodiment adopts fast bilateral filtering, which is fast and simple and can run on a weak augmented reality terminal. Taking a pixel point p in the first original depth image as an example, the fast bilateral filtering of this pixel is:

$$d_0(p) = \frac{1}{S} \sum_{q \in N} c(\|p - q\|^2)\, s(\|d(q) - d(p)\|^2)\, d(q)$$

where $d(p)$ denotes the depth value of pixel p in the depth map; N denotes the set of pixels around p whose values may influence it, also called the neighborhood points of p; S is the number of pixels in the set N; and $d_0(p)$ is the filtered depth value of pixel p. $c(\|p - q\|^2)$ measures the geometric proximity between pixel p and its neighborhood point q, and $s(\|d(q) - d(p)\|^2)$ measures the similarity between the depth values of pixel p and its neighborhood point.
Based on the filtering processing, the first original depth image can be denoised to obtain a first denoised depth image.
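The bilateral filter described above can be sketched directly with two Gaussian kernels. The `sigma_s` and `sigma_r` values below are illustrative assumptions; this brute-force version omits the acceleration that makes the "fast" variant suitable for weak terminals:

```python
import numpy as np

def bilateral_filter_depth(depth, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Bilateral filter on a depth image: a spatial Gaussian c(.) times a
    depth-difference Gaussian s(.), so flat regions are smoothed while
    edges with large depth jumps are preserved."""
    h, w = depth.shape
    out = np.zeros_like(depth)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            patch = depth[y0:y1, x0:x1]
            yy, xx = np.mgrid[y0:y1, x0:x1]
            # c(.): spatial Euclidean distance kernel.
            spatial = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
            # s(.): depth-difference kernel; near zero across depth edges.
            range_w = np.exp(-((patch - depth[y, x]) ** 2) / (2 * sigma_r ** 2))
            weights = spatial * range_w
            out[y, x] = np.sum(weights * patch) / np.sum(weights)
    return out
```

On a flat region the output equals the input, while a sharp depth step (much larger than `sigma_r`) passes through untouched.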
The depth map provides the z-coordinate in the camera coordinate system, i.e., the distance in space between the camera and the object corresponding to each pixel. After the first noise-reduced depth image is obtained, a point cloud map in the camera's local coordinate system is computed from it using the intrinsic parameters of the depth camera. For a pixel with coordinates (u, v), the conversion formula is:

$$\mathbf{v}(u, v) = D_k(u, v)\, K_d^{-1}\, [u, v, 1]^T$$

where $K_d$ is the intrinsic parameter matrix of the depth camera, $[\cdot]^T$ denotes the transpose, $D_k(u, v)$ is the depth value of the pixel, and $\mathbf{v}(u, v)$ is the spatial coordinate corresponding to the pixel; each spatial coordinate corresponds to one point of the point cloud. Converting the coordinates of every pixel in the first noise-reduced depth image yields the full point cloud.
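The back-projection step can be sketched as follows; the intrinsic matrix values used in testing are illustrative, not taken from the patent:

```python
import numpy as np

def depth_to_point_cloud(depth, K):
    """Back-project every pixel (u, v) with depth D(u, v) to the 3D point
    p = D(u, v) * K^{-1} [u, v, 1]^T in the camera coordinate system."""
    h, w = depth.shape
    K_inv = np.linalg.inv(K)
    vv, uu = np.mgrid[0:h, 0:w]
    # Homogeneous pixel coordinates (u, v, 1), one row per pixel.
    pixels = np.stack([uu, vv, np.ones_like(uu)], axis=-1).reshape(-1, 3)
    rays = pixels @ K_inv.T          # unit-depth rays K^{-1} [u, v, 1]^T
    points = rays * depth.reshape(-1, 1)
    return points.reshape(h, w, 3)   # indexed as [v, u]
```

With a pinhole matrix whose principal point is at pixel (2, 2), that pixel at depth 1 maps to (0, 0, 1) on the optical axis.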
After the spatial coordinates are obtained, a normal vector is computed for each pixel from the three-dimensional coordinates of its neighborhood pixels, i.e., the pixels directly adjacent to it in the vertical and horizontal directions of the captured image. For a pixel with coordinates (u, v) in the first noise-reduced depth image, the normal vector formula adopted in this embodiment is:

$$\mathbf{n}(u, v) = \mathrm{normalize}\big( (\mathbf{v}(u+1, v) - \mathbf{v}(u, v)) \times (\mathbf{v}(u, v+1) - \mathbf{v}(u, v)) \big)$$

where normalize denotes normalization to unit length, $\mathbf{v}(u, v)$ is the spatial vector of the point cloud point corresponding to the pixel, and $\mathbf{v}(u+1, v)$ and $\mathbf{v}(u, v+1)$ are the spatial vectors of the point cloud points corresponding to its neighborhood pixels; the resulting normal vector of the point cloud point has length one. Further, if the three-dimensional coordinates corresponding to a pixel, or to its adjacent pixels, are invalid, the normal vector of that pixel is regarded as an outlier and removed.
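A sketch of this cross-product normal estimation over the point cloud map; border pixels and invalid-depth handling are omitted for brevity:

```python
import numpy as np

def estimate_normals(points):
    """Normal at (v, u) from the cross product of the vectors to the
    right and lower neighbors, normalized to unit length."""
    h, w, _ = points.shape
    normals = np.zeros_like(points)
    for v in range(h - 1):
        for u in range(w - 1):
            du = points[v, u + 1] - points[v, u]   # v(u+1, v) - v(u, v)
            dv = points[v + 1, u] - points[v, u]   # v(u, v+1) - v(u, v)
            n = np.cross(du, dv)
            norm = np.linalg.norm(n)
            if norm > 1e-12:                        # degenerate -> outlier
                normals[v, u] = n / norm
    return normals
```

For a point cloud lying on the plane z = 1, every interior normal comes out as (0, 0, 1), as expected.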
Then, based on the normal vector of each point cloud point, scene fusion is carried out by means of the KinectFusion algorithm or the like. This process is prior art and is not described here.
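For reference, the standard TSDF running-average update that such fusion performs per voxel — the common KinectFusion formulation, which the patent defers to prior art rather than specifying:

```python
def tsdf_update(tsdf, weight, observed_sdf, obs_weight=1.0, max_weight=100.0):
    """Blend a newly observed truncated signed distance into the stored
    voxel value as a weighted running average, capping the total weight."""
    new_tsdf = (tsdf * weight + observed_sdf * obs_weight) / (weight + obs_weight)
    new_weight = min(weight + obs_weight, max_weight)
    return new_tsdf, new_weight
```

Repeated observations thus converge the stored distance toward the measured surface while older measurements are gradually down-weighted by the cap.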
Step S200: extracting the neighborhood features of each voxel in the Nth intermediate scene model according to a preset extraction rule.
Specifically, an extraction rule is preset, and the extraction rule is mainly used for extracting the neighborhood characteristics of each voxel according to the attributes of the neighborhood voxels of the voxel, including distance characteristics, voxel value characteristics and the like. For each voxel, the neighborhood characteristic corresponding to the voxel comprises the scene characteristic of the neighborhood voxel of the voxel, and the neighborhood voxel is a voxel corresponding to a neighborhood pixel of the pixel point corresponding to the voxel.
In addition, in order to reduce unnecessary computation and increase computation speed, the neighborhood features extracted according to the extraction rule in this embodiment are features for performing certain screening on neighborhood voxels. Firstly, extracting scene characteristics of each voxel in the Nth intermediate scene model according to a preset extraction rule, wherein the scene characteristics refer to characteristics related to the scene model, such as a voxel value and a distance characteristic corresponding to the voxel. And then, aiming at each voxel, screening the neighborhood voxels according to a preset screening rule and the coordinates of the neighborhood voxels corresponding to the voxel to obtain a target voxel. The screening rule is mainly used for eliminating voxels with poor correlation with the voxel value of the voxel in the neighborhood. In this embodiment, the target voxel is a fixed-size cube, the x-axis of which coincides with the normal vector direction of the voxel, and the z-axis of which coincides with the z-axis direction of the world coordinate system. And finally, taking the scene characteristic corresponding to the target voxel as the neighborhood characteristic corresponding to the voxel.
Further, since the voxels whose values most need subsequent updating are those located on the scene surface, this embodiment preferably does not compute neighborhood features for every voxel. Instead, only the voxels corresponding to the vertices of the vertex map obtained by re-projecting the first original scene model into the virtual camera are selected, and only these voxels need to be processed when voxel prediction values are later computed, which reduces the amount of calculation and improves computational efficiency.
Step S300: calculating a voxel prediction value for each voxel according to the neighborhood features.
Specifically, for each voxel, a voxel prediction value corresponding to the voxel is calculated according to a neighborhood characteristic corresponding to the voxel. The voxel prediction value is a numerical value for predicting the voxel value of the voxel according to the neighborhood characteristic corresponding to the voxel.
In this embodiment, the prediction mode may be implemented by deep learning, machine learning, or the like. In this embodiment, a structured random forest model is used to calculate a voxel prediction value. The specific process comprises the following steps:
and A10, inputting the neighborhood characteristics corresponding to each voxel into the trained structured random forest model, and predicting the voxel value through the structured random forest model according to the input neighborhood characteristics to obtain a plurality of initial predicted values corresponding to the voxel.
Specifically, for each voxel (which may also be a voxel corresponding to a vertex), the neighborhood features of the voxel are input into the trained structured random forest model. Each decision tree in the structured random forest model calculates an initial predicted value from the input neighborhood features. Since a structured random forest model typically comprises several decision trees, several initial predicted values are obtained.
Further, when the structured random forest model is trained, a large number of training depth images are obtained first, and training pixel points in the training depth images are screened. And then aiming at each training pixel point, taking a voxel value corresponding to the pixel point as label data, taking a neighborhood voxel value corresponding to the pixel point as training data, inputting the training data into a preset structured random forest model, and calculating a corresponding training predicted value according to the training data through the structured random forest model.
Because each decision tree predicts the voxel value of a voxel from the scene features of that voxel's neighborhood voxels, two adjacent points in the vertex image usually contribute similarly to the prediction; it is therefore unnecessary to predict a voxel value for the voxel corresponding to every pixel of every training depth image. Training data helpful for prediction can be selected by a preset sampling rule. In this embodiment, pixels whose normal vectors point away from the depth camera and pixels whose normal vectors coincide with the positive z-axis of the world coordinate system are taken as elimination points, and the scene coordinates of the voxels corresponding to these elimination points are removed from the neighborhood features of the voxels, yielding the training data.
When the decision trees are trained, each piece of label data and its corresponding training data form a data pair. For each decision tree, a number of data pairs are selected from all of the data pairs as the training subset corresponding to that decision tree, so that a plurality of training subsets are obtained.
The present embodiment describes the process of training a decision tree by taking one piece of training data as an example. For each decision tree, the training data is input into the root node and classified layer by layer until a leaf node is reached; at each node the classification is performed according to the numerical values in the training data so that the features arriving at the same child node, that is, the neighborhood features, are as similar as possible. When classifying, this embodiment first randomly samples the dimensions of the three-dimensional training data, thereby reducing its dimensionality, and then divides the data into two temporary classes by clustering; the dimensionality reduction and clustering yield principal component values for the training data. The training data is then assigned to one of the two child nodes according to the sign of its largest principal component value. At the final leaf node, a predicted training value is output. Then, according to the predicted training value and the label data corresponding to the training data, the parameters of each node of the decision tree are adjusted until every decision tree has been trained and the structured random forest model converges, that is, training is complete.
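One node split of the kind described above can be sketched as follows: randomly sample a few feature dimensions, take the first principal component of the sub-sampled data, and route each sample to a child node by the sign of its projection. This is an illustrative reading of the patent's wording under stated assumptions (random dimension choice, PCA via SVD), not its exact procedure; `split_node` is a name introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def split_node(features, n_sub=3):
    """Split the samples at one tree node.

    features : (n_samples, n_dims) neighborhood-feature matrix.
    Randomly sub-samples n_sub dimensions, computes the first principal
    component of the centered sub-matrix, and routes each sample by the
    sign of its projection. Returns (left_idx, right_idx, dims, direction).
    """
    dims = rng.choice(features.shape[1],
                      size=min(n_sub, features.shape[1]), replace=False)
    sub = features[:, dims]
    centered = sub - sub.mean(axis=0)
    # First principal component from the SVD of the centered sub-matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[0]
    left = np.flatnonzero(proj < 0)   # negative principal component value
    right = np.flatnonzero(proj >= 0)  # non-negative value
    return left, right, dims, vt[0]
```

Two well-separated clusters are sent to different children, which is exactly the "features of the child nodes are as similar as possible" behavior the text describes.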
And A20, calculating an error value corresponding to each initial predicted value according to a preset error loss function.
Specifically, since the calculation results of the decision trees differ, and some results deviate considerably from the actual geometric structure, an error value corresponding to each initial predicted value is calculated through a preset error loss function, so that the initial predicted values with large errors can be removed. In this embodiment, the three-dimensional point cloud corresponding to the first original depth image is calculated by re-projection, the three-dimensional points falling in the neighborhood voxels are found, and the following error loss function is adopted:

E = Σ<sub>p∈P</sub> |y(p)|

wherein E is the error value, P denotes the three-dimensional point cloud falling in the neighborhood voxels, p is a point of that point cloud, and y(p) is the initial predicted value at p.
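Since the re-projected points lie on the observed surface, the predicted signed distance there should be zero, so the magnitude of the prediction at each point counts as error. A minimal sketch, assuming a hypothetical callable `predict_tsdf` that maps (M, 3) points to (M,) TSDF values (the callable and function name are not from the patent):

```python
import numpy as np

def prediction_error(points, predict_tsdf):
    """Error value of one initial prediction, following the reconstructed
    loss E = sum over p in P of |y(p)|.

    points       : (M, 3) re-projected 3D points falling in the
                   neighborhood voxels.
    predict_tsdf : hypothetical callable returning the initial predicted
                   TSDF value y(p) for each point.
    """
    return float(np.sum(np.abs(predict_tsdf(np.asarray(points)))))
```

A tree whose prediction is nearly zero at every observed surface point gets a small error value and therefore survives the screening in the next step.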
A30, determining a plurality of intermediate predicted values in the initial predicted values according to the error value.
Specifically, the intermediate predicted values among the initial predicted values are determined according to the error values. One way is to sort the initial predicted values by the size of their error values to obtain an error sequence and then, according to a preset selection number (for example, 2), take that many initial predicted values in order as the intermediate predicted values. Alternatively, a preset error threshold may be used, and every initial predicted value whose error value is smaller than the threshold is taken as an intermediate predicted value.
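Both selection strategies from the text can be sketched in one helper. The function name and argument names are introduced here for illustration; exactly one of `k` / `threshold` is expected.

```python
import numpy as np

def select_intermediate(initial_values, errors, k=None, threshold=None):
    """Select intermediate predicted values as described above.

    k         : keep the k predictions with the smallest error values
                (the preset selection number).
    threshold : keep every prediction whose error value is below the
                preset error threshold.
    Returns (selected_values, selected_errors).
    """
    initial_values = np.asarray(initial_values, dtype=float)
    errors = np.asarray(errors, dtype=float)
    if k is not None:
        order = np.argsort(errors)[:k]  # smallest errors first
        return initial_values[order], errors[order]
    keep = errors < threshold
    return initial_values[keep], errors[keep]
```

The selected errors are kept alongside the values because the fusion step that follows weights each intermediate prediction by its error.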
And A40, calculating a voxel predicted value corresponding to the voxel according to the intermediate predicted value.
Specifically, after the intermediate predicted values are obtained, their number may be greater than one, so the corresponding voxel predicted value needs to be calculated from them, for example by a weighted average. In this embodiment, the weight applied to each intermediate predicted value is derived from the error value calculated above. The weighted average formula adopted in this embodiment is:
TSDF<sub>global</sub> = Σ<sub>i∈M</sub> w<sub>i</sub> · TSDF<sub>i</sub>;

wherein TSDF<sub>global</sub> is the voxel predicted value, M is the set of intermediate predicted values, TSDF<sub>i</sub> is the i-th intermediate predicted value, and w<sub>i</sub> = exp(-α·E<sub>i</sub>) is the weight corresponding to TSDF<sub>i</sub>, E<sub>i</sub> being the error value of the i-th intermediate predicted value. When the weighting parameter α equals zero, the formula reduces to a naive unweighted average; as α approaches infinity, the predicted values most consistent with the observed geometry receive the highest weight. In this embodiment, α is set to 100.
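The fusion formula can be sketched directly. One assumption is made beyond the text: the weights are normalized to sum to one, so that α = 0 yields a plain average of the intermediate values (the patent states only the exponential form w_i = exp(-αE_i)).

```python
import numpy as np

def fuse_predictions(tsdf_values, errors, alpha=100.0):
    """Weighted fusion of intermediate TSDF predictions.

    tsdf_values : intermediate predicted values TSDF_i.
    errors      : their error values E_i.
    alpha       : weighting parameter (100 in the embodiment).

    w_i = exp(-alpha * E_i), normalized here so the weights sum to one
    (normalization is an assumption, not stated in the patent).
    """
    tsdf_values = np.asarray(tsdf_values, dtype=float)
    w = np.exp(-alpha * np.asarray(errors, dtype=float))
    w = w / w.sum()
    return float(w @ tsdf_values)
```

With α = 0 the result is the plain mean; with a large α the lowest-error prediction dominates, matching the limiting behavior described above.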
Step S400, aiming at each voxel, when the voxel value corresponding to the voxel in the Nth original scene model is a non-voxel observation value, updating the Nth original scene model according to the voxel prediction value corresponding to the voxel to obtain an (N +1) th original scene model.
Specifically, after the voxel predicted value corresponding to each voxel is obtained, for each voxel, when the voxel value corresponding to that voxel in the Nth original scene model is a non-voxel observation value, the Nth original scene model is updated according to the voxel predicted value corresponding to the voxel, so as to obtain the (N+1)th original scene model. A non-voxel observation value is the complement of a voxel observation value: a voxel observation value is a voxel value determined directly from the depth image, rather than null, 1, -1 or another invalid value, or a voxel predicted value. That is, when the voxel value of a voxel is a voxel observation value, the voxel value is left unchanged; when it is a non-voxel observation value, the currently obtained voxel predicted value for that voxel is used as its voxel value to update the Nth original scene model, yielding the (N+1)th original scene model.
Each voxel is labeled with a label value, denoted P, which is used to judge the type of the voxel value corresponding to the voxel.
And when the voxel value of the voxel is null, +1 or-1, the label value P corresponding to the voxel is assigned to be 0.
When the voxel value of the voxel is a numerical value determined according to the three-dimensional point cloud, the label value P is assigned to be 1.
When the voxel value of the voxel is a value obtained by the foregoing prediction method, the label value P is assigned to be -1.
In the updating process, the label value corresponding to each voxel in the Nth original scene model is obtained. When the label value is 0 or -1, the predicted voxel value is used as the updated voxel value and the label value is set to -1; when the label value is 1, the voxel value is left unchanged; and when a voxel observation value exists for a voxel whose label value is -1 or 0, the voxel observation value is used as the voxel value and the label value is updated to 1. In this way, the predicted regions in the three-dimensional scene model are progressively eliminated, the model is refined with the real geometric data of the scene, and an accurate estimate of the scene surface is achieved. At the same time, the camera pose is estimated from the normal vector of each pixel point, completing the reconstruction of the scene model.
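The label-driven update rule above can be sketched for a single voxel. The function name and the `(value, label)` return convention are introduced here; the three branches follow the text's description of P = 1 / 0 / -1.

```python
def update_voxel(voxel_value, label, predicted_value, observed_value=None):
    """Label-driven update of one voxel, as read from the text.

    label = 1        : voxel observation value, never overwritten.
    label = 0 or -1  : take the new prediction and mark P = -1, unless a
                       fresh observation exists, in which case the real
                       observation wins and P becomes 1.
    Returns (new_value, new_label).
    """
    if observed_value is not None and label in (0, -1):
        return observed_value, 1   # real geometry replaces the prediction
    if label == 1:
        return voxel_value, 1      # keep the observed value unchanged
    return predicted_value, -1     # fill the hole with the predicted value
```

Applied over all voxels each frame, predictions fill holes immediately but are always superseded once the sensor actually observes the region, which is how "the predicted area in the scene three-dimensional model is eliminated".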
Subsequently, vertices and triangular patches can be extracted from the updated scene model by the Marching Cubes algorithm or the like, so as to obtain the reconstructed model.
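To illustrate how surface vertices fall out of a TSDF grid, the sketch below implements only the vertex-interpolation step of Marching Cubes along a single axis; a full implementation (for example `skimage.measure.marching_cubes`) also emits the triangular patches. The function name and the single-axis simplification are assumptions of this sketch.

```python
import numpy as np

def zero_crossings_z(tsdf, spacing=1.0):
    """Find, along the z axis only, grid edges where the TSDF changes sign
    and place a vertex on each edge by linear interpolation — the vertex
    step of Marching Cubes, simplified to one axis for illustration.

    tsdf : (X, Y, Z) array of signed distance values.
    Returns an (M, 3) array of vertex coordinates in grid units * spacing.
    """
    a, b = tsdf[:, :, :-1], tsdf[:, :, 1:]          # edge endpoints
    ix, iy, iz = np.nonzero(np.sign(a) != np.sign(b))
    # Linear interpolation parameter where the TSDF crosses zero.
    t = a[ix, iy, iz] / (a[ix, iy, iz] - b[ix, iy, iz])
    return np.stack([ix, iy, iz + t], axis=1) * spacing
```

An edge running from TSDF -1 to +1 yields a vertex at its midpoint, i.e. exactly on the estimated scene surface.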
As shown in fig. 3, the left image is the three-dimensional model obtained without the scene model construction method of this embodiment, and the right image is the three-dimensional model obtained with it; the dashed boxes mark hole regions. It is evident that the scene model construction method of this embodiment greatly reduces the number and extent of holes and achieves high accuracy.
Further, as shown in fig. 4, based on the above scene model construction method, the present invention also provides an intelligent terminal, which includes a processor 10, a memory 20, and a display 30. Fig. 4 shows only some of the components of the smart terminal, but it should be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
The memory 20 may be an internal storage unit of the intelligent terminal in some embodiments, such as a hard disk or a memory of the intelligent terminal. The memory 20 may also be an external storage device of the intelligent terminal in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, or the like provided on the intelligent terminal. Further, the memory 20 may include both an internal storage unit and an external storage device of the intelligent terminal. The memory 20 is used for storing application software installed on the intelligent terminal and various kinds of data, such as the program code of the installed applications, and may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 stores a scene model building program 40, and the scene model building program 40 can be executed by the processor 10 so as to implement the scene model construction method of the present application.
The processor 10 may be a Central Processing Unit (CPU), a microprocessor or other data Processing chip in some embodiments, and is configured to run program codes stored in the memory 20 or process data, for example, execute the scene model building method, and the like.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like in some embodiments. The display 30 is used for displaying information at the intelligent terminal and for displaying a visual user interface. The components 10-30 of the intelligent terminal communicate with each other via a system bus.
In one embodiment, when processor 10 executes scene model building program 40 in memory 20, the following steps are implemented:
when an Nth original depth image aiming at the same scene is obtained, carrying out scene fusion on an Nth original scene model according to the Nth original depth image to obtain an Nth intermediate scene model, wherein N is a natural number which is less than or equal to the total number of the original depth images, and when N is equal to 1, the first original scene model is a preset blank scene model;
extracting neighborhood characteristics of each voxel in the Nth intermediate scene model according to a preset extraction rule;
calculating a voxel prediction value corresponding to each voxel according to the neighborhood characteristics;
and for each voxel, when the voxel value corresponding to the voxel in the Nth original scene model is a non-voxel observation value, updating the Nth original scene model according to the voxel prediction value corresponding to the voxel to obtain an (N +1) th original scene model.
The present invention also provides a computer-readable storage medium, wherein the computer-readable storage medium stores a scene model construction program, which when executed by a processor implements the steps of the scene model construction method as described above.
Of course, it will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware (such as a processor, a controller, etc.), and the program can be stored in a computer readable storage medium, and the program can include the processes of the method embodiments described above when executed. The computer readable storage medium may be a memory, a magnetic disk, an optical disk, etc.
It is to be understood that the invention is not limited to the above-described embodiments, and that modifications and variations may be made by persons skilled in the art in light of the above teachings, and all such modifications and variations are intended to fall within the scope of the invention as defined in the appended claims.
Claims (10)
1. A scene model construction method is characterized by comprising the following steps:
when an Nth original depth image aiming at the same scene is obtained, carrying out scene fusion on an Nth original scene model according to the Nth original depth image to obtain an Nth intermediate scene model, wherein N is a natural number which is less than or equal to the total number of the original depth images, and when N is equal to 1, the first original scene model is a preset blank scene model;
extracting neighborhood characteristics of each voxel in the Nth intermediate scene model according to a preset extraction rule;
calculating a voxel prediction value corresponding to each voxel according to the neighborhood characteristics;
and for each voxel, when the voxel value corresponding to the voxel in the Nth original scene model is a non-voxel observation value, updating the Nth original scene model according to the voxel prediction value corresponding to the voxel to obtain an (N +1) th original scene model.
2. The method for constructing a scene model according to claim 1, wherein when an nth original depth image for a same scene is obtained, scene fusion is performed on the nth original scene model according to the nth original depth image to obtain an nth intermediate scene model, specifically including:
when an Nth original depth image aiming at the same scene is obtained, filtering the Nth original depth image to generate an Nth noise reduction depth image;
calculating point clouds corresponding to all pixel points in the Nth noise reduction depth image according to camera internal parameters corresponding to the Nth original depth image to obtain a plurality of point clouds;
for each pixel point in the Nth noise reduction depth image, determining a normal vector of the point cloud corresponding to the pixel point according to the neighborhood point clouds corresponding to the pixel point, wherein the neighborhood point clouds correspond to neighborhood pixels of the pixel point;
and carrying out scene fusion on each voxel in the Nth scene model according to the normal vector of each point cloud to obtain an Nth intermediate scene model.
3. The method according to claim 1, wherein for each of the voxels, the neighborhood feature corresponding to the voxel comprises a scene feature of a neighborhood voxel of the voxel, and the neighborhood voxel is a voxel corresponding to a neighborhood pixel of a pixel point corresponding to the voxel.
4. The method for constructing a scene model according to claim 1, wherein the extracting neighborhood characteristics of each voxel in the nth intermediate scene model according to a preset extraction rule specifically includes:
extracting scene features of each voxel in the Nth intermediate scene model according to a preset extraction rule;
aiming at each voxel, screening neighborhood voxels according to a preset screening rule and coordinates of the neighborhood voxels corresponding to the voxel to obtain a target voxel;
and taking the scene characteristic corresponding to the target voxel as the neighborhood characteristic corresponding to the voxel.
5. The method for constructing a scene model according to claim 1, wherein the calculating a voxel prediction value corresponding to each voxel according to the neighborhood characteristics specifically includes:
for each voxel, inputting the neighborhood characteristics corresponding to the voxel into a trained structured random forest model, and predicting the voxel value through the structured random forest model according to the input neighborhood characteristics to obtain a plurality of initial predicted values corresponding to the voxel;
calculating an error value corresponding to each initial predicted value according to a preset error loss function;
determining a number of intermediate predicted values of the initial predicted values according to the error value;
and calculating a voxel prediction value corresponding to the voxel according to the intermediate prediction value.
6. The method for constructing a scene model according to claim 5, wherein the training process of the structured random forest model comprises:
acquiring training depth images of different scenes;
according to a preset sampling rule, screening pixel points in the training depth image to obtain training pixel points;
aiming at each training pixel point, taking a voxel value corresponding to the pixel point as label data, and taking a neighborhood voxel value corresponding to the pixel point as training data;
inputting the training data into a preset structured random forest model, and calculating a corresponding training predicted value according to the training data through the structured random forest model;
and according to the training predicted value and the label data, carrying out parameter adjustment on the structured random forest model until the structured random forest model is converged.
7. The method of constructing a scene model according to claim 6, wherein the structured random forest model comprises a number of decision trees; and the inputting the training data into a preset structured random forest model and calculating a corresponding training predicted value according to the training data through the structured random forest model specifically comprises:
generating a plurality of training subsets according to the label data and the training data, wherein the number of the training subsets is the same as that of the decision trees;
inputting training data in the training subset into each decision tree, and performing dimensionality reduction and clustering on the training data at each node of the decision tree to obtain a principal component value corresponding to the training data;
and determining child nodes corresponding to the training data according to the principal component values until leaf nodes are reached, so as to obtain the predicted training value corresponding to the decision tree.
8. The method for constructing a scene model according to claim 5, wherein the calculating a voxel prediction value corresponding to the voxel according to the intermediate prediction value specifically includes:
aiming at each intermediate predicted value, determining a weight value corresponding to the intermediate predicted value according to an error value corresponding to the intermediate predicted value;
and calculating the voxel predicted value corresponding to the voxel according to the weight value corresponding to each intermediate predicted value.
9. An intelligent terminal, characterized in that, intelligent terminal includes: memory, a processor and a scene model builder stored on the memory and executable on the processor, the scene model builder when executed by the processor implementing the steps of the scene model building method as claimed in any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a scene model construction program which, when executed by a processor, implements the steps of the scene model construction method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110325406.7A CN113034675A (en) | 2021-03-26 | 2021-03-26 | Scene model construction method, intelligent terminal and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113034675A true CN113034675A (en) | 2021-06-25 |
Family
ID=76474178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110325406.7A Pending CN113034675A (en) | 2021-03-26 | 2021-03-26 | Scene model construction method, intelligent terminal and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113034675A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104517289A (en) * | 2014-12-12 | 2015-04-15 | 浙江大学 | Indoor scene positioning method based on hybrid camera |
CN106803267A (en) * | 2017-01-10 | 2017-06-06 | 西安电子科技大学 | Indoor scene three-dimensional rebuilding method based on Kinect |
CN107292965A (en) * | 2017-08-03 | 2017-10-24 | 北京航空航天大学青岛研究院 | A kind of mutual occlusion processing method based on depth image data stream |
CN109215117A (en) * | 2018-09-12 | 2019-01-15 | 北京航空航天大学青岛研究院 | Flowers three-dimensional rebuilding method based on ORB and U-net |
CN110223383A (en) * | 2019-06-17 | 2019-09-10 | 重庆大学 | A kind of plant three-dimensional reconstruction method and system based on depth map repairing |
CN110827295A (en) * | 2019-10-31 | 2020-02-21 | 北京航空航天大学青岛研究院 | Three-dimensional semantic segmentation method based on coupling of voxel model and color information |
CN110874864A (en) * | 2019-10-25 | 2020-03-10 | 深圳奥比中光科技有限公司 | Method, device, electronic equipment and system for obtaining three-dimensional model of object |
Non-Patent Citations (2)
Title |
---|
强孙源等: "基于二值随机森林的非均匀光照QR码重构算法", 包装工程, vol. 40, no. 11, 30 June 2019 (2019-06-30), pages 232 - 237 * |
闫利;陈长海;费亮;张奕戈;: "密集点云的数字表面模型自动生成方法", 遥感信息, no. 05, 15 October 2017 (2017-10-15), pages 1 - 7 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114638954A (en) * | 2022-02-22 | 2022-06-17 | 深圳元戎启行科技有限公司 | Point cloud segmentation model training method, point cloud data segmentation method and related device |
CN114638954B (en) * | 2022-02-22 | 2024-04-19 | 深圳元戎启行科技有限公司 | Training method of point cloud segmentation model, point cloud data segmentation method and related device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Guerry et al. | Snapnet-r: Consistent 3d multi-view semantic labeling for robotics | |
US11494915B2 (en) | Image processing system, image processing method, and program | |
US9417700B2 (en) | Gesture recognition systems and related methods | |
CN110084304B (en) | Target detection method based on synthetic data set | |
US8610712B2 (en) | Object selection in stereo image pairs | |
EP2080167B1 (en) | System and method for recovering three-dimensional particle systems from two-dimensional images | |
CN110945565A (en) | Dense visual SLAM using probabilistic bin maps | |
CN112991413A (en) | Self-supervision depth estimation method and system | |
EP3326156B1 (en) | Consistent tessellation via topology-aware surface tracking | |
JP2006520055A (en) | Invariant viewpoint detection and identification of 3D objects from 2D images | |
EP3756163B1 (en) | Methods, devices, and computer program products for gradient based depth reconstructions with robust statistics | |
US20220415030A1 (en) | AR-Assisted Synthetic Data Generation for Training Machine Learning Models | |
CN112465021B (en) | Pose track estimation method based on image frame interpolation method | |
CN112767478B (en) | Appearance guidance-based six-degree-of-freedom pose estimation method | |
KR102223484B1 (en) | System and method for 3D model generation of cut slopes without vegetation | |
CN114581571A (en) | Monocular human body reconstruction method and device based on IMU and forward deformation field | |
US11138812B1 (en) | Image processing for updating a model of an environment | |
CN114170290A (en) | Image processing method and related equipment | |
CN113436251B (en) | Pose estimation system and method based on improved YOLO6D algorithm | |
CN113034675A (en) | Scene model construction method, intelligent terminal and computer readable storage medium | |
KR20230083212A (en) | Apparatus and method for estimating object posture | |
CN114241013B (en) | Object anchoring method, anchoring system and storage medium | |
CN115984583B (en) | Data processing method, apparatus, computer device, storage medium, and program product | |
Shao et al. | Appearance-based tracking and recognition using the 3D trilinear tensor | |
CN117392721A (en) | Face key point detection method based on multi-scale feature and offset prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||