CN107622244B - Indoor scene fine analysis method based on depth map - Google Patents
- Publication number: CN107622244B (application CN201710874793.3A)
- Authority: CN (China)
- Prior art keywords: depth map; indoor scene; analyzed; value; category
- Legal status: Active
Abstract
The invention discloses a depth-map-based indoor scene fine analysis method in the technical field of digital image processing and pattern recognition, comprising the following steps: extracting a three-channel feature map of the depth map, and segmenting the targets in the depth map of the indoor scene to be analyzed with a trained full convolution network; refining and optimizing the boundaries of the segmentation result on the feature map with a fully connected conditional random field, to obtain the category label vectors of all pixels in the indoor scene depth map to be analyzed; and converting the depth map into a point cloud and analyzing the three-dimensional structure of each target based on the category label vectors, to obtain the spatial posture of the target. The invention uses only the depth map as input, realizes semantic segmentation of the indoor scene, and gives the spatial posture of specific objects in three-dimensional coordinates; it can effectively overcome occlusion, separate foreground from background, and protect the privacy of the user.
Description
Technical Field
The invention belongs to the technical field of digital image processing and pattern recognition, and particularly relates to a depth map-based indoor scene fine analysis method.
Background
Indoor scene analysis is a task that integrates target detection and image segmentation, and requires a computer to understand an image at multiple levels; it involves all-round, multi-angle algorithm design from 2D to 3D, from bottom-layer object positioning, identification and segmentation up to scene identification and indoor object layout analysis.
Traditional scene parsing is mainly based on color images and relies on limited information sources, mainly color and texture. Existing algorithms adopt a bottom-up framework to classify image superpixels, and then optimize the segmentation result with a graph model. However, such algorithms have two defects: first, robustness is poor under severe indoor occlusion and with complex objects, and targets are difficult to distinguish from the background; second, a planar color image is inherently short of information and cannot provide the position of a target in three-dimensional space.
In recent years, the popularization of depth cameras provides a new dimension for solving the problems, so that the level of analysis and understanding of indoor scenes is greatly improved. The depth images provide a visual angle closer to the real world, the difference between the foreground and the background can be reflected through the distance, meanwhile, the surface geometric information is added on the basis of the visual information, and the unique characteristics of the depth images provide great convenience for the 3D analysis of the indoor scene.
Existing depth-map-based indoor scene analysis technology follows the same line of thinking as the traditional color-image methods: it merely treats the depth information as one more feature, and does not fully exploit the unique characteristics of the depth map. It is worth mentioning that, in practical applications, both the traditional methods based on color images alone and the methods that rely on color images in addition to depth images inevitably fail when the lights are turned off at night. Furthermore, using a color camera carries the risk of revealing the user's privacy.
Disclosure of Invention
Aiming at the above defects or improvement requirements of the prior art, the invention provides a depth-map-based indoor scene fine analysis method, which solves the technical problem that the existing indoor scene analysis technology, because it depends on color images, cannot identify an indoor scene without illumination.
In order to achieve the above object, according to an aspect of the present invention, there is provided a depth map-based indoor scene refinement analysis method, including:
(1) extracting a three-channel characteristic diagram of an indoor scene depth map to be analyzed, taking the extracted three-channel characteristic diagram as the input of a trained full convolution network, and segmenting a target in the indoor scene depth map to be analyzed;
(2) according to the extracted three-channel characteristic diagram, utilizing a full-connection conditional random field to perfect and optimize the boundary of the segmentation result to obtain category label vectors of all pixels in the indoor scene depth map to be analyzed;
(3) and converting the indoor scene depth map to be analyzed into point cloud, analyzing the three-dimensional structure of the target based on the category label vector, and obtaining the space posture of the target.
Preferably, step (1) specifically comprises:
(1.1) coding the indoor scene depth map I to be analyzed into a three-channel map I_E, where the pixels of each channel image correspond one-to-one to the pixels of the depth map I, and the three channels respectively represent the disparity value, the height above the ground, and the angle between the normal vector and the gravity direction;
(1.2) taking the three-channel map I_E as the input of the trained full convolution network, extracting multi-level CNN features layer by layer, wherein the convolution feature map obtained at each layer is downsampled and sent to the next layer to extract a new convolution feature map;
(1.3) respectively passing the convolution characteristic graphs in different layers through deconvolution layers, upsampling to the same size, then mutually fusing the characteristic graphs in different layers, and sending the fused characteristic graphs into a softmax layer;
(1.4) predicting the category of each pixel point through a softmax layer, outputting the probability that each pixel point belongs to each category, wherein the category corresponding to the maximum probability value is the initial category label of the pixel point.
Preferably, step (1.1) specifically comprises:
(1.1.1) by d = f·b/Z, obtaining the relation between the disparity d and the depth value Z corresponding to each pixel point, where f is the focal length and b the baseline of the depth camera;
(1.1.2) by n(u, v) = norm[∂p/∂u × ∂p/∂v], obtaining the normal vector of each pixel point, wherein norm[·] represents the normalization of a vector, the symbol × represents the vector cross product, (u, v) represents the pixel position on the two-dimensional plane of the indoor scene depth map to be analyzed, and p = (x, y, z) represents the coordinates in its three-dimensional space; the conversion relation between the two-dimensional and three-dimensional coordinates is z·(u, v, 1)^T = K·(x, y, z)^T, where K is the intrinsic matrix of the depth camera;
(1.1.3) comparing the angle θ between the normal vector n̂ of each pixel point and the gravity direction ĝ with an angle error margin ρ, and constructing a parallel set N_∥ = {n̂ : θ < ρ or θ > 180° − ρ} and a vertical set N_⊥ = {n̂ : 90° − ρ < θ < 90° + ρ};
(1.1.4) taking the eigenvector corresponding to the smallest eigenvalue of the matrix N_⊥N_⊥^T − N_∥N_∥^T as the updated gravity vector, repeating step (1.1.3) with the updated gravity vector until the estimate no longer changes, obtaining the target gravity vector, and calculating the angle between the normal vector of each pixel in the point cloud and the target gravity direction, wherein the point cloud is formed by the coordinates (x, y, z) in three-dimensional space corresponding to all pixel points;
(1.1.5) calculating the projection value of each point along the target gravity vector by taking the target gravity vector as a reference axis, finding the lowest point, and taking the difference value between the projection value of other points along the target gravity vector and the lowest point as the height from the ground.
Preferably, step (1.4) specifically comprises:
predicting the category of each pixel point through the softmax layer, and outputting the probability P_i(l) = exp(z_i^l) / Σ_{k=1}^{C} exp(z_i^k) that pixel point i belongs to each category, wherein l ∈ {1, 2, …, C} represents the category label; the category corresponding to the maximum probability value, argmax_l P_i(l), is taken as the initial category label of the pixel point, and z_i^l is the output of the last layer of the full convolution network, before the softmax layer.
Preferably, the step (2) specifically comprises:
(2.1) defining the conditional random field distribution by the conditional probability P(X = x | I) = exp(−E(x | I)) / Z(I), wherein X is a random vector composed of X_1, X_2, …, X_N, X_i (i = 1, 2, …, N) represents the initial category label of the i-th pixel, Z(I) = Σ_x exp(−E(x | I)) represents the sum of the exp(·) terms over all possible label assignments x, and E(x | I) represents the total energy function of the conditional random field;
(2.2) obtaining the total energy function E(x | I) = Σ_i ψ_u(x_i) + Σ_{i<j} ψ_p(x_i, x_j), in which the unary term is ψ_u(x_i) = −log P(x_i) and the binary term is ψ_p(x_i, x_j) = μ(x_i, x_j)[w_1 exp(−‖p_i − p_j‖²/(2σ_α²) − ‖I_i^E − I_j^E‖²/(2σ_β²)) + w_2 exp(−‖p_i − p_j‖²/(2σ_γ²))], wherein μ(x_i, x_j) = [x_i ≠ x_j] is the label compatibility function, p_i and p_j represent the positions of pixel points i and j, the hyper-parameters σ_α, σ_β and σ_γ are the widths of the Gaussian kernels, specifying the range of adjacent pixels that affect a given pixel, w_1 and w_2 represent the weights of the Gaussian kernels in the two different feature spaces, P(x_i) is the probability, output by the softmax layer, that pixel point i takes label x_i, I_i^E and I_j^E represent the values of the i-th and j-th pixel points of the three-channel map I_E, and x_i and x_j represent the possible label values of pixel points i and j;
(2.3) solving for the value x of X that maximizes the conditional probability P(X = x | I), which is the optimized segmentation result of the indoor scene depth map I to be analyzed, and obtaining the target category label vectors of all pixels in the depth map I.
Preferably, the method further comprises:
By E = −(1/N) Σ_{i=1}^{N} log(exp(z_i^{y_i}) / Σ_{k=1}^{C} exp(z_i^k)), obtaining the error function of the full convolution network, wherein z represents the output of the last layer of the full convolution network, N represents the total number of pixels in the depth map, y_i ∈ {1, 2, …, C} represents the true, manually labelled category of pixel point i, C represents the total number of categories, z_i^{y_i} represents the last-layer output for pixel i at its true category y_i, and z_i^k the last-layer output for pixel i at category k;
the method comprises the steps of training by utilizing a neural network framework Caffe, initializing full convolution network parameters, updating the full convolution network parameters by using a back propagation algorithm, stopping training when an error function value is not changed any more, and obtaining a trained full convolution network, wherein in the training process of the full convolution network, results obtained by shallow neural network layers are fused and output.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects: the invention only adopts the depth map as input, and the depth map is not influenced by illumination conditions and can reflect the space geometric characteristics of a complex indoor environment, so that scene segmentation and understanding are carried out on the basis, the shielding can be effectively overcome, the foreground and the background can be separated, and the space posture of the target object under the three-dimensional coordinate can be given.
Drawings
Fig. 1 is a schematic flowchart of an indoor scene refinement analysis method based on a depth map according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of another depth map-based indoor scene refinement analysis method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of estimating object spatial position information based on the 2D segmentation results according to the present invention, in which (1) represents the projection onto the spatial xy plane of the depth-map pixels labelled as the object, (2) represents the result after filtering out noise points through morphological operations, (3) represents the 4 found corner points V_i, i = 1, 2, 3, 4, and (4) represents the three-dimensional bounding box drawn after estimating the spatial height of the object;
fig. 4 is a scene analysis diagram of experiments in a bedroom and a hospital ward, wherein the first row shows the input depth maps and the second row the corresponding fine analysis results.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides fine indoor scene analysis using only a depth map. By means of the strong understanding and generalization capability of a deep convolutional neural network, information such as edges and shapes in the depth map can be learned automatically, and pixel-level segmentation and identification results for the main indoor objects are given. On this basis, the position of each object in the 2D image plane is given, and, combined with traditional optimization methods, the position and posture of the target in three-dimensional space can be analyzed. Owing to its good robustness and its ability to protect user privacy, this technology can provide powerful help for behavior analysis and intelligent nursing of the elderly. The daily activities of the elderly are closely related to large furniture such as beds, chairs, tables and sofas, and to indoor structures such as floors and walls; for example, detecting an elderly person falling out of bed depends on locating the bed surface and the floor. In the future, the research and implementation of household robots serving the elderly will also depend on the computer's detailed cognition of indoor scenes.
To realize this purpose, the invention is mainly divided into three steps. First, scene parsing: a full convolution network for depth image analysis is trained on an indoor scene database; at test time, the trained network segments the input depth image of a new scene to give an initial analysis result. Second, optimization of the analysis result: an energy function over the whole image is computed with the fully connected conditional random field to obtain the optimized segmentation result. Third, on the basis of the analysis result, the depth map is converted into a three-dimensional point cloud, and the position and posture of the target in three-dimensional coordinates are estimated.
Fig. 1 is a schematic flowchart illustrating an indoor scene refinement analysis method based on a depth map according to an embodiment of the present invention; in the method shown in fig. 1, the following steps are included:
(1) extracting a three-channel characteristic diagram of the indoor scene depth diagram to be analyzed, taking the extracted three-channel characteristic diagram as the input of a trained full convolution network, and segmenting a target in the indoor scene depth diagram to be analyzed;
in an optional embodiment, the method further comprises the step of training the full convolutional network:
By E = −(1/N) Σ_{i=1}^{N} log(exp(z_i^{y_i}) / Σ_{k=1}^{C} exp(z_i^k)), obtaining the error function of the full convolution network, wherein z represents the output of the last layer of the full convolution network, N represents the total number of pixels in the depth map, y_i ∈ {1, 2, …, C} represents the true, manually labelled category of pixel point i, C represents the total number of categories, z_i^{y_i} represents the last-layer output for pixel i at its true category y_i, and z_i^k the last-layer output for pixel i at category k;
the method comprises the steps of training by utilizing a neural network framework Caffe, initializing parameters of a full convolution network, updating the parameters of the full convolution network by using a back propagation algorithm, stopping training when an error function value is not changed any more, and obtaining the trained full convolution network, wherein in order to obtain a more refined segmentation result, in the training process of the full convolution network, results obtained by a shallow neural network layer are fused and output.
As an optional implementation manner, as shown in fig. 2, which is a flowchart of an indoor scene depth image analysis method for intelligent nursing according to an embodiment of the present invention, when training the full convolution network for the indoor scene segmentation task, in order to obtain sufficient generalization capability, the input sample images may be made into a training data set specifically for ward scenes on the basis of the existing NYUD2 indoor scene database; this training data set includes 100 depth images, labelled mainly for beds, floors, walls and other large-scale indoor targets.
In the embodiment of the present invention, a VGG16 network model trained on the ImageNet data set may be used, the number of network layers may be increased or decreased according to actual needs, or other network structures, such as AlexNet, ResNet, and the like, may be used to initialize the neural network parameters. Which network model is specifically adopted is not uniquely limited by the embodiment of the present invention.
In an optional embodiment, step (1) specifically includes:
(1.1) coding the indoor scene depth map I to be analyzed into a three-channel map I_E, where the pixels of each channel image correspond one-to-one to the pixels of the depth map I, and the three channels respectively represent the disparity value, the height above the ground, and the angle between the normal vector and the gravity direction;
wherein, the step (1.1) specifically comprises the following steps:
(1.1.1) calculating the disparity value: by d = f·b/Z, obtaining the relation between the disparity d and the depth value Z corresponding to each pixel point, where f is the focal length and b the baseline of the depth camera;
(1.1.2) calculating the angle between the normal vector and the gravity direction: the pixel position (u, v) on the two-dimensional plane of the indoor scene depth map to be analyzed and the coordinates (x, y, z) in its three-dimensional space satisfy
z·(u, v, 1)^T = K·(x, y, z)^T,
wherein K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]] is the intrinsic matrix of the depth camera, so that the conversion relation between the two-dimensional and three-dimensional coordinates is x = (u − c_x)·z/f_x and y = (v − c_y)·z/f_y.
The coordinates (x, y, z) in three-dimensional space corresponding to all pixel points form a three-dimensional point cloud, and the normal vector corresponding to each pixel point is
n(u, v) = norm[∂p/∂u × ∂p/∂v], p = (x, y, z),
wherein norm[·] represents the normalization of a vector and the symbol × represents the vector cross product;
(1.1.3) for all pixels in the point cloud, comparing the angle θ between the normal vector n̂ of each pixel point and the gravity direction ĝ with an angle error margin ρ, and constructing a parallel set N_∥ = {n̂ : θ < ρ or θ > 180° − ρ} and a vertical set N_⊥ = {n̂ : 90° − ρ < θ < 90° + ρ}, wherein the initial value of ĝ may be taken, for example, as the y-axis direction of the camera coordinate system; preferably ρ = 5°;
(1.1.4) taking the eigenvector corresponding to the smallest eigenvalue of the matrix N_⊥N_⊥^T − N_∥N_∥^T as the updated gravity vector, repeating step (1.1.3) with the updated gravity vector until the estimate no longer changes, obtaining the target gravity vector, and calculating the angle between the normal vector of each pixel in the point cloud and the target gravity direction, wherein the point cloud is formed by the coordinates (x, y, z) in three-dimensional space corresponding to all pixel points;
(1.1.5) calculating the height from the ground: and taking the target gravity vector as a reference axis, solving the projection value of each point along the target gravity vector, finding the lowest point, and taking the difference value between the projection value of other points along the target gravity vector and the lowest point as the height from the ground.
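The three-channel encoding of step (1.1) can be sketched as follows, assuming the point cloud and unit normals have already been computed from steps (1.1.2)–(1.1.4). The function name and the focal-length-times-baseline constant are illustrative assumptions, not values from the patent:

```python
import numpy as np

def encode_three_channels(points, normals, gravity, fb=35130.0):
    """Encode a depth map into the three channels of step (1.1).

    points:  (H, W, 3) 3-D coordinates (x, y, z) per pixel (the point cloud).
    normals: (H, W, 3) unit normal vector per pixel.
    gravity: (3,) unit target gravity vector (from step 1.1.4).
    fb:      focal length times baseline; 35130.0 is an illustrative
             Kinect-style value, not taken from the patent.
    Returns (H, W, 3): disparity, height above ground, angle to gravity (deg).
    """
    z = points[..., 2]
    disparity = fb / np.maximum(z, 1e-6)             # step 1.1.1: d = f*b/Z
    cosang = np.clip(normals @ gravity, -1.0, 1.0)   # angle between normal and gravity
    angle = np.degrees(np.arccos(cosang))
    proj = points @ gravity                          # step 1.1.5: projection on gravity axis
    height = proj - proj.min()                       # difference to the lowest point
    return np.stack([disparity, height, angle], axis=-1)
```

A floor pixel (normal parallel to gravity) gets angle 0° and height 0 under this encoding, while a wall pixel gets angle 90°.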
(1.2) taking the three-channel map I_E as the input of the trained full convolution network, extracting multi-level Convolutional Neural Network (CNN) features layer by layer, wherein the convolution feature map obtained at each layer is downsampled and sent to the next layer to extract a new convolution feature map;
in specific implementation, the structure of the full convolution network and the parameters of each layer of convolution kernel adopted in the embodiment of the present invention are shown in fig. 2.
(1.3) respectively passing the convolution characteristic graphs in different layers through deconvolution layers, upsampling to the same size, then mutually fusing the characteristic graphs in different layers, and sending the fused characteristic graphs into a softmax layer;
specifically, taking fig. 2 as an example, the effect of the deconvolution layer and the convolution layer are exactly opposite, and the two operate in reverse. The characteristic diagram of the pool5 layer is up-sampled to 2 times of the original size through deconvolution, namely, the characteristic diagram has the same size as the pool4 layer, is up-sampled to 2 times of the size through deconvolution after being overlapped, namely, the characteristic diagram has the same size as the pool3 layer, and a final up-sampling result can be obtained after being overlapped and used as the input of the next softmax layer.
(1.4) predicting the category of each pixel point through a softmax layer, outputting the probability that each pixel point belongs to each category, wherein the category corresponding to the maximum probability value is the initial category label of the pixel point.
Wherein, the step (1.4) specifically comprises the following steps:
predicting the category of each pixel point through the softmax layer, and outputting the probability P_i(l) = exp(z_i^l) / Σ_{k=1}^{C} exp(z_i^k) that pixel point i belongs to each category, wherein l ∈ {1, 2, …, C} represents the category label; the category corresponding to the maximum probability value, argmax_l P_i(l), is taken as the initial category label of the pixel point, and z_i^l is the output of the last layer of the full convolution network, before the softmax layer.
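The softmax prediction of step (1.4) amounts to the following sketch (the function name is an assumption):

```python
import numpy as np

def initial_labels(z):
    """z: (H, W, C) outputs of the last layer before the softmax.
    Returns the per-pixel probabilities P_i(l) = exp(z_l) / sum_k exp(z_k)
    and the argmax initial category label for each pixel (step 1.4)."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # subtract max for stability
    P = e / e.sum(axis=-1, keepdims=True)
    return P, P.argmax(axis=-1)
```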
(2) According to the extracted three-channel characteristic diagram, utilizing a full-connection conditional random field to perfect and optimize the boundary of the segmentation result to obtain category label vectors of all pixels in the indoor scene depth map to be analyzed;
in an optional embodiment, step (2) specifically includes:
(2.1) defining the conditional random field distribution by the conditional probability P(X = x | I) = exp(−E(x | I)) / Z(I), wherein X is a random vector composed of X_1, X_2, …, X_N, X_i (i = 1, 2, …, N) represents the initial category label of the i-th pixel, Z(I) = Σ_x exp(−E(x | I)) represents the sum of the exp(·) terms over all possible label assignments x, and E(x | I) represents the total energy function of the conditional random field;
(2.2) obtaining the total energy function E(x | I) = Σ_i ψ_u(x_i) + Σ_{i<j} ψ_p(x_i, x_j), in which the unary term is ψ_u(x_i) = −log P(x_i) and the binary term is ψ_p(x_i, x_j) = μ(x_i, x_j)[w_1 exp(−‖p_i − p_j‖²/(2σ_α²) − ‖I_i^E − I_j^E‖²/(2σ_β²)) + w_2 exp(−‖p_i − p_j‖²/(2σ_γ²))], wherein μ(x_i, x_j) = [x_i ≠ x_j] is the label compatibility function, p_i and p_j represent the positions of pixel points i and j, the hyper-parameters σ_α, σ_β and σ_γ are the widths of the Gaussian kernels, specifying the range of adjacent pixels that affect a given pixel, w_1 and w_2 represent the weights of the Gaussian kernels in the two different feature spaces, P(x_i) is the probability, output by the softmax layer, that pixel point i takes label x_i, I_i^E and I_j^E represent the values of the i-th and j-th pixel points of the three-channel map I_E, and x_i and x_j represent the possible label values of pixel points i and j;
in the implementation of the present invention, a cross-validation method can be used to determine the optimal value combination of the above parameters. First default setting w2And σγTo 3, then randomly select 100 samples from the validation data set, find w1,σαAnd σβWith the search range set to w1∈(0,20),σα∈(0,100),σβ∈ (0,20) from the above experiment, w was found1,σαAnd σβThe optimum value of (c).
(2.3) solving for the value x of X that maximizes the conditional probability P(X = x | I), which is the optimized segmentation result of the indoor scene depth map I to be analyzed, and obtaining the target category label vectors of all pixels in the depth map I.
By performing approximate mean-field inference on the probability distribution of the model with an efficient high-dimensional filtering algorithm, the speed of the optimization solution can be significantly improved.
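For illustration, the binary term ψ_p of step (2.2) can be evaluated for a single pixel pair as in the sketch below. The function name and default parameter values are assumptions (only w_2 = 3 follows the default mentioned above):

```python
import numpy as np

def pairwise_energy(xi, xj, pi, pj, Ii, Ij, w1=1.0, w2=3.0,
                    sa=50.0, sb=10.0, sg=3.0):
    """Binary term psi_p of the dense CRF for one pixel pair: a Potts
    label-compatibility mu(x_i, x_j) times the sum of an appearance
    kernel (positions + three-channel values) and a smoothness kernel
    (positions only). Parameter values are illustrative.
    """
    mu = float(xi != xj)            # Potts model: penalize differing labels only
    d2 = np.sum((pi - pj) ** 2)     # squared distance between pixel positions
    c2 = np.sum((Ii - Ij) ** 2)     # squared distance between three-channel values
    appearance = w1 * np.exp(-d2 / (2 * sa**2) - c2 / (2 * sb**2))
    smoothness = w2 * np.exp(-d2 / (2 * sg**2))
    return mu * (appearance + smoothness)
```

Because of the Potts term, identical labels cost nothing; the penalty is largest for nearby, similar-looking pixels assigned different labels, which is what pushes the segmentation boundaries toward the true object edges.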
(3) And converting the indoor scene depth map to be analyzed into point cloud, analyzing the three-dimensional structure of the target based on the category label vector, and obtaining the space posture of the target.
In the embodiment of the present invention, as shown in fig. 3, the spatial posture analysis of a bed surface is taken as an example; other types of targets require only simple modifications for the practical application, and the principle is unchanged. In fig. 3, (1) represents the projection onto the spatial xy plane of the depth-map pixels labelled as the object, (2) represents the result after filtering out noise points through morphological operations, (3) represents the 4 found corner points V_i, i = 1, 2, 3, 4, and (4) represents the three-dimensional bounding box drawn after estimating the spatial height of the object. Step (3) specifically comprises the following substeps:
(3.1) projecting the segmentation result on the two-dimensional image plane of the indoor scene depth map to be analyzed to three-dimensional point cloud coordinates, the projection being computed exactly as in step (1.1.2);
(3.2) projecting the three-dimensional coordinates of the pixel points with the bed surface labels to an xy plane, performing morphological operation (for example, performing morphological corrosion operation and then performing morphological expansion operation) and filtering noise points;
(3.3) finding the points with the maximum and minimum coordinates in the x and y directions, denoted V_i, i = 1, 2, 3, 4, which represent the 4 corner points of the bed surface; connecting the V_i in sequence forms a closed geometric figure, and the normal vectors of all points inside the closed figure represent the orientation of the plane in space, expressing the attitude and structural information of the target in three-dimensional space;
(3.4) calculating the distance between the upper plane and the ground plane as the height h of the space occupied by the object, and using V_i and h to draw the cubic frame of the stereo attitude estimate in the three-dimensional space coordinate system. Fig. 4 shows scene analysis diagrams of experiments in a bedroom and a hospital ward, wherein the first row shows the input depth maps and the second row the corresponding fine analysis results.
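The corner finding and box drawing of steps (3.3) and (3.4) can be sketched as follows; the morphological denoising of step (3.2) is omitted and the function name is an assumption:

```python
import numpy as np

def bed_bounding_box(points_xy, height):
    """Estimate the 4 corner points V_i of the bed surface from the xy
    projections of pixels labelled 'bed', and the 8 corners of the 3-D
    bounding box given the estimated spatial height h (steps 3.3-3.4).

    points_xy: (N, 2) array of xy projections. A minimal sketch.
    """
    x, y = points_xy[:, 0], points_xy[:, 1]
    # extreme points along x and y, ordered so that connecting them in
    # sequence (left, top, right, bottom) forms a closed quadrilateral
    corners = np.array([points_xy[x.argmin()], points_xy[y.argmax()],
                        points_xy[x.argmax()], points_xy[y.argmin()]])
    # 8 bounding-box corners: the quadrilateral at z = 0 and at z = height
    box = np.vstack([np.hstack([corners, np.zeros((4, 1))]),
                     np.hstack([corners, np.full((4, 1), height)])])
    return corners, box
```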
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (5)
1. A depth map-based indoor scene refinement analysis method is characterized by comprising the following steps:
(1) extracting a three-channel characteristic diagram of an indoor scene depth map to be analyzed, taking the extracted three-channel characteristic diagram as the input of a trained full convolution network, and segmenting a target in the indoor scene depth map to be analyzed;
(2) according to the extracted three-channel characteristic diagram, utilizing a full-connection conditional random field to perfect and optimize the boundary of the segmentation result to obtain category label vectors of all pixels in the indoor scene depth map to be analyzed;
in the three-channel characteristic diagram, pixels of each channel image correspond to pixels in the indoor scene depth diagram to be analyzed one by one, and the three channels respectively represent parallax values, ground height and the size of an included angle between a normal vector and the gravity direction;
the method for extracting the three-channel characteristic diagram of the indoor scene depth map to be analyzed comprises the following steps:
(1.1.1) by d = f·b/Z, obtaining the relation between the disparity d and the depth value Z corresponding to each pixel point, where f is the focal length and b the baseline of the depth camera;
(1.1.2) by n(u, v) = norm[∂p/∂u × ∂p/∂v], obtaining the normal vector of each pixel point, wherein norm[·] represents the normalization of a vector, the symbol × represents the vector cross product, (u, v) represents the pixel position on the two-dimensional plane of the indoor scene depth map to be analyzed, and p = (x, y, z) represents the coordinates in its three-dimensional space; the conversion relation between the two-dimensional and three-dimensional coordinates is z·(u, v, 1)^T = K·(x, y, z)^T, wherein K is the intrinsic matrix of the depth camera;
(1.1.3) comparing the angle θ between the normal vector n̂ of each pixel point and the gravity direction ĝ with an angle error margin ρ, and constructing a parallel set N_∥ = {n̂ : θ < ρ or θ > 180° − ρ} and a vertical set N_⊥ = {n̂ : 90° − ρ < θ < 90° + ρ};
(1.1.4) taking the eigenvector corresponding to the smallest eigenvalue of the matrix N_⊥N_⊥^T − N_∥N_∥^T as the updated gravity vector, repeating step (1.1.3) with the updated gravity vector until the estimate no longer changes, obtaining the target gravity vector, and calculating the angle between the normal vector of each pixel in the point cloud and the target gravity direction, wherein the point cloud is formed by the coordinates (x, y, z) in three-dimensional space corresponding to all pixel points;
(1.1.5) taking the target gravity vector as the reference axis, calculating the projection value of each point along the target gravity vector, finding the lowest point, and taking the difference between the projection value of every other point and that of the lowest point as its height above the ground;
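The iterative gravity estimation of steps (1.1.3)–(1.1.4) can be sketched as follows. This is a minimal numpy sketch, not the patented implementation: the initial gravity guess, the angle tolerance `rho_deg`, the iteration cap, and the convergence test are all illustrative assumptions.

```python
import numpy as np

def estimate_gravity(normals, g0=np.array([0.0, 1.0, 0.0]), rho_deg=45.0, iters=10):
    """Iteratively refine the gravity direction from per-pixel unit normals.

    normals: (N, 3) array of unit normal vectors.
    Returns a unit gravity vector (sign is arbitrary).
    """
    g = g0 / np.linalg.norm(g0)
    for _ in range(iters):
        cos_t = np.abs(normals @ g)                 # |cos(angle to gravity)|
        par = normals[cos_t > np.cos(np.deg2rad(rho_deg))]         # floors/ceilings
        perp = normals[cos_t < np.sin(np.deg2rad(rho_deg))]        # walls
        # Build N_perp N_perp^T - N_par N_par^T and take the eigenvector
        # of its smallest eigenvalue as the updated gravity direction.
        M = perp.T @ perp - par.T @ par
        _, V = np.linalg.eigh(M)                    # eigenvalues ascending
        g_new = V[:, 0]
        converged = np.abs(g_new @ g) > 0.9999      # direction stable
        g = g_new
        if converged:
            break
    return g
```

The parallel set collects normals of (near-)horizontal surfaces and the perpendicular set collects wall-like normals, so minimizing g^T(N⊥N⊥^T − N∥N∥^T)g aligns g with the floor normals while keeping it orthogonal to the walls.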
(3) converting the indoor scene depth map to be analyzed into a point cloud, and analyzing the three-dimensional structure of the target based on the category label vectors to obtain the spatial pose of the target.
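The depth-to-point-cloud conversion of step (3) can be sketched with the standard pinhole back-projection (x, y, z)^T = Z·K^{-1}(u, v, 1)^T; a minimal numpy sketch, in which the intrinsic values used in the example are illustrative assumptions, not values from the patent:

```python
import numpy as np

def depth_to_pointcloud(depth, K):
    """Back-project a depth map (H, W) into an (H*W, 3) point cloud
    using (x, y, z)^T = Z * K^{-1} (u, v, 1)^T for every pixel (u, v)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))            # pixel grid
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])    # (3, H*W) homogeneous
    rays = np.linalg.inv(K) @ pix                             # K^{-1} (u, v, 1)^T
    return (rays * depth.ravel()).T                           # scale each ray by Z
```

Example: with a flat depth map of constant Z = 2.0 and an assumed intrinsic matrix, every returned point has z = 2.0 and the principal-point pixel maps to (0, 0, 2).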
2. The method according to claim 1, wherein step (1) comprises in particular:
(1.1) encoding the indoor scene depth map I to be analyzed into a three-channel map I_E;
(1.2) taking the three-channel map I_E as input to the trained fully convolutional network and extracting multi-level CNN features layer by layer, wherein the convolutional feature map obtained at each layer is downsampled and then fed to the next layer to extract a new convolutional feature map;
(1.3) passing the convolutional feature maps of the different layers through deconvolution layers to upsample them to the same size, fusing the feature maps of the different layers with one another, and feeding the fused feature map into a softmax layer;
(1.4) predicting the category of each pixel point through the softmax layer and outputting the probability that each pixel point belongs to each category, wherein the category corresponding to the maximum probability value is the initial category label of that pixel point.
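Steps (1.3)–(1.4) can be illustrated with a toy numpy sketch. Nearest-neighbour upsampling stands in for the learned deconvolution layers and summation is used as the fusion rule; both are assumptions of this sketch, not details fixed by the claims.

```python
import numpy as np

def upsample_nn(fmap, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return fmap.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_and_softmax(fmaps_with_factors):
    """Upsample multi-level feature maps to a common size, sum them,
    and apply a per-pixel softmax over the channel (category) axis."""
    fused = sum(upsample_nn(f, k) for f, k in fmaps_with_factors)
    e = np.exp(fused - fused.max(axis=0, keepdims=True))   # numerically stable
    return e / e.sum(axis=0, keepdims=True)
```

The per-pixel argmax over the channel axis of the returned probabilities then gives the initial category labels of step (1.4).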
3. The method according to claim 2, characterized in that step (1.4) comprises in particular:
predicting the category of each pixel point through the softmax layer and outputting the probability that each pixel point i belongs to each category, P_i(l) = exp(z_{i,l}) / Σ_{l'=1}^{C} exp(z_{i,l'}), where l ∈ {1, 2, ..., C} denotes the category label and z_{i,l} denotes the output of the last layer of the fully convolutional network (before the softmax layer) for pixel point i and category l; the category corresponding to the maximum probability value is taken as the initial category label of the pixel point.
4. The method according to claim 2 or 3, characterized in that step (2) comprises in particular:
(2.1) defining the conditional random field distribution by the conditional probability P(X = x | I) = exp(−E(x | I)) / Z(I), where X is the random vector composed of X_1, X_2, ..., X_N, X_i (i = 1, ..., N) denotes the category label of the ith pixel, Z(I) = Σ_x exp(−E(x | I)) sums the exp(·) terms over all possible label assignments x, and E(x | I) denotes the total energy function of the conditional random field;
(2.2) obtaining the total energy function E(x | I) = Σ_i ψ_u(x_i) + Σ_{i<j} ψ_p(x_i, x_j), wherein the unary term is ψ_u(x_i) = −log P(x_i) and the binary term is ψ_p(x_i, x_j) = μ(x_i, x_j)·[ w_1·exp(−|p_i − p_j|²/(2σ_α²) − |I_i^E − I_j^E|²/(2σ_β²)) + w_2·exp(−|p_i − p_j|²/(2σ_γ²)) ], where μ(x_i, x_j) equals 1 when x_i ≠ x_j and 0 otherwise, p_i denotes the position of pixel point i and p_j the position of pixel point j, the hyper-parameters σ_α, σ_β and σ_γ are the widths of the Gaussian kernels and specify the range of neighboring pixels that influence a given pixel, w_1 and w_2 denote the weights of the Gaussian kernels in the two different feature spaces, P(x_i) denotes the probability output by the softmax layer that pixel point i takes label x_i, I_i^E and I_j^E denote the values of the ith and jth pixel points of the three-channel map I_E, and x_i and x_j denote the possible label values of pixel points i and j;
(2.3) solving for the value x of X that maximizes the conditional probability P(X = x | I); this is the optimized segmentation result of the indoor scene depth map I to be analyzed, yielding the target category label vectors of all pixels in the indoor scene depth map I to be analyzed.
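The energy minimized in step (2.3) can be evaluated directly for a small labeling. The numpy sketch below assumes a Potts compatibility (the pairwise term is counted only when x_i ≠ x_j) and brute-force pairwise sums; practical fully connected CRFs instead use efficient mean-field inference, which is omitted here.

```python
import numpy as np

def crf_energy(labels, probs, positions, feats,
               w1=1.0, w2=1.0, sa=10.0, sb=5.0, sg=3.0):
    """Total CRF energy E(x|I) = sum_i psi_u(x_i) + sum_{i<j} psi_p(x_i, x_j).

    labels:    (N,)   integer labeling x
    probs:     (N, C) softmax probabilities; psi_u(x_i) = -log P(x_i)
    positions: (N, 2) pixel coordinates p_i
    feats:     (N, 3) three-channel values at each pixel
    """
    unary = -np.log(probs[np.arange(len(labels)), labels]).sum()
    pairwise = 0.0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if labels[i] == labels[j]:          # Potts: mu = 0 for equal labels
                continue
            dp = np.sum((positions[i] - positions[j]) ** 2)
            df = np.sum((feats[i] - feats[j]) ** 2)
            appearance = w1 * np.exp(-dp / (2 * sa**2) - df / (2 * sb**2))
            smoothness = w2 * np.exp(-dp / (2 * sg**2))
            pairwise += appearance + smoothness
    return unary + pairwise
```

A labeling in which neighboring, similar-looking pixels disagree pays both a unary and a pairwise penalty, so the maximum of P(X = x | I) prefers boundaries that follow the image evidence.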
5. The method of claim 1, further comprising:
obtaining the error function of the fully convolutional network, E = −(1/N)·Σ_{i=1}^{N} log( exp(z_{i,y_i}) / Σ_{l=1}^{C} exp(z_{i,l}) ), where z denotes the output of the last layer of the fully convolutional network, N denotes the total number of pixels in the depth map, y_i ∈ {1, 2, ..., C} denotes the manually labeled ground-truth category of pixel point i, C denotes the total number of categories, and z_{i,y_i} denotes the output of the last layer of the fully convolutional network for pixel point i at category y_i;
training is performed with the neural network framework Caffe: the fully convolutional network parameters are initialized and then updated with the back-propagation algorithm, and training stops when the error function value no longer changes, yielding the trained fully convolutional network; during training, the results obtained by the shallow neural network layers are fused into the output.
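The error function above is a per-pixel softmax cross-entropy; a minimal numpy sketch (the averaging over pixels follows the reconstruction above and is an assumption of this sketch):

```python
import numpy as np

def fcn_error(z, y):
    """Per-pixel softmax cross-entropy.

    z: (N, C) last-layer outputs for N pixels over C categories
    y: (N,)   ground-truth category index per pixel
    """
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_softmax[np.arange(len(y)), y].mean()
```

Confident, correct predictions drive the error toward zero, while confident, wrong predictions produce a large error, which is what back-propagation then reduces.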
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710874793.3A CN107622244B (en) | 2017-09-25 | 2017-09-25 | Indoor scene fine analysis method based on depth map |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107622244A CN107622244A (en) | 2018-01-23 |
CN107622244B true CN107622244B (en) | 2020-08-28 |
Family
ID=61090539
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105979244A (en) * | 2016-05-31 | 2016-09-28 | 十二维度(北京)科技有限公司 | Method and system used for converting 2D image to 3D image based on deep learning |
CN106296728A (en) * | 2016-07-27 | 2017-01-04 | 昆明理工大学 | A kind of Segmentation of Moving Object method in unrestricted scene based on full convolutional network |
CN106600571A (en) * | 2016-11-07 | 2017-04-26 | 中国科学院自动化研究所 | Brain tumor automatic segmentation method through fusion of full convolutional neural network and conditional random field |
CN106815563A (en) * | 2016-12-27 | 2017-06-09 | 浙江大学 | A kind of crowd's quantitative forecasting technique based on human body apparent structure |
CN106934765A (en) * | 2017-03-14 | 2017-07-07 | 长沙全度影像科技有限公司 | Panoramic picture fusion method based on depth convolutional neural networks Yu depth information |
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |