CN110458805B - Plane detection method, computing device and circuit system - Google Patents

Plane detection method, computing device and circuit system

Info

Publication number
CN110458805B
CN110458805B (application CN201910605510.4A)
Authority
CN
China
Prior art keywords
sub
image
image data
depth
point cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910605510.4A
Other languages
Chinese (zh)
Other versions
CN110458805A (en)
Inventor
何凯文
李阳
刘昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN110458805A
Application granted
Publication of CN110458805B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/005 Tree description, e.g. octree, quadtree
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a plane detection method, a computing device, and a circuit system. The method includes the following steps: the computing device acquires image data to be processed and segments it to obtain N pieces of sub-image data, where N is an integer greater than 1. The computing device then determines point cloud information corresponding to at least one of the N pieces of sub-image data, clusters the point clouds corresponding to the N pieces of sub-image data according to that point cloud information to obtain K coarse extraction planes, and optimizes the K coarse extraction planes to obtain L optimized planes, where K is a positive integer not greater than N and L is a positive integer not greater than K. In this way, the computing device can detect more than one plane in the image data.

Description

Plane detection method, computing device and circuit system
The present application claims priority to Chinese patent application No. 201910234537.7, entitled "A plane detection method and electronic device", filed with the Chinese Patent Office on 26 March 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a plane detection method, a computing device, and a circuit system.
Background
At present, three-dimensional reconstruction of objects and mobile-phone-based Augmented Reality (AR) games have become a reality and are popular with major mobile phone manufacturers and users. Three-dimensional spatial plane detection is an important and fundamental function in object three-dimensional reconstruction and augmented reality games: once a plane is detected, the anchor point of an object can be determined and the object rendered at that anchor point. Three-dimensional plane detection functions are now being added to a variety of small devices, but the computing power of such devices is limited and cannot handle algorithms of high complexity.
Modern AR applications, such as AR games, object and character modeling programs, complex scene-based interactive programs, and mixed reality applications, place very high demands on the scene understanding capability of back-end programs; one such demand is understanding the three-dimensional structure of a scene from image sensor or depth sensor information. Given the current computing power of mobile terminals, the scene-understanding algorithms available to front-end and back-end programs are very limited. Until now, only the dominant planar structure in a scene, i.e. the position in three-dimensional space of the largest plane in the scene, could be determined. Commercial AR solutions likewise base later work, including modeling and rendering, on the main (i.e. single) plane in the scene.
As consumers increasingly favor artificial intelligence applications, users' demands on application functionality keep growing, augmented reality applications place ever higher requirements on back-end scene understanding, and single-plane detection can no longer meet users' needs.
Disclosure of Invention
The application provides a plane detection method, a computing device, and a circuit system, so that a computing device can detect more than one plane in image data.
In a first aspect, an embodiment of the present application provides a plane detection method applied to a computing device. The method includes: the computing device acquires image data to be processed and segments it to obtain N pieces of sub-image data, where N is an integer greater than 1. The computing device then determines point cloud information corresponding to at least one of the N pieces of sub-image data, clusters the point clouds corresponding to the N pieces of sub-image data according to that point cloud information to obtain K coarse extraction planes, and optimizes the K coarse extraction planes to obtain L optimized planes, where K is a positive integer not greater than N and L is a positive integer not greater than K.
In this embodiment of the application, the image data to be processed is segmented into N pieces of sub-image data, and the point clouds corresponding to the N pieces are then clustered. Compared with the prior-art scheme, in which the image to be processed is traversed from the center and detection stops once one plane is found, so that only one plane can be detected, the scheme of this application clusters the N pieces of sub-image data obtained by segmentation, enabling the computing device to detect more than one plane in the image data.
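The first-aspect flow above, acquire, segment, determine point cloud information, cluster, optimize, can be sketched as a small pipeline. This is an illustrative Python sketch only; the step functions are hypothetical placeholders, not from the patent:

```python
def detect_planes(image, segment, to_point_clouds, cluster, optimize):
    """Top-level flow of the first aspect: segment the image into N pieces
    of sub-image data, derive point cloud information, cluster into K
    coarse extraction planes (K <= N), then optimize into L planes (L <= K)."""
    sub_images = segment(image)           # N pieces of sub-image data, N > 1
    clouds = to_point_clouds(sub_images)  # point cloud info per sub-image
    coarse = cluster(clouds)              # K coarse extraction planes
    return optimize(coarse)               # L optimized planes
```

Each stage is injected as a function, matching the patent's structure in which the segmentation, clustering, and optimization steps each admit several possible designs.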
In one possible design, the image data to be processed is a depth image that records the image coordinates and depth value of each pixel. The computing device acquires the depth image and segments it to obtain N sub-depth images. It then determines point cloud information corresponding to at least one of the N sub-depth images and, for each sub-depth image, determines the mean square error of the corresponding point cloud with respect to its fitted plane. The computing device then selects the sub-depth images satisfying a first condition from the N sub-depth images to form a sub-image set to be processed, and clusters the point clouds corresponding to the sub-depth images in that set to obtain K coarse extraction planes. The first condition is that the mean square error of the fitted plane of the point cloud corresponding to the sub-depth image is less than or equal to a first threshold.
In this design, the depth image is segmented into N sub-depth images, and only the point clouds of the sub-depth images satisfying the first condition are clustered. A fitted-plane mean square error less than or equal to the first threshold indicates that the points of the corresponding point cloud lie on one plane; in other words, the sub-depth images whose point clouds may form a plane are screened out for clustering, while those whose point clouds do not form a plane are skipped. Since not all N sub-depth images need to be clustered, more than one plane can be detected and the time of the plane detection process can be saved.
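The first condition relies on the mean square error of a point cloud with respect to its fitted plane. A common way to obtain it, shown here as an assumed implementation (the patent does not prescribe one), is a PCA fit in which the plane normal is the covariance eigenvector with the smallest eigenvalue:

```python
import numpy as np

def plane_fit_mse(points):
    """Mean squared distance of an (n, 3) point cloud to its best-fit
    (least-squares) plane, via PCA: the plane normal is the eigenvector
    of the covariance matrix with the smallest eigenvalue."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    cov = centered.T @ centered / len(pts)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    normal = eigvecs[:, 0]                  # direction of least variance
    return float(np.mean((centered @ normal) ** 2))
```

A sub-depth image would satisfy the first condition when `plane_fit_mse(cloud) <= first_threshold`; the MSE equals the smallest covariance eigenvalue, i.e. the variance of the points along the fitted normal.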
In one possible design, the image data to be processed is a depth image. The computing device acquires the depth image and segments it to obtain N sub-depth images. It then determines point cloud information corresponding to at least one of the N sub-depth images, takes the point cloud corresponding to each sub-depth image as a node, and constructs a graph structure in which each node stores its point cloud information. The computing device traverses the nodes of the graph structure, determines the pairs of nodes satisfying a second condition, and constructs an edge between each such pair. The second condition is that the depth values of the point clouds of the two nodes are continuous and that the angle between the normal vectors of the two point clouds is smaller than an angle threshold. The computing device then takes the sub-depth images corresponding to nodes with at least one edge in the graph structure to form a sub-image set to be processed, and clusters the point clouds corresponding to the sub-depth images in that set to obtain K coarse extraction planes.
In this design, the depth image is segmented into N sub-depth images, a graph structure is constructed from their point cloud information, and an edge is constructed between each pair of nodes satisfying the second condition, i.e. between nodes that may fit into one plane. Subsequent clustering is performed only on nodes with edges, so sub-depth images unlikely to fit into a plane need not be processed further, saving time in the overall plane detection process.
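The second condition (continuous depth values and a small angle between normals) can be sketched as follows. Treating "continuous depth" as depth ranges that overlap or come within a gap of each other is an assumption made for illustration; the patent does not define the continuity test:

```python
import numpy as np

def normal_angle(n1, n2):
    """Angle (radians) between two normal vectors."""
    c = np.dot(n1, n2) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def build_graph(nodes, angle_thresh, depth_gap):
    """nodes: list of dicts with 'normal' (3-vector) and 'depth_range' (min, max).
    Adds an edge between two nodes when their depth ranges overlap or come
    within depth_gap of each other (a stand-in for 'continuous depth values')
    and the angle between their normals is below angle_thresh."""
    edges = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            a, b = nodes[i], nodes[j]
            gap = max(a['depth_range'][0], b['depth_range'][0]) - \
                  min(a['depth_range'][1], b['depth_range'][1])
            if gap <= depth_gap and \
               normal_angle(a['normal'], b['normal']) < angle_thresh:
                edges.append((i, j))
    return edges
```

Nodes appearing in no edge would be excluded from the sub-image set to be processed, matching the "at least one edge" criterion.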
In one possible design, the image data to be processed includes a first RGB image and a second RGB image captured by a binocular camera. After acquiring the two images, the computing device pre-segments the first RGB image into N first patches and the second RGB image into N second patches, where the first and second patches are in positional correspondence. For each of the N first patches, the computing device determines the second patch in positional correspondence with it, determines a disparity map from the two patches, and determines a sub-depth image from the disparity map, thereby obtaining N sub-depth images. The computing device then forms a sub-image set to be processed from the N sub-depth images, determines point cloud information corresponding to at least one of them, and clusters the point clouds corresponding to the N sub-depth images according to that point cloud information to obtain K coarse extraction planes.
In this design, the first and second RGB images captured by the binocular camera are each segmented to obtain N first patches and N second patches that may be planes, N disparity maps are determined, and N sub-depth images that may be planes are determined from them. Only the point clouds of these N sub-depth images need to be clustered, rather than the whole of the first and second RGB images, which reduces the processing load; more than one plane in the image data can thus be detected and the plane detection speed increased.
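Recovering depth from a disparity map of a rectified binocular pair conventionally uses Z = f·B/d; the patent does not state the formula, so the relation below is the standard stereo one, not a quotation:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Standard rectified-stereo relation: depth Z = f * B / d, where
    f is the focal length in pixels, B the baseline in meters, and d
    the disparity in pixels.  Non-positive disparities map to infinity."""
    d = np.asarray(disparity, dtype=float)
    depth = np.full(d.shape, np.inf)
    valid = d > 0
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth
```

Applied patch by patch, this turns each per-patch disparity map into the sub-depth image from which the point cloud information is derived.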
In one possible design, the point clouds corresponding to the sub-depth images in the sub-image set to be processed can be clustered into K coarse extraction planes as follows. The computing device builds a min-heap data structure from the point cloud information of each sub-depth image in the set; the min-heap orders the sub-depth images by the mean square error of their point clouds, with the smallest mean square error at the top of the heap. A preset operation is performed on the min-heap until the fitted-plane mean square error of the point clouds of any two nodes in it exceeds a first threshold, yielding K coarse extraction planes. The preset operation is: take the sub-depth image off the top of the heap; if, among the sub-depth images adjacent to it, one satisfying a third condition can be found, fuse the two into a fused sub-depth image and add the fused sub-depth image back into the min-heap. The third condition is that the fitted-plane mean square error of the fused point cloud is smaller than the first threshold and is the minimum among the candidates.
In this design, building a min-heap orders the sub-depth images by the mean square error of their point clouds: the error decreases from the bottom of the heap to the top, so the sub-depth image with the smallest mean square error can be taken off the top each time. The point clouds most likely to form a plane are therefore clustered first, so nodes likely to lie on one plane are found and fused as quickly as possible. For each sub-depth image taken from the heap, if another sub-depth image that can be fused with it is found, the fused result is pushed back into the heap for re-ordering, until none of the remaining point clouds can be fused further, at which point clustering ends. This speeds up determination of the coarse extraction planes.
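The min-heap clustering described above can be sketched as follows. This is illustrative Python: `plane_fit_mse` is a PCA-based stand-in for the patent's mean-square-error computation, and adjacency is supplied as a plain dict (mutated in place), which is an assumption of this sketch:

```python
import heapq
import numpy as np

def plane_fit_mse(points):
    """PCA plane fit: mean squared distance of points to the best plane."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    cov = centered.T @ centered / len(pts)
    normal = np.linalg.eigh(cov)[1][:, 0]  # eigenvector of smallest eigenvalue
    return float(np.mean((centered @ normal) ** 2))

def greedy_merge(segments, neighbors, mse_thresh):
    """segments: list of (n, 3) point lists; neighbors: dict idx -> set(idx).
    Pops the lowest-MSE segment off a min-heap, fuses it with the adjacent
    segment whose fused cloud still fits a plane (third condition: MSE below
    threshold and minimal), pushes the fused segment back, and repeats until
    no fusion is possible.  Returns the fused point sets (coarse planes)."""
    active = {i: np.asarray(s, dtype=float) for i, s in enumerate(segments)}
    heap = [(plane_fit_mse(s), i) for i, s in active.items()]
    heapq.heapify(heap)
    next_id = len(segments)
    while heap:
        _, i = heapq.heappop(heap)
        if i not in active:
            continue                      # stale entry: already fused away
        best_mse, best_j = None, None
        for j in neighbors.get(i, set()):
            if j not in active:
                continue
            m = plane_fit_mse(np.vstack([active[i], active[j]]))
            if m < mse_thresh and (best_mse is None or m < best_mse):
                best_mse, best_j = m, j   # smallest fused MSE so far
        if best_j is None:
            continue                      # nothing fusable: i stays a plane
        fused = np.vstack([active[i], active[best_j]])
        neighbors[next_id] = (neighbors.get(i, set()) |
                              neighbors.get(best_j, set())) - {i, best_j}
        del active[i], active[best_j]
        active[next_id] = fused
        heapq.heappush(heap, (best_mse, next_id))
        next_id += 1
    return list(active.values())
```

The stale-entry check replaces explicit heap deletion, a common idiom when items merge away while still queued.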
In one possible design, the image data to be processed is a point cloud contained in a three-dimensional space. Segmenting the image data to obtain N pieces of sub-image data may include: taking the three-dimensional space as the node of the first level of an octree structure and, for each child node in each level of the octree, performing the following: if the child node satisfies a fourth condition, dividing it into eight equal parts to obtain eight child nodes of the (i+1)th level, until all child nodes of the last level satisfy a fifth condition, thereby constructing an octree structure with M levels of child nodes. The fourth condition is that the mean square error of the point cloud corresponding to the child node is greater than a first threshold; i is an integer greater than 1, and the ith level contains 8^i child nodes. The fifth condition is that the mean square error of the point cloud corresponding to the child node is not greater than the first threshold, or that the point cloud contains fewer points than a number threshold. The computing device then determines point cloud information corresponding to at least one of the N undivided child nodes, and clusters the point clouds corresponding to the N undivided child nodes of the octree according to that information to obtain K coarse extraction planes.
In this design, the point cloud of the three-dimensional space is taken as the first-level node of an octree structure and divided into eight equal parts. Then, for each child node in each level, a mean square error greater than the first threshold indicates that the points of the corresponding point cloud do not lie on one plane, so such child nodes are divided further, until for every child node of the octree either the points of its point cloud lie on one plane, or the point cloud is small enough, i.e. contains fewer points than the number threshold, and division stops. The N child nodes that are not divided further therefore fall into two types: those whose point cloud has a mean square error less than or equal to the first threshold, and those whose point cloud contains fewer points than the number threshold. Clustering these N undivided child nodes can then yield more than one plane.
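The octree construction under the fourth and fifth conditions can be sketched recursively; the cubic bounding box and the thresholds below are illustrative assumptions, and `plane_fit_mse` is again a PCA-based stand-in:

```python
import numpy as np

def plane_fit_mse(points):
    """PCA plane fit: mean squared distance of points to the best plane."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    cov = centered.T @ centered / len(pts)
    normal = np.linalg.eigh(cov)[1][:, 0]  # eigenvector of smallest eigenvalue
    return float(np.mean((centered @ normal) ** 2))

def octree_leaves(points, lo, hi, mse_thresh, min_points):
    """Split the cubic region [lo, hi] into eight octants while its point
    cloud does not fit a plane (MSE > threshold: the fourth condition);
    stop when the MSE is small enough or too few points remain (the fifth
    condition).  Returns the point sets of the undivided leaves."""
    pts = np.asarray(points, dtype=float)
    if len(pts) < min_points or plane_fit_mse(pts) <= mse_thresh:
        return [pts]
    mid = (np.asarray(lo) + np.asarray(hi)) / 2.0
    leaves = []
    for octant in range(8):  # 2^3 children per node
        sel = np.ones(len(pts), dtype=bool)
        sub_lo, sub_hi = np.asarray(lo, float).copy(), np.asarray(hi, float).copy()
        for axis in range(3):
            if (octant >> axis) & 1:
                sel &= pts[:, axis] >= mid[axis]
                sub_lo[axis] = mid[axis]
            else:
                sel &= pts[:, axis] < mid[axis]
                sub_hi[axis] = mid[axis]
        if sel.any():
            leaves += octree_leaves(pts[sel], sub_lo, sub_hi,
                                    mse_thresh, min_points)
    return leaves
```

A planar cloud returns a single leaf immediately, while a cloud spanning two parallel planes splits until each leaf is planar, matching the two leaf types described above.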
In one possible design, clustering the point clouds corresponding to the N undivided child nodes of the octree according to the point cloud information of at least one of them to obtain K coarse extraction planes may include: the computing device determines the normal vector of the point cloud of each undivided child node, converts each normal vector into a point in a parameter space through a Hough transform, and then determines the K point sets formed in the parameter space by the normal vectors of the point clouds of the N undivided child nodes, each point set having an aggregation center. For each point set, the points within a preset range around its aggregation center are determined, and the point clouds of the undivided child nodes corresponding to those points are fused into one coarse extraction plane. Note that an undivided child node here is a leaf node, i.e. a node with no children.
In this design, the normal vector of the point cloud of each of the N undivided child nodes is converted by a Hough transform into a point in a parameter space, so that the points produced by normal vectors of point clouds belonging to the same plane converge around a central point. By determining the point sets formed in the parameter space, each of which corresponds to one coarse extraction plane, the child nodes with a coplanar relation among the N undivided child nodes can be obtained quickly; fusing the coplanar child nodes into one coarse extraction plane then quickly yields more than one coarse extraction plane.
In one possible design, the K coarse extraction planes can be optimized into L optimized planes as follows: determine the normal vector of each of the K coarse extraction planes, traverse the K coarse extraction planes, and, whenever a coarse extraction plane satisfying a sixth condition exists for the plane under consideration, fuse the two into one plane, obtaining L optimized planes. The sixth condition is that the normal vector of the other plane is parallel to the normal vector of the plane under consideration, and that the variance after fitting the two into one plane is smaller than a variance threshold.
Through this design, coarse extraction planes among the K that are in fact one plane can be fused, making the L optimized planes more accurate.
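The sixth condition, parallel normals plus small variance after fusing, can be sketched as a pairwise merge loop. This is illustrative only; the thresholds and the quadratic pairwise scan are not prescribed by the patent:

```python
import numpy as np

def fit_plane(points):
    """PCA fit: returns (unit normal, variance of the points along it)."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    w, v = np.linalg.eigh(centered.T @ centered / len(pts))
    return v[:, 0], float(w[0])  # smallest eigenvalue = variance along normal

def merge_parallel_planes(planes, angle_tol, var_thresh):
    """planes: list of (n, 3) point sets (coarse extraction planes).
    Repeatedly fuses any pair whose normals are (near-)parallel and whose
    combined point set still has small variance along the fitted normal."""
    planes = [np.asarray(p, dtype=float) for p in planes]
    merged = True
    while merged:
        merged = False
        for i in range(len(planes)):
            for j in range(i + 1, len(planes)):
                ni, _ = fit_plane(planes[i])
                nj, _ = fit_plane(planes[j])
                if abs(abs(np.dot(ni, nj)) - 1.0) > angle_tol:
                    continue              # normals not parallel
                both = np.vstack([planes[i], planes[j]])
                if fit_plane(both)[1] < var_thresh:
                    planes[j] = both      # keep the fused plane at slot j
                    planes.pop(i)
                    merged = True
                    break
            if merged:
                break
    return planes
```

Note that the variance test is what keeps two parallel planes at different heights apart: their normals pass the parallelism check, but the fused cloud's variance along the normal stays large.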
In a second aspect, an embodiment of the present application provides a plane detection method applied to a computing device, including: acquiring image data to be processed; performing semantic segmentation on it to obtain N pieces of sub-image data with labeling information, where N is an integer greater than 1 and the labeling information labels the target object in the sub-image data; determining, according to the labeling information of each piece, Q pieces of sub-image data having a plane from the N labeled pieces, where Q is an integer greater than 0 and not greater than N; determining point cloud information corresponding to each of the Q pieces of sub-image data having a plane; determining K coarse extraction planes from the Q pieces according to that point cloud information, where K is an integer not greater than Q; and optimizing the K coarse extraction planes to obtain L optimized planes, where L is a positive integer not greater than K.
Based on this scheme, semantic segmentation of the image data to be processed yields N pieces of sub-image data with labeling information, from which Q pieces having a plane are determined, so that plane detection need be performed only on the Q pieces having a plane and not on the sub-image data without a plane. This reduces the processing load, and processing the Q pieces having a plane allows more than one plane to be detected.
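Selecting the Q plane-bearing pieces from the N labeled pieces can be as simple as a label filter; the label set below is hypothetical, since the patent does not enumerate which semantic labels count as "having a plane":

```python
# Hypothetical label set: the patent does not enumerate which semantic
# labels indicate a planar surface.
PLANE_LABELS = {"floor", "wall", "table", "ceiling", "door"}

def select_planar_subimages(labeled_subimages):
    """labeled_subimages: list of (label, sub_image_data) pairs, the N
    labeled pieces from semantic segmentation.  Returns the Q pieces
    whose label suggests a planar surface (0 < Q <= N)."""
    return [item for item in labeled_subimages if item[0] in PLANE_LABELS]
```

Only the returned Q pieces then enter the point-cloud and clustering stages.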
In one possible design, the image data to be processed includes a first RGB image and a second RGB image captured by a binocular camera. Performing semantic segmentation to obtain N pieces of sub-image data with labeling information may include: performing semantic segmentation on the first RGB image to obtain N first sub-images with labeling information, and on the second RGB image to obtain N second sub-images with labeling information, where each first sub-image and the second sub-image in positional correspondence with it together form one piece of sub-image data with labeling information. Determining the point cloud information corresponding to each of the Q pieces of sub-image data having a plane may include, for each such piece: determining a disparity map from its first sub-image and the second sub-image in positional correspondence with that first sub-image, determining a sub-depth image from the disparity map, and determining the point cloud information of the piece from the sub-depth image. Determining K coarse extraction planes from the Q pieces according to their point cloud information may include: determining K coarse extraction planes from the Q sub-depth images according to the point cloud information corresponding to the sub-image data having a plane.
In this design, when the image data to be processed includes a first RGB image and a second RGB image captured by a binocular camera, semantic segmentation may be performed on both images; from the N first sub-images and N second sub-images obtained, Q first sub-images having a plane and the Q second sub-images in positional correspondence with them are selected, Q disparity maps are obtained from these pairs, and Q sub-depth images are obtained from the Q disparity maps. Plane detection thus needs to be performed only on the Q sub-depth images having a plane and not on the sub-image data without a plane, reducing the processing load.
In one possible design, determining K coarse extraction planes from the Q sub-depth images according to the point cloud information corresponding to the sub-image data having a plane may include: taking the point cloud corresponding to each of the Q sub-depth images as a node and constructing a graph structure in which each node stores its point cloud information; traversing the nodes of the graph structure, determining the pairs of nodes satisfying a second condition, and constructing an edge between each such pair, where the second condition is that the depth values of the point clouds of the two nodes are continuous and the angle between the normal vectors of the two point clouds is smaller than an angle threshold; taking the sub-depth images corresponding to nodes with at least one edge to form a sub-image set to be processed; and clustering the point clouds corresponding to the sub-depth images in that set to obtain K coarse extraction planes.
In this design, a graph structure is constructed from the point cloud information of the Q sub-depth images, and an edge is constructed between each pair of nodes satisfying the second condition, i.e. between nodes that may fit into one plane. Subsequent clustering is performed only on nodes with edges, i.e. on sub-depth images that may fit into a plane, so those unlikely to fit into a plane need not be processed further, saving time in the overall plane detection process.
In one possible design, clustering the point clouds corresponding to the sub-depth images in the sub-image set to be processed to obtain K coarse extraction planes may include: building a min-heap data structure from the point cloud information of each sub-depth image in the set, where the min-heap orders the sub-depth images by the mean square error of their point clouds, with the smallest mean square error at the top of the heap; then performing a preset operation on the min-heap until the fitted-plane mean square error of the point clouds of any two nodes in it exceeds a first threshold, yielding K coarse extraction planes. The preset operation is: take the sub-depth image off the top of the heap; if, among the sub-depth images adjacent to it, one satisfying a third condition can be found, fuse the two into a fused sub-depth image and add the fused sub-depth image back into the min-heap. The third condition is that the fitted-plane mean square error of the fused point cloud is smaller than the first threshold and is the minimum among the candidates.
In this design, establishing a min-heap orders the point clouds corresponding to the Q sub-depth images in the sub-image set to be processed: the mean square error decreases from the bottom of the heap toward the top, and the sub-depth image at the top of the heap has the smallest mean square error. The point cloud with the smallest mean square error can therefore be taken from the top of the heap each time and clustered first; that is, the point clouds most likely to form a plane are clustered preferentially, so that nodes likely to lie on a plane are found and fused as early as possible. For each sub-depth image taken from the heap, if another sub-depth image can be fused with it, the fused point cloud is pushed back into the heap for re-ordering; clustering ends when none of the remaining sub-depth images in the heap can be fused further. This speeds up the determination of the coarse extraction planes.
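The heap-driven clustering loop described above can be sketched as follows. This is a simplified illustration, not the patented implementation: the `regions`, `neighbours`, and `fuse` interfaces are hypothetical stand-ins for the patent's sub-depth images, adjacency, and plane refitting.

```python
import heapq

def cluster_min_heap(regions, neighbours, fuse, first_threshold):
    """Greedy region clustering driven by a min-heap keyed on plane-fit MSE.

    regions: {id: (mse, point_cloud)}; neighbours: {id: set of adjacent ids};
    fuse(a, b) -> (mse, merged_cloud) refits one plane to the union of two
    clouds. Stale heap entries for already-fused regions are skipped via the
    'alive' set. Returns the point clouds of the final clusters."""
    heap = [(mse, rid) for rid, (mse, _) in regions.items()]
    heapq.heapify(heap)
    alive = set(regions)
    next_id = max(regions) + 1
    while heap:
        mse, rid = heapq.heappop(heap)
        if rid not in alive:
            continue  # entry refers to a region already fused away
        # Third condition: among live neighbours, pick the fusion with the
        # smallest MSE, provided it stays below the first threshold.
        best = None
        for nid in neighbours.get(rid, set()):
            if nid not in alive:
                continue
            m, cloud = fuse(regions[rid][1], regions[nid][1])
            if m < first_threshold and (best is None or m < best[0]):
                best = (m, nid, cloud)
        if best is None:
            continue  # this region cannot be fused any further
        m, nid, cloud = best
        alive -= {rid, nid}
        regions[next_id] = (m, cloud)
        neighbours[next_id] = (neighbours.get(rid, set()) |
                               neighbours.get(nid, set())) - {rid, nid}
        for nb in neighbours[next_id]:
            neighbours[nb].add(next_id)
        alive.add(next_id)
        heapq.heappush(heap, (m, next_id))  # re-insert the fused region
        next_id += 1
    return [regions[r][1] for r in alive]
```

The `alive` set is a common trick for lazy deletion in `heapq`, which has no decrease-key operation; popped entries whose region has already been fused are simply discarded.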
In one possible design, optimizing the K coarse extraction planes to obtain the L optimized planes may include: determining a normal vector of each of the K coarse extraction planes; and traversing the K coarse extraction planes, and whenever a pair of coarse extraction planes meets a sixth condition, fusing that pair into one plane, thereby obtaining the L optimized planes. The sixth condition includes: the normal vectors of the two coarse extraction planes are parallel, and the variance of a plane fitted to both of them jointly is smaller than a variance threshold.
Through this design, coarse extraction planes among the K that actually belong to one physical plane can be fused, so that the L optimized planes are more accurate.
In a third aspect, embodiments of the present application provide a computing device comprising at least one processor configured to perform the following operations: acquiring image data to be processed; segmenting the image data to be processed to obtain N sub-image data, where N is an integer greater than 1; determining point cloud information corresponding to at least one of the N sub-image data; clustering the point clouds corresponding to the N sub-image data according to that point cloud information to obtain K coarse extraction planes, where K is a positive integer not greater than N; and optimizing the K coarse extraction planes to obtain L optimized planes, where L is a positive integer not greater than K.
In one possible design, the image data to be processed is a depth image, which includes the image coordinates and a depth value for each pixel. Segmenting the image data to be processed into N sub-image data includes: segmenting the depth image into N sub-depth images. Determining the point cloud information corresponding to at least one of the N sub-image data includes: determining the point cloud information corresponding to at least one of the N sub-depth images. Clustering the point clouds corresponding to the N sub-image data according to that point cloud information to obtain the K coarse extraction planes includes: determining, for each sub-depth image, the mean square error of a plane fitted to its point cloud; determining, from the N sub-depth images, the sub-depth images meeting a first condition to form a sub-image set to be processed, where the first condition is that the mean square error of the fitted plane is less than or equal to a first threshold; and clustering the point clouds corresponding to the sub-depth images in the sub-image set to be processed to obtain the K coarse extraction planes.
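The mean square error of a fitted plane used in the first condition can be computed by PCA: the smallest eigenvalue of the point cloud's covariance matrix equals the mean squared residual to the best-fit plane. A minimal sketch, where the threshold value passed in is an assumption, not a value from the patent:

```python
import numpy as np

def plane_fit_mse(points):
    """Fit a plane to an (N, 3) point cloud by PCA and return (normal, mse),
    where mse is the mean squared distance of the points to the plane."""
    centered = points - points.mean(axis=0)
    # Eigen-decomposition of the covariance matrix: the eigenvector with the
    # smallest eigenvalue is the plane normal, and that eigenvalue is the MSE.
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, 0], eigvals[0]

def select_candidates(sub_clouds, first_threshold):
    """First condition: keep only the sub-depth images whose point cloud
    fits a plane with MSE <= first_threshold."""
    return [i for i, pts in enumerate(sub_clouds)
            if plane_fit_mse(pts)[1] <= first_threshold]
```

A perfectly planar cloud yields an MSE of (numerically) zero, so it always passes the first condition; heavily curved or noisy sub-regions are filtered out before clustering.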
In one possible design, the image data to be processed is a depth image. Segmenting the image data to be processed into N sub-image data may include: segmenting the depth image into N sub-depth images. Determining the point cloud information corresponding to at least one of the N sub-image data may include: determining the point cloud information corresponding to at least one of the N sub-depth images. Clustering the point clouds corresponding to the N sub-image data according to that point cloud information to obtain the K coarse extraction planes may include: constructing a graph structure with the point cloud of each of the N sub-depth images as a node, each node storing its own point cloud information; traversing the nodes of the graph, determining any two nodes meeting a second condition, and constructing an edge between them, where the second condition includes that the depth values of the point clouds of the two nodes are continuous and that the included angle between the normal vectors of the two point clouds is smaller than an angle threshold; determining, among the N sub-depth images, the sub-depth images whose nodes have at least one edge, to form a sub-image set to be processed; and clustering the point clouds corresponding to the sub-depth images in the sub-image set to be processed to obtain the K coarse extraction planes.
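A sketch of the graph construction with the second condition. The depth-continuity test here (overlapping depth ranges within a small gap) and the specific thresholds are stand-in assumptions; the patent does not define how continuity is measured.

```python
import numpy as np
from itertools import combinations

def build_graph(nodes, angle_threshold_deg=10.0, depth_gap=0.05):
    """nodes: list of dicts with 'normal' (unit vector) and 'depth_range'
    (min, max depth of the node's point cloud). Returns an adjacency map;
    an edge is added only between node pairs meeting the second condition."""
    adj = {i: set() for i in range(len(nodes))}
    cos_thresh = np.cos(np.radians(angle_threshold_deg))
    for i, j in combinations(range(len(nodes)), 2):
        a, b = nodes[i], nodes[j]
        # Depth continuity (assumed test): the depth ranges overlap or
        # nearly touch, within depth_gap.
        continuous = (min(a['depth_range'][1], b['depth_range'][1]) + depth_gap
                      >= max(a['depth_range'][0], b['depth_range'][0]))
        # Included angle between the normals is below the threshold.
        aligned = abs(np.dot(a['normal'], b['normal'])) >= cos_thresh
        if continuous and aligned:
            adj[i].add(j)
            adj[j].add(i)
    return adj
```

Nodes that end up with an empty adjacency set correspond to sub-depth images excluded from the sub-image set to be processed.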
In one possible design, the image data to be processed includes a first RGB image and a second RGB image captured by a binocular camera. Segmenting the image data to be processed into N sub-image data may include: pre-segmenting the first RGB image into N first face blocks and the second RGB image into N second face blocks, where the first face blocks and the second face blocks have a positional correspondence. For each of the N first face blocks, the computing device may perform the following operations: determining, from the N second face blocks, the second face block corresponding in position to the first face block; determining a disparity map from the first face block and its corresponding second face block; and determining a sub-depth image from the disparity map. The N sub-depth images so determined form a sub-image set to be processed. Determining the point cloud information corresponding to at least one of the N sub-image data may include: determining the point cloud information corresponding to at least one of the N sub-depth images. Clustering the point clouds corresponding to the N sub-image data according to that point cloud information to obtain the K coarse extraction planes may include: clustering the point clouds corresponding to the N sub-depth images in the sub-image set to be processed according to the point cloud information corresponding to at least one of the N sub-image data, to obtain the K coarse extraction planes.
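For a rectified stereo pair, converting a disparity map into a sub-depth image follows the standard pinhole relation Z = f·B/d. A minimal sketch; the focal length and baseline values used in any concrete call are camera-specific and the ones below are illustrative only:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (in pixels) from a rectified binocular pair
    into a depth map (in metres) via Z = f * B / d.
    Pixels with zero or negative disparity are marked invalid (depth 0)."""
    depth = np.zeros_like(disparity, dtype=np.float64)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```

Back-projecting each depth pixel through the camera intrinsics then yields the point cloud used in the clustering step.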
In one possible design, clustering the point clouds corresponding to the sub-depth images included in the sub-image set to be processed to obtain the K coarse extraction planes may include: establishing a min-heap data structure according to the point cloud information corresponding to each sub-depth image in the sub-image set to be processed, where the min-heap orders the sub-depth images by the mean square error of the plane fitted to each sub-depth image's point cloud, so that the sub-depth image at the top of the heap has the smallest mean square error; and executing a preset operation on the min-heap until the mean square error of a plane fitted to the point clouds of any two nodes in the min-heap exceeds a first threshold, yielding the K coarse extraction planes. The preset operation includes: taking a sub-depth image from the top of the heap; if, among the sub-depth images adjacent to it, a sub-depth image meeting a third condition is found, fusing the two to obtain a fused sub-depth image, where the third condition is that the mean square error of the plane fitted to the combined point clouds is smaller than the first threshold and is the smallest among the candidates; and adding the fused sub-depth image back into the min-heap.
In one possible design, the image data to be processed is a point cloud contained in a three-dimensional space. Segmenting the image data to be processed into N sub-image data may include: taking the three-dimensional space as the node of the first level of an octree structure; for each child node included in the first level through the i-th level of the octree, performing the following operations: if the child node meets a fourth condition, subdividing it into eight equal parts to obtain eight child nodes of the (i+1)-th level, where the fourth condition is that the mean square error of the plane fitted to the child node's point cloud is greater than a first threshold, i is an integer greater than 1, and the i-th level contains at most 8^i child nodes; continuing until all child nodes of the last level meet a fifth condition, thereby constructing an octree structure with M levels of child nodes, where the fifth condition is that the mean square error of the child node's point cloud is not greater than the first threshold, or the number of points in the child node's point cloud is smaller than a number threshold; and determining the N unsegmented child nodes in the octree structure. Determining the point cloud information corresponding to at least one of the N sub-image data may include: determining the point cloud information corresponding to at least one of the N unsegmented child nodes.
Clustering the point clouds corresponding to the N sub-image data according to the point cloud information corresponding to at least one of the N sub-image data to obtain the K coarse extraction planes may include: clustering the point clouds corresponding to the N unsegmented child nodes in the octree structure according to the point cloud information corresponding to at least one of those child nodes, to obtain the K coarse extraction planes.
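The octree subdivision driven by the fourth and fifth conditions can be sketched as a recursive split. This is a simplified illustration: the threshold values and the minimum point count are assumptions, and the sketch returns leaf point sets rather than building an explicit tree.

```python
import numpy as np

def plane_mse(points):
    """Mean squared distance of an (N, 3) cloud to its best-fit (PCA) plane."""
    c = points - points.mean(axis=0)
    return np.linalg.eigvalsh(c.T @ c / len(points))[0]

def octree_leaves(points, bounds, mse_threshold, min_points=10):
    """Recursively split a cubic cell into eight octants while the cell's
    plane-fit MSE exceeds the threshold (fourth condition); stop splitting
    when the MSE is small enough or too few points remain (fifth condition).
    Returns the point sets of the unsplit leaf cells."""
    if len(points) < min_points or plane_mse(points) <= mse_threshold:
        return [points] if len(points) else []
    lo, hi = bounds
    mid = (lo + hi) / 2.0
    leaves = []
    for octant in range(8):
        # Bit k of the octant index selects the half-interval on axis k.
        sel = np.ones(len(points), dtype=bool)
        new_lo, new_hi = lo.copy(), hi.copy()
        for axis in range(3):
            if (octant >> axis) & 1:
                sel &= points[:, axis] >= mid[axis]
                new_lo[axis] = mid[axis]
            else:
                sel &= points[:, axis] < mid[axis]
                new_hi[axis] = mid[axis]
        leaves += octree_leaves(points[sel], (new_lo, new_hi),
                                mse_threshold, min_points)
    return leaves
```

A cell containing a single planar surface is never split, while a cell spanning two surfaces keeps subdividing until each leaf is approximately planar; the leaves play the role of the unsegmented child nodes.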
In one possible design, clustering the point clouds corresponding to the N unsegmented child nodes in the octree structure according to the point cloud information corresponding to at least one of those child nodes to obtain the K coarse extraction planes may include: determining a normal vector of the point cloud corresponding to each unsegmented child node, and converting each normal vector into a point in a parameter space through a Hough transform; determining K point sets formed in the parameter space by the normal vectors of the point clouds of the N unsegmented child nodes, each point set having an aggregation center; for each point set, determining the points that fall within a preset range around its aggregation center; and fusing the point clouds of the unsegmented child nodes corresponding to the points falling within that preset range into one coarse extraction plane.
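One simple way to realize the parameter-space grouping is a spherical accumulator: each unit normal is mapped to a (theta, phi) cell, and cells collecting many votes correspond to candidate coarse extraction planes. A sketch, where the bin size is an assumption and the patent's "aggregation center" is approximated by the accumulator cell itself:

```python
import numpy as np
from collections import defaultdict

def hough_group_normals(normals, bin_deg=5.0):
    """Map each unit normal to a (theta, phi) cell of a spherical
    accumulator and group the input indices by cell."""
    bins = defaultdict(list)
    for i, n in enumerate(normals):
        n = n / np.linalg.norm(n)
        if n[2] < 0:          # fold antipodal normals onto one hemisphere
            n = -n
        theta = np.degrees(np.arccos(np.clip(n[2], -1.0, 1.0)))  # polar angle
        phi = np.degrees(np.arctan2(n[1], n[0])) % 360.0         # azimuth
        bins[(int(theta // bin_deg), int(phi // bin_deg))].append(i)
    return bins
```

Note that binning alone cannot separate parallel planes at different offsets; that case is handled later by the sixth-condition variance check during optimization.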
In one possible design, optimizing the K coarse extraction planes to obtain the L optimized planes may include: determining a normal vector of each of the K coarse extraction planes; traversing the K coarse extraction planes, and whenever a pair of coarse extraction planes meets a sixth condition, fusing that pair into one plane, thereby obtaining the L optimized planes; where the sixth condition includes: the normal vectors of the two coarse extraction planes are parallel, and the variance of a plane fitted to both of them jointly is smaller than a variance threshold.
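The sixth-condition fusion can be sketched as a greedy merge over plane pairs. The angle tolerance and variance threshold below are illustrative assumptions; the variance of the jointly fitted plane is again the smallest eigenvalue of the combined cloud's covariance.

```python
import numpy as np

def fuse_coplanar(planes, angle_tol_deg=2.0, var_threshold=1e-3):
    """Greedily merge coarse extraction planes whose normals are parallel
    (within angle_tol_deg) and whose combined point cloud still fits one
    plane with residual variance below var_threshold (the sixth condition).
    planes: list of (unit_normal, points) tuples; returns the merged list."""
    def fit(points):
        c = points - points.mean(axis=0)
        vals, vecs = np.linalg.eigh(c.T @ c / len(points))
        return vecs[:, 0], vals[0]   # plane normal, residual variance

    cos_tol = np.cos(np.radians(angle_tol_deg))
    merged = []
    for normal, pts in planes:
        for k, (n2, pts2) in enumerate(merged):
            if abs(np.dot(normal, n2)) >= cos_tol:   # parallel normals
                joint = np.vstack([pts2, pts])
                n_fit, var = fit(joint)
                if var < var_threshold:              # still one plane
                    merged[k] = (n_fit, joint)
                    break
        else:
            merged.append((normal, pts))
    return merged
```

The variance check is what distinguishes two patches of the same tabletop (merged) from a tabletop and a parallel shelf above it (kept separate despite parallel normals).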
In a fourth aspect, embodiments of the present application provide a computing device comprising at least one processor configured to perform the following operations: acquiring image data to be processed; performing semantic segmentation on the image data to be processed to obtain N sub-image data with labeling information, where N is an integer greater than 1 and the labeling information labels the target object in the sub-image data; determining, according to the labeling information of each sub-image data, Q plane-containing sub-image data from the N sub-image data with labeling information, where Q is an integer greater than 0 and not greater than N; determining point cloud information corresponding to each of the Q plane-containing sub-image data; determining K coarse extraction planes from the Q plane-containing sub-image data according to that point cloud information, where K is a positive integer not greater than Q; and optimizing the K coarse extraction planes to obtain L optimized planes, where L is a positive integer not greater than K.
In one possible design, the image data to be processed includes a first RGB image and a second RGB image captured by a binocular camera. Performing semantic segmentation on the image data to be processed to obtain the N sub-image data with labeling information may include: performing semantic segmentation on the first RGB image to obtain N first sub-images with labeling information, and on the second RGB image to obtain N second sub-images with labeling information, where each first sub-image and the second sub-image corresponding to it in position together form one sub-image data with labeling information. Determining the point cloud information corresponding to each of the Q plane-containing sub-image data may include, for each of the Q plane-containing sub-image data: determining a disparity map from the labeled first sub-image included in that sub-image data and the positionally corresponding labeled second sub-image; determining a sub-depth image from the disparity map; and determining the point cloud information of that sub-image data from the sub-depth image. Determining the K coarse extraction planes from the Q plane-containing sub-image data according to that point cloud information may include: determining the K coarse extraction planes from the Q sub-depth images according to the point cloud information corresponding to the plane-containing sub-image data.
In one possible design, determining the K coarse extraction planes from the Q sub-depth images according to the point cloud information corresponding to the plane-containing sub-image data may include: constructing a graph structure with the point cloud of each of the Q sub-depth images as a node, each node storing its own point cloud information; traversing the nodes of the graph, determining any two nodes meeting a second condition, and constructing an edge between them, where the second condition includes that the depth values of the point clouds of the two nodes are continuous and that the included angle between the normal vectors of the two point clouds is smaller than an angle threshold; determining, among the Q sub-depth images, the sub-depth images whose nodes have at least one edge, to form a sub-image set to be processed; and clustering the point clouds corresponding to the sub-depth images in the sub-image set to be processed to obtain the K coarse extraction planes.
In one possible design, clustering the point clouds corresponding to the sub-depth images included in the sub-image set to be processed to obtain the K coarse extraction planes may include: establishing a min-heap data structure according to the point cloud information corresponding to each sub-depth image in the sub-image set to be processed, where the min-heap orders the sub-depth images by the mean square error of the plane fitted to each sub-depth image's point cloud, so that the sub-depth image at the top of the heap has the smallest mean square error; and executing a preset operation on the min-heap until the mean square error of a plane fitted to the point clouds of any two nodes in the min-heap exceeds a first threshold, yielding the K coarse extraction planes. The preset operation includes: taking a sub-depth image from the top of the heap; if, among the sub-depth images adjacent to it, a sub-depth image meeting a third condition is found, fusing the two to obtain a fused sub-depth image, where the third condition is that the mean square error of the plane fitted to the combined point clouds is smaller than the first threshold and is the smallest among the candidates; and adding the fused sub-depth image back into the min-heap.
In one possible design, optimizing the K coarse extraction planes to obtain the L optimized planes may include: determining a normal vector of each of the K coarse extraction planes; and traversing the K coarse extraction planes, and whenever a pair of coarse extraction planes meets a sixth condition, fusing that pair into one plane; where the sixth condition includes: the normal vectors of the two coarse extraction planes are parallel, and the variance of a plane fitted to both of them jointly is smaller than a variance threshold.
In a fifth aspect, the present application further provides a computing device including modules/units for executing the method of any possible design of any of the above aspects. These modules/units may be implemented by hardware, or by hardware executing corresponding software.
In a sixth aspect, this embodiment also provides a computer-readable storage medium, where the computer-readable storage medium includes a computer program, and when the computer program is run on an electronic device, the electronic device is caused to perform any one of the possible design methods of the foregoing aspects.
In a seventh aspect, the present application further provides a program product including instructions that, when run on a computing device, cause the computing device to perform the method of any possible design of the foregoing aspects.
In addition, for the technical effects brought by any possible design of the third to seventh aspects, reference may be made to the technical effects of the corresponding designs in the first or second aspect, and details are not repeated here.
Drawings
Fig. 1 is a schematic structural diagram of an electronic device provided in the present application;
fig. 2 is a schematic flow chart of a plane detection method provided in the present application;
fig. 3a is a schematic view of a scenario provided by the present application;
FIG. 3b is a schematic diagram of a depth image provided in the present application;
FIGS. 3c-3h are schematic views of the plane detection process provided herein;
fig. 4a is a schematic view of a subdivision of a three-dimensional space provided by the present application;
FIG. 4b is a schematic diagram of an octree structure provided herein;
FIG. 5 is a schematic view of another plane detection process provided herein;
FIG. 6 is a schematic diagram of an image including annotations provided herein;
FIG. 7 is a schematic view of another image including annotations provided herein;
fig. 8a is a schematic diagram of an RGB image provided in the present application;
FIG. 8b is a schematic representation of a pre-segmented image provided herein;
FIG. 9 is a schematic flow chart of another plane detection method provided herein;
fig. 10 is a schematic structural diagram of a computing device provided in the present application.
Detailed Description
The present application provides a plane detection method, a computing device, and a circuit system. The computing device segments acquired image data to be processed and performs plane detection on each piece of sub-image data obtained by the segmentation, so that multiple planes in a scene can be detected, improving the computing device's understanding of the scene.
In order to make the objects, technical solutions and advantages of the present application more clear, the present application will be further described in detail with reference to the accompanying drawings.
The plane detection scheme provided by the embodiments of the present application can be applied to various computing devices, which may be electronic devices or servers, including but not limited to personal computers, server computers, hand-held or laptop devices, mobile devices (such as cell phones, tablets, personal digital assistants, and media players), consumer electronics, minicomputers, mainframe computers, mobile robots, drones, and the like.
When the electronic device needs to detect a plane in image data to be processed, in a possible implementation manner, the electronic device may use the plane detection method provided by the embodiment of the present application to detect multiple planes to obtain a detection result. In another possible implementation manner, the electronic device may send the image data to be processed to another device having a processing capability of implementing the plane detection process, such as a server or a terminal device, and then the electronic device receives the detection result from the other device.
In the following embodiments, a plane detection method provided in the embodiments of the present application is described by taking a computing device as an example.
The method for plane detection provided by the embodiment of the application is applicable to the electronic device shown in fig. 1, and the specific structure of the electronic device is briefly described below.
Fig. 1 is a schematic diagram of a hardware structure of an electronic device applied in the embodiment of the present application. As shown in fig. 1, electronic device 100 may include a display device 110, a processor 120, and a memory 130. The memory 130 may be used for storing software programs and data, and the processor 120 may execute various functional applications and data processing of the electronic device 100 by operating the software programs and data stored in the memory 130.
The memory 130 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program (such as an image capturing function) required by at least one function, and the like; the storage data area may store data (such as audio data, text information, image data, etc.) created according to the use of the electronic apparatus 100, and the like. Further, the memory 130 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The processor 120 is a control center of the electronic device 100, connects various parts of the whole electronic device by various interfaces and lines, performs various functions of the electronic device 100 and processes data by running or executing software programs and/or data stored in the memory 130, thereby performing overall monitoring of the electronic device. Processor 120 may include one or more processing units, such as: the processor 120 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a Neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referring to a biological neural network structure, for example, by referring to a transfer mode between neurons of a human brain, and can also continuously learn by self. Applications such as intelligent recognition of the electronic device 100 can be realized through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
In some embodiments, processor 120 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The I2C interface is a bi-directional synchronous serial bus that includes a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, processor 120 may include multiple sets of I2C buses. The processor 120 may be coupled to the touch sensor, the charger, the flash, the camera 160, etc. through different I2C bus interfaces, respectively. For example: the processor 120 may be coupled to the touch sensor via an I2C interface, such that the processor 120 and the touch sensor communicate via an I2C bus interface to implement touch functionality of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, processor 120 may include multiple sets of I2S buses. The processor 120 may be coupled to the audio module via an I2S bus to enable communication between the processor 120 and the audio module. In some embodiments, the audio module may transmit audio signals to the WiFi module 190 through the I2S interface, so as to implement the function of answering a call through a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module and WiFi module 190 may be coupled through a PCM bus interface. In some embodiments, the audio module may also transmit the audio signal to the WiFi module 190 through the PCM interface, so as to implement the function of answering a call through the bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communications. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor 120 with the WiFi module 190. For example: the processor 120 communicates with the bluetooth module in the WiFi module 190 through the UART interface to implement the bluetooth function. In some embodiments, the audio module may transmit the audio signal to the WiFi module 190 through the UART interface, so as to realize the function of playing music through the bluetooth headset.
The MIPI interface may be used to connect the processor 120 with peripheral devices such as the display device 110, the camera 160, and the like. The MIPI interface includes a camera 160 serial interface (CSI), a display screen serial interface (DSI), and the like. In some embodiments, processor 120 and camera 160 communicate over a CSI interface to implement the capture functionality of electronic device 100. The processor 120 and the display screen communicate through the DSI interface to implement the display function of the electronic device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 120 with the camera 160, the display device 110, the WiFi module 190, the audio module, the sensor module, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.
The USB interface is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface may be used to connect a charger to charge the electronic device 100, to transmit data between the electronic device 100 and a peripheral device, or to connect an earphone and play audio through the earphone. The interface may also be used to connect other electronic devices, such as AR devices.
It should be understood that the connection relationship between the modules according to the embodiment of the present invention is only illustrative, and is not limited to the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
Also included in the electronic device 100 is a camera 160 for capturing images or video. The camera 160 may be a general camera or a focusing camera.
The electronic device 100 may further include an input device 140 for receiving input numerical information, character information, or contact touch operation/non-contact gesture, and generating signal input related to user setting and function control of the electronic device 100, and the like.
The display device 110 includes a display panel 111 for displaying information input by a user or provided to the user, various menu interfaces of the electronic device 100, and the like; in the embodiment of the present application, it is mainly used for displaying the image to be detected acquired by a camera or a sensor in the electronic device 100. Optionally, the display panel 111 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The electronic device 100 may also include one or more sensors 170, such as image sensors, infrared sensors, laser sensors, pressure sensors, gyroscope sensors, barometric sensors, magnetic sensors, acceleration sensors, distance sensors, proximity light sensors, ambient light sensors, fingerprint sensors, touch sensors, temperature sensors, bone conduction sensors, and the like, wherein the image sensors may be time of flight (TOF) sensors, structured light sensors, and the like.
In addition, the electronic device 100 may also include a power supply 150 for powering other modules. The electronic device 100 may further include a Radio Frequency (RF) circuit 180 for performing network communication with a wireless network device, and a WiFi module 190 for performing WiFi communication with other devices, for example, for acquiring images or data transmitted by other devices.
Although not shown in fig. 1, the electronic device 100 may further include a flash, a bluetooth module, an external interface, a button, a motor, and other possible functional modules, which are not described in detail herein.
Based on the above introduction, the present application provides a plane detection method and a computing device, where the method enables the computing device to detect more than one plane in image data. In the embodiments of the present application, the method and the computing device are based on the same inventive concept; because the principles by which they solve the problem are similar, the implementations of the computing device and of the method may refer to each other, and repeated descriptions are omitted.
In the embodiments of the present application, the electronic device 100 is taken as an example for description, but the embodiments of the present invention may also be applied to other types of computing devices. Referring to fig. 2, a specific process of the plane detection method may include:
step 201: the electronic device 100 acquires image data to be processed.
Here, the image data to be processed may be a two-dimensional image, such as the depth image in the first embodiment below or the first RGB image and the second RGB image in the third embodiment below, or may be three-dimensional point cloud data, such as the point cloud in the second embodiment below; no specific limitation is made here.
It should be understood that the image data to be processed may be captured by the electronic device, may be obtained from a gallery in the electronic device for storing images, or may be transmitted by other devices.
Step 202: the electronic device 100 segments image data to be processed to obtain N sub-image data, where N is an integer greater than 1.
Step 203: the electronic device 100 determines point cloud information corresponding to at least one sub-image data of the N sub-image data.
Step 204: the electronic device 100 performs clustering processing on the point clouds corresponding to the N sub-image data according to the point cloud information corresponding to at least one sub-image data of the N sub-image data to obtain K crude extraction planes; k is a positive integer not greater than N.
Step 205: the electronic device 100 performs optimization processing on the K crude extraction planes to obtain L optimized planes; l is a positive integer not greater than K.
In the embodiment of the application, the electronic device segments the image data to be processed and clusters the point clouds corresponding to the N pieces of sub-image data obtained by the segmentation, so that more than one plane can be detected in the image data to be processed.
The above-described plane detection method shown in fig. 2 will be described in detail with reference to specific embodiments.
Example one
In this embodiment of the application, the image data to be processed in step 201 is a depth image. The depth image may be regarded as a grayscale image, where the grayscale value of each pixel point represents the distance from the corresponding point in the scene to the camera along the optical axis. Referring to fig. 3a, a schematic diagram of an RGB image of a scene provided in an embodiment of the present application is shown. Depth information is extracted from the scene in the three-dimensional space corresponding to the RGB image shown in fig. 3a, resulting in the depth image shown in fig. 3b.
In the embodiment of the present application, the depth information may be acquired by means of a ToF camera, structured light, laser scanning, or the like, so as to obtain the depth image. It should be understood that any other manner (or camera) that can obtain a depth image may also be used in the embodiments of the present application. Hereinafter, only a depth image obtained by a ToF camera is taken as an example for explanation, but this does not limit the way in which the depth image is obtained; this is not repeated later.
Although the point cloud is a three-dimensional concept and a pixel point in the depth image is a two-dimensional concept, when the depth value of a point in the two-dimensional image is known, the image coordinates of that point can be converted into world coordinates in three-dimensional space; therefore, the point cloud in the three-dimensional space can be recovered from the depth image. For example, the image coordinates can be converted into world coordinates by using a multi-view geometry algorithm; the specific conversion manner and process are not limited.
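As an illustration of this back-projection, a minimal sketch under the standard pinhole camera model is shown below; the intrinsic parameters (fx, fy, cx, cy) are hypothetical inputs, not values given in the patent:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into camera-frame 3D points using the
    pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    v, u = np.indices(depth.shape)          # per-pixel row (v) and column (u)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # Stack into an (H*W, 3) point cloud; zero depth marks invalid pixels
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]
```

Each valid pixel thus yields one 3D point, which is the point cloud that the subsequent fitting and clustering steps operate on.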
After the electronic device acquires the depth image by using the ToF camera, it may perform plane detection processing on the depth image, which may specifically include the following possible processes.
In the first process, the depth image is divided into a plurality of non-overlapping regions (each region is hereinafter referred to as a sub-image, and may also be referred to as a sub-depth image).
If the plane detection method provided by the embodiment of the present invention is regarded as a plane detection algorithm, this process can be regarded as the initialization of the algorithm: a frame of depth image is input, carrying image coordinates (u, v) and depth values (gray scale). The process segments the depth image into a plurality of sub-images, and subsequent processing may then be performed on each sub-image separately. Because the point cloud can be recovered from the depth information, processing each sub-image obtained after segmentation amounts to processing the point cloud corresponding to that sub-image in three-dimensional space; that is, the world coordinates (x, y, z) of the point cloud can be used for calculation in the algorithm, for example for calculating a fitting plane. In this way, on one hand, as many planes as possible can be detected, and on the other hand, the processing of the depth image can be accelerated.
Taking the depth image shown in fig. 3b as an example, the following describes the first procedure in detail with reference to a specific example.
The depth image shown in fig. 3b is segmented to obtain the schematic diagram of segmented sub-images shown in fig. 3c. Fig. 3c only exemplarily shows 42 non-overlapping regions of regular size (hereinafter referred to as sub-images); optionally, the 42 regions may also be regions of different sizes. It should be understood that the segmentation can be performed according to actual needs in a specific implementation process, and the number of sub-images is not particularly limited.
In a possible implementation manner, when the depth image is segmented into a plurality of non-overlapping regions of regular size, each region includes a plurality of points, and each region can be mapped to a point cloud composed of a plurality of points in three-dimensional space. The first-order moment and the second-order moment of the point cloud corresponding to each region can be calculated in the initialization process, where the first-order moment is the center of gravity of the point cloud corresponding to the region, and the second-order moment is the variance of the point cloud corresponding to the region.
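The first- and second-order moments described above can be sketched as follows (a minimal illustration; the per-axis variance stands in for the second-order moment):

```python
import numpy as np

def region_moments(points):
    """First-order moment = centroid (center of gravity) of the region's
    point cloud; second-order moment = its per-axis variance."""
    centroid = points.mean(axis=0)
    variance = ((points - centroid) ** 2).mean(axis=0)
    return centroid, variance
```

Computing these once per region during initialization lets the later clustering steps combine regions without re-reading every pixel.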
After the 42 sub-images shown in fig. 3c are obtained, there are various implementations of the subsequent processing (such as clustering) of the 42 sub-images. In a possible implementation manner, the clustering process in process two below is performed separately for the point cloud corresponding to each sub-image. In this manner, all sub-images obtained after segmenting the depth image are processed. Compared with the prior-art scheme of traversing the image to be processed from the center and stopping once one plane is detected, which can detect only one plane, the scheme provided by the application segments the whole image to be processed into sub-images and performs clustering on all of them, so that a plurality of planes can be detected.
In order to further increase the speed of plane detection, in another possible implementation manner, the sub-images whose corresponding point clouds may be fitted to a plane may first be determined from the 42 sub-images, and the clustering process in process two below is then performed only on those sub-images. Subsequent processing of sub-images that are unlikely to be fitted to a plane is thus unnecessary, which saves time in the whole plane detection process.
In one example, a graph structure may be constructed for the sub-images shown in fig. 3c, where the graph structure is a computer data structure and the information of the point cloud corresponding to each sub-image is taken as a node of the graph structure. For example, the node corresponding to sub-image B3 contains the first-order moment, the second-order moment (also referred to as the mean square error), and the normal vector calculated by plane fitting of the point cloud corresponding to sub-image B3. Then, an edge is constructed between two nodes that may fit into one plane, and subsequent processing is performed on the nodes with edges.
In the embodiment of the present application, the mean square error is the mean of the sum of squared errors between the fitted data and the original data at corresponding points. Taking the fitting of a plane A from all points included in a point cloud as an example, the mean square error of the point cloud is the mean of the sum of squared distances between all points in the point cloud and the plane A. The mean square error of a point cloud is not described again hereinafter.
In one possible implementation, whether to construct an edge between two nodes may be decided based on the relationship between the two nodes. Specifically, after the point cloud corresponding to each node is subjected to plane fitting, a corresponding normal vector can be obtained. If the included angle between the normal vectors of the fitted planes corresponding to the two nodes is larger than an angle threshold, no edge is constructed between the two nodes; if the included angle is smaller than or equal to the angle threshold, an edge is constructed between the two nodes. Then, the clustering processing in process two is performed on the nodes with edges, and the nodes without edges are not clustered.
Referring to fig. 3d, a diagram structure initialization diagram provided in the embodiment of the present application is shown.
As shown in fig. 3d, all graph nodes corresponding to the depth map are classified: a black circle represents a node without a depth value; a cross (x) marks a node with discontinuous depth values, and no edge is constructed between two such nodes; a black solid point represents a node whose point cloud is fitted to a plane with a small mean square error (MSE). A constructed edge is an edge of the graph structure: it connects two adjacent nodes to indicate that there is a relationship between them, and no connection means no relationship. Assuming that the two nodes to be connected are node A and node B, the connecting edge needs to satisfy the following conditions: condition 1, the depth values of the point clouds corresponding to both node A and node B are continuous, that is, neither node A nor node B is a node without a depth value or a node with discontinuous depth values; condition 2, the included angle between the normal vectors of node A and node B is smaller than the angle threshold.
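The two edge conditions above can be sketched as follows; the node layout (a dict with a depth-continuity flag and a fitted normal) and the 20-degree angle threshold are illustrative assumptions, not values specified in the patent:

```python
import numpy as np

def normals_angle_deg(n1, n2):
    """Angle between two plane normals, in degrees (sign of the normal is
    ignored, since a fitted normal is only defined up to direction)."""
    c = abs(np.dot(n1, n2)) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def should_connect(node_a, node_b, angle_threshold_deg=20.0):
    """Construct an edge only if both nodes have continuous depth values
    (condition 1) and their fitted-plane normals differ by no more than
    the angle threshold (condition 2)."""
    if not (node_a["depth_ok"] and node_b["depth_ok"]):
        return False
    return bool(normals_angle_deg(node_a["normal"],
                                  node_b["normal"]) <= angle_threshold_deg)
```

Only node pairs passing both checks receive an edge and take part in the later clustering.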
Process two: the sub-images obtained in process one are clustered.
In process two, the description takes as an example that clustering is performed only on the nodes with edges shown in fig. 3d (i.e., the dark regions in fig. 3d).
In order to further increase the speed of clustering, a minimum-heap data structure (hereinafter referred to as a min-heap) can be established so that the nodes most likely to form a plane are clustered preferentially. The min-heap can be regarded as ordering the processing sequence of the nodes: through the min-heap, the node with the smallest second-order moment can be processed first, and traversal starts from that node.
In one example, a min-heap is first established, where every node in the graph is initially a class of its own. Plane fitting is performed on each node, and the fitting result of each node corresponds to a fitting mean square error; the nodes are pushed into the min-heap keyed by this error, so that the top of the min-heap (i.e., the object popped each time) is the node with the smallest mean square error. Iteration is then carried out: in each iteration, the node with the smallest current fitting error is popped and fused with one of its surrounding nodes; the mean square error after fusing the two nodes can be obtained, and the fused node with the smallest post-fusion mean square error is selected to replace the two previous nodes, is added to the graph structure, and is pushed into the min-heap again. If the mean square error of the fused node is smaller than or equal to a first threshold, the fused node is added to the min-heap; if it is greater than the first threshold, the fused node is discarded. The stopping condition of the iteration is that the plane-fitting mean square error between every two nodes in the graph is larger than the first threshold.
The second process is explained in detail below.
S1, plane fitting is performed according to the point cloud corresponding to each node, for example by using the Principal Component Analysis (PCA) algorithm, whose main principle is as follows: assume that all points in the point cloud corresponding to a node lie on one plane, then calculate the eigenvalues; if one of the three eigenvalues is smaller than a threshold, the points included in the point cloud can be considered to lie on one plane, and the eigenvector corresponding to that eigenvalue is the normal vector of the plane. A mean square error, namely the mean of the sum of squared distances between all points in the point cloud corresponding to the node and the fitted plane, can then be calculated from all points fitted to the plane. Each node thus corresponds to one mean square error, and the nodes are pushed into the min-heap keyed by this mean square error, so that the smallest error is at the top.
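A minimal sketch of this PCA plane fit follows; the eigenvalue threshold below is an assumed value for illustration:

```python
import numpy as np

def fit_plane_pca(points, eig_threshold=1e-4):
    """PCA plane fit: the eigenvector of the covariance matrix belonging to
    the smallest eigenvalue is the plane normal; if that eigenvalue is small,
    the points are (nearly) coplanar. Returns (normal, centroid, mse, is_plane)."""
    centroid = points.mean(axis=0)
    cov = np.cov((points - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    normal = eigvecs[:, 0]                   # direction of least variance
    dists = (points - centroid) @ normal     # signed point-to-plane distances
    mse = float(np.mean(dists ** 2))
    return normal, centroid, mse, bool(eigvals[0] < eig_threshold)
```

The returned mse is exactly the quantity used to key nodes in the min-heap.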
S2, the node with the smallest mean square error is taken out of the min-heap, that node is fused with each of its adjacent nodes, and the mean square error of each fused node is calculated.
The node fusion process is illustrated below by taking 9 nodes marked by the dashed box shown in fig. 3c as an example.
As shown in fig. 3c, the 9 nodes marked by the dashed box are node B3, node B4, node B5, node C3, node C4, node C5, node D3, node D4 and node D5. Assuming that these 9 nodes are nodes with edges, plane fitting is performed on the point clouds corresponding to the 9 nodes to obtain their mean square errors. Assuming that the mean square error corresponding to node C4 is the smallest, the 8 nodes around node C4 are node B3, node B4, node B5, node C3, node C5, node D3, node D4 and node D5. Node C4 is fused with each of the 8 surrounding nodes to obtain 8 new fused nodes, namely node C4B3, node C4B4, node C4B5, node C4C3, node C4C5, node C4D3, node C4D4 and node C4D5, and the mean square error corresponding to each new node is obtained. Assuming that the mean square error corresponding to node C4B4 is the smallest among the mean square errors corresponding to the 8 fused new nodes, and that it is smaller than or equal to the first threshold, node C4B4 is selected to replace node C4 and node B4 in the graph structure, and node C4B4 is added to the min-heap. That is, after the fusion process, the 9 nodes marked by the dashed box in fig. 3c become the 8 nodes marked by the dashed box in fig. 3e, namely node B3, node B5, node C3, node C4B4, node C5, node D3, node D4 and node D5. The clustering of the 8 nodes in fig. 3e then continues as described above for the fusion of node C4 with its surrounding nodes, until the stopping condition of the iteration is met: the mean square error obtained by plane fitting of the point cloud corresponding to any new node obtained by fusing any two of the remaining nodes is larger than the first threshold.
Here, there are at least two possible ways to determine whether to merge node C4 and node B4.
In one possible manner, after the mean square errors corresponding to the 8 new nodes obtained by fusing node C4 with its 8 surrounding nodes are determined, the 8 mean square errors are compared with the first threshold. If among the 8 new nodes there are nodes whose mean square error is smaller than or equal to the first threshold, the node with the smallest mean square error among them is added to the min-heap. If none of the 8 new nodes has a mean square error smaller than or equal to the first threshold, node C4 cannot be fused with its surrounding nodes; that is, after this fusion attempt, the result is still the 9 nodes marked by the dashed box shown in fig. 3c.
In another possible manner, after the mean square errors corresponding to the 8 new nodes obtained by fusing node C4 with its 8 surrounding nodes are determined, the 8 mean square errors are sorted and the new node with the smallest mean square error is determined; that smallest mean square error is compared with the first threshold, and if it is smaller than or equal to the first threshold, the corresponding new node is added to the min-heap. If the smallest mean square error is greater than the first threshold, node C4 cannot be fused with its surrounding nodes; that is, after this fusion attempt, the result is again the 9 nodes marked by the dashed box shown in fig. 3c.
It should be noted that, since this embodiment describes processing only the nodes with edges, if there is no other node with edges around node C4, the above fusion process is not required for node C4; if there are other nodes with edges around node C4, a fused node satisfying the condition that the mean square error is smaller than or equal to the first threshold may be found around node C4. Of course, in the embodiment of the present application, the segmented sub-images shown in fig. 3c may also be clustered directly, in which case node C4 may simply fail to fuse with any surrounding node.
In the clustering process, the node with the smallest mean square error is popped from the min-heap each time for clustering. If that node has a fusible node, the fused node is added to the min-heap; if it has no fusible node, it is added back to the min-heap. The clustering process stops when no node in the min-heap can find a fusible node. At this time, the point clouds corresponding to the nodes remaining in the min-heap lie on different planes, that is, no two nodes can be fused into one plane, so each of the remaining nodes in the min-heap may correspond to one plane.
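The popping-and-fusing loop can be sketched as follows. This is a simplified single-threaded illustration under assumed inputs (a dict of per-node point sets and an explicit edge list), not the patent's exact implementation; instead of re-pushing unfusible nodes, it skips stale heap entries by lazy deletion, which gives the same stopping behavior:

```python
import heapq
import numpy as np

def plane_mse(points):
    """Mean squared point-to-plane distance of the PCA-fitted plane."""
    c = points.mean(axis=0)
    n = np.linalg.eigh(np.cov((points - c).T))[1][:, 0]
    return float(np.mean(((points - c) @ n) ** 2))

def cluster_nodes(nodes, edges, threshold):
    """Greedy min-heap clustering: repeatedly pop the node with the smallest
    fitting MSE, fuse it with the neighbor giving the smallest fused MSE
    (if that MSE is below `threshold`), and push the fused node back."""
    alive = set(nodes)                        # ids of current classes
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    heap = [(plane_mse(nodes[n]), n) for n in nodes]
    heapq.heapify(heap)
    while heap:
        mse, nid = heapq.heappop(heap)
        if nid not in alive:
            continue                          # stale entry: node already fused
        best = None
        for nb in adj[nid]:
            if nb not in alive:
                continue
            fused_pts = np.vstack([nodes[nid], nodes[nb]])
            m = plane_mse(fused_pts)
            if m <= threshold and (best is None or m < best[0]):
                best = (m, nb, fused_pts)
        if best is None:
            continue                          # no fusible neighbor: keep as a class
        m, nb, fused_pts = best
        new_id = "+".join(sorted([nid, nb]))
        nodes[new_id] = fused_pts
        alive -= {nid, nb}; alive.add(new_id)
        adj[new_id] = (adj[nid] | adj[nb]) - {nid, nb}
        for x in adj[new_id]:
            adj[x].add(new_id)
        heapq.heappush(heap, (m, new_id))
    return list(alive)
```

Each surviving id then corresponds to one rough extraction plane.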
In the foregoing embodiment, a single-threaded clustering manner is taken as an example: after the 42 nodes corresponding to the depth image are added to the min-heap, one node is popped each time, a node that can be fused with it is sought among its surrounding nodes, the fused node is placed back into the min-heap, and then the node with the smallest mean square error is popped again to repeat the fusion process until the clustering ends. In this way, traversal starts from one node and multiple planes are obtained.
It should be understood that, in the embodiments of the present application, a multi-threaded clustering process may also be used to detect multiple planes. For example, with three-threaded clustering, the 3 nodes with the smallest mean square errors are popped from the min-heap at a time, each popped node corresponds to one thread, and the fusion process of each of the three threads may refer to the single-threaded fusion process described above. The fused nodes of the three threads are added to the min-heap, then the 3 nodes with the smallest mean square errors are popped again and fused with their surrounding nodes, until the clustering process ends. Compared with traversing from one node, this manner has a higher processing speed, so the plane detection result of the whole image to be processed can be obtained more quickly.
After the clustering process of process two is performed on the nodes with edges shown in fig. 3d, the clustering result shown in fig. 3f can be obtained; that is, each gray connected region in fig. 3f is a rough extraction plane.
Among the plurality of crude extraction planes obtained in process two, two crude extraction planes may actually be the same plane, a crude extraction plane may contain points that do not belong to it, or points around a crude extraction plane that do belong to it may have been missed. In order to obtain a more accurate plane detection result, the crude extraction planes are optimized, specifically through process three, process four, and process five below.
In order to further obtain a precise plane and prevent points that do not belong to a crude extraction plane from being included in it, an erosion operation can be performed on the crude extraction plane; for a specific implementation, refer to process three.
Process three: an edge removal operation, which may also be referred to as an erosion operation, is performed on the crude extraction planes obtained by clustering in process two.
The edges of the crude extraction planes obtained in process two are saw-toothed in shape and, depending on the accuracy of the graph structure, will cover different regions and may encroach on the region of another plane. The erosion operation shrinks the dominant region of each node in the current map and leaves the most central region of each plane.
Based on fig. 3f, the crude extraction planes are marked to obtain the image marked with each crude extraction plane as shown in fig. 3g, and it should be understood that the action of marking the crude extraction planes is not a necessary action in the process of detecting planes, but is only for the convenience of illustrating the crude extraction planes obtained by the second process.
As fig. 3g exemplarily shows the crude extraction plane a, the crude extraction plane B, the crude extraction plane C, the crude extraction plane D, the crude extraction plane E, the crude extraction plane F marked by dashed boxes.
Taking the crude extraction plane A marked by the dashed box in fig. 3g as an example, the crude extraction plane A is eroded to remove one circle of nodes at its sawtooth edge, so that the resulting plane is more accurate, and the crude extraction plane A is prevented as far as possible from including points that do not belong to it.
For a crude extraction plane obtained in process two, the erosion operation in process three shrinks the plane toward its central position and removes one circle of its edge. For example, comparing fig. 3a and fig. 3g, the crude extraction plane A in fig. 3g is the table top in fig. 3a. If the depth values at the table corner are not processed by the erosion operation in process three, that is, if the region growing operation in process four is performed directly after process two, the crude extraction plane A may easily be grown together with the crude extraction plane C, the crude extraction plane D, and so on below it; in fact, however, the crude extraction plane A is not the same plane as the crude extraction planes C and D, and growing them together would give an erroneous result. This can be avoided by eroding the crude extraction planes first.
In order to further obtain a refined plane and avoid that some points belonging to the coarse extraction plane may be missed by each coarse extraction plane, taking the coarse extraction plane a as an example, a region growing algorithm operation may be performed on the coarse extraction plane a, and a specific implementation manner refers to process four.
Process four: a region growing algorithm is further applied to the crude extraction planes subjected to the erosion operation, so as to avoid missing points that belong to the crude extraction planes.
In one possible implementation, the multiple crude extraction planes may each be grown outward simultaneously, with their eroded central regions as the starting points.
How to realize the region growing is described in detail below with reference to specific examples.
Take the crude extraction plane A after the erosion operation as an example of growing outward from the central region. Since the eroded plane is obtained by removing the edge of the crude extraction plane A, it is no longer identical to the crude extraction plane A; in the following description, the eroded crude extraction plane A is therefore referred to as the crude extraction plane A'.
In the process of growing outward with the crude extraction plane A' as the central region, it is necessary to determine, for each pixel point around the crude extraction plane A' (i.e., not in the crude extraction plane A'), whether it can be divided into the crude extraction plane A'. For convenience of description, the pixel point currently being judged is referred to as the operation point.
In the embodiment of the present application, there are various implementation manners that can determine whether an operation point can be classified into the crude extraction plane a'.
In a possible implementation manner, if more than 50% of all neighboring pixel points around the operation point belong to a coarse extraction plane, such as the coarse extraction plane a ', the operation point is also divided into the coarse extraction plane a'.
Taking the operation point as pixel point K as an example, suppose there are 8 neighboring pixel points around pixel point K. If more than 4 of the 8 neighboring pixel points belong to the crude extraction plane A', for example 5 of them, the point cloud data corresponding to pixel point K is substituted into the plane fitting equation corresponding to the crude extraction plane A' to verify whether pixel point K belongs to the crude extraction plane A'. If the verification result is that pixel point K belongs to the crude extraction plane A', pixel point K is divided into the crude extraction plane A'; if not, the division of pixel point K into the crude extraction plane A' is abandoned.
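The neighbor-majority test plus plane-equation check described above might be sketched as follows; the distance threshold and the plane parameters (normal and centroid of the fitted plane) are illustrative assumptions:

```python
import numpy as np

def try_grow(point, neighbor_labels, plane_normal, plane_centroid,
             target_label, dist_threshold=0.01):
    """Region-growing test for one operation point: if more than half of its
    8 neighbors carry `target_label`, verify the point against that plane's
    fitted equation (point-to-plane distance) before absorbing it."""
    votes = sum(1 for lbl in neighbor_labels if lbl == target_label)
    if votes <= len(neighbor_labels) // 2:
        return False                          # not more than half: do not grow
    dist = abs(np.dot(np.asarray(point) - plane_centroid, plane_normal))
    return bool(dist <= dist_threshold)
```

Points that pass both checks are relabeled with the plane, so the grown plane recovers edge points removed by the erosion.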
In another possible implementation manner, if 50% of the neighboring pixel points around the operation point belong to the crude extraction plane A' and the other 50% belong to the crude extraction plane C', the operation point may be plane-fitted with the neighboring pixel points of each of the two crude extraction planes, and the operation point is then divided into the crude extraction plane whose fitting result has the smaller mean square error.
Taking the operation point as pixel point K as an example, suppose there are 8 neighboring pixel points around pixel point K, of which 4 belong to the crude extraction plane A' and 4 belong to the crude extraction plane C'. Pixel point K is then substituted into the plane fitting equation of the crude extraction plane A' and into that of the crude extraction plane C', the substitution results of the two plane fitting equations are compared, and pixel point K is divided into whichever of the two crude extraction planes (A' or C') gives the better result.
Through process four, the neighboring points around each rough extraction plane that belong to the same plane as a rough extraction plane obtained in process three can be assigned to that rough extraction plane, so that the detected planes are more complete.
The planes detected by any one of processes two, three, and four are obtained from an understanding of the two-dimensional image and may therefore contain visual errors: for example, two detected planes may actually be the same plane, yet not appear as one plane in the two-dimensional image. To avoid this, the following process five can be used to eliminate such visual errors from the multiple planes detected by the previous processes.
Process five: perform a plane-merging operation on the multiple planes; this may also be referred to as a merged-plane algorithm.
Process five may be applied to the result of any one of processes two, three, and four; the plane-merging operation on the result of process four is described below as an example.
For the multiple planes obtained in process four, a bottom-up hierarchical clustering algorithm is applied once: all planes obtained in process four are traversed, and any two planes that are spatially close are merged.
In one possible implementation, the multiple planes obtained in process four may be plane-fitted pairwise using the PCA algorithm. For example, suppose the planes obtained in process four are A', B', C', D', E', and F'. After pairwise fitting, the mean square error of plane BC, obtained by fitting plane B' and plane C', and that of plane DE, obtained by fitting plane D' and plane E', are both smaller than the threshold, so plane B' and plane C' lie on the same plane, as do plane D' and plane E'. Plane BC and plane DE can then be further fitted with the remaining planes A' and F', and pairwise fitting continues until the mean square error of the fitted plane between any two remaining planes is larger than the threshold; at that point fitting ends, and none of the remaining planes lie on one plane.
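The pairwise-fitting loop just described can be sketched as follows. This is an illustrative reading of the text, with each candidate plane held as its supporting point set; the MSE threshold value and the greedy merge order are assumptions, not fixed by the patent.

```python
import numpy as np

def fit_mse(points):
    """Mean squared distance of points to their PCA best-fit plane:
    the smallest eigenvalue of the covariance matrix."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    return np.linalg.eigvalsh(cov)[0]

def merge_planes(planes, mse_threshold=1e-4):
    """Greedy bottom-up merging sketch: repeatedly fuse the pair of
    plane point-sets whose joint fit has the smallest MSE, stopping
    when every remaining pair exceeds the threshold."""
    planes = [np.asarray(p, dtype=float) for p in planes]
    while True:
        best = None
        for i in range(len(planes)):
            for j in range(i + 1, len(planes)):
                mse = fit_mse(np.vstack([planes[i], planes[j]]))
                if mse <= mse_threshold and (best is None or mse < best[0]):
                    best = (mse, i, j)
        if best is None:
            return planes          # no pair is coplanar any more
        _, i, j = best
        planes[i] = np.vstack([planes[i], planes[j]])
        del planes[j]
```

Two patches on z = 0 merge into one plane, while a patch on z = 5 stays separate, mirroring the B'/C' versus F' example in the text.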
In one implementation, it may first be determined whether any two planes are parallel, and then whether two parallel planes are close to each other in spatial distance.
In some embodiments, the normal vectors of the two planes may be calculated, and whether the two planes are parallel is determined by determining whether their normal vectors are parallel: if the normal vectors of the two planes are parallel, the two planes are also parallel. Illustratively, plane A corresponds to the normal vector n1 = (A1, B1, C1) and plane B corresponds to the normal vector n2 = (A2, B2, C2); if A1/A2 = B1/B2 = C1/C2, i.e., the three ratios are all the same constant, then plane A and plane B are parallel, where "/" denotes division and A1/A2 means A1 divided by A2.
In implementation, if the two planes are determined not to be parallel, they are not determined to be the same plane or to be planes close in spatial distance. If the two planes are determined to be parallel, the direction of their common perpendicular (i.e., the direction of the normal vector) can be determined. If the distance between the two planes along the common perpendicular is large, the variance is large; if that distance is small, the variance is small. Therefore the two planes can be fitted jointly with the PCA algorithm, the mean square error calculated, and the variance obtained after de-centering; whether the two planes are close is then determined from the variance. For example, a variance smaller than or equal to the variance threshold indicates that the two planes are close, or that they lie on the same plane.
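The two checks above can be sketched directly: a parallelism test on the normal vectors, then a distance comparison along the common normal. The angle and distance tolerances are illustrative assumptions, not values from the patent.

```python
import numpy as np

def nearly_parallel(n1, n2, angle_tol_deg=3.0):
    """Two planes are parallel when their normal vectors are parallel,
    i.e. |cos(angle between normals)| is close to 1."""
    c = abs(np.dot(n1, n2)) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0))) <= angle_tol_deg

def close_along_normal(points1, points2, n, dist_tol=0.05):
    """Project both point sets onto the common normal direction; if the
    mean projections differ little, the two parallel planes are close
    in space and may be merged."""
    n = n / np.linalg.norm(n)
    d1 = points1 @ n          # signed offsets of set 1 along the normal
    d2 = points2 @ n          # signed offsets of set 2 along the normal
    return abs(d1.mean() - d2.mean()) <= dist_tol
```

Non-parallel planes fail the first test and are never compared by distance, matching the order of checks in the text.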
In one example, performing the plane-merging operation of process five on the planes in fig. 3g may yield the merged planes shown in fig. 3h: the rough extraction planes D and E in fig. 3g are merged into the same plane, the rough extraction planes B and C are merged into the same plane, and the rough extraction planes A and F each remain a single plane.
Through the above five processes, the whole depth image can be traversed quickly, so that multiple accurate planes can be detected quickly.
It should be noted that the above five processes can be used individually, or one or more of them can be selected and used in combination. For example, only four processes may be adopted, namely process one, process two, process four, and process five, with process three omitted. Of course, the above plane detection method can also be realized by other combinations; for example, process three may be performed after process four or process five.
In the following embodiments, the plane detection method may include multiple processes, and any two processes may be executed in a nested manner; for example, process A may be executed nested within process one.
Embodiment Two
In this embodiment of the application, the image data to be processed in step 201 may be three-dimensional point cloud data. The method is applicable both to ordered point clouds and to unordered point clouds. In an ordered point cloud, all points are arranged in sequence, so the neighboring-point information of each point can be found easily; in an unordered point cloud, the points are arranged irregularly, and the neighboring-point information of a point generally cannot be found.
Taking the scene shown in fig. 3a as an example, an octree structure is established for the point cloud corresponding to the scene, plane fitting is performed on the point clouds in the 8 nodes of each level, whether to continue segmenting the point cloud in each node of that level is then decided according to whether all points in the node lie on the same plane, and finally multiple refined planes are obtained by detection.
For the depth information extraction performed on the scene shown in fig. 3a, the depth information may be obtained by using a ToF camera, structured light, laser scanning, and the like, so as to obtain a depth image, and related content of the depth image may refer to related description in the first embodiment, which is not described herein again.
In the first process, an octree index of the point cloud is constructed to realize the segmentation of the three-dimensional point cloud.
In one possible implementation, the point cloud of a scene is de-centered and segmented into sets from coarse to fine according to different resolutions. The resolution here is determined by the result of plane fitting of the point cloud contained in the voxel corresponding to each octree node, where each octree node represents a set of points in a voxel. While the octree is being constructed, the first moment and the second moment of the point cloud in the voxel corresponding to each node are calculated as each node is constructed.
How to construct the octree structure is described in detail below with reference to figs. 4a and 4b.
First, the octree structure is described. An octree is a data model that divides a geometric entity of three-dimensional space into voxels, each voxel having the same time and space complexity, and divides a three-dimensional geometric object of size 2^n × 2^n × 2^n by cyclic recursive subdivision, forming a directed graph with one root node. In the octree structure, if the points of a divided voxel share the same attribute, for example that they lie on the same plane, the voxel constitutes a leaf node; otherwise the voxel is further divided into 8 subcubes, and so on in turn, with at most n subdivisions for a spatial object of size 2^n × 2^n × 2^n.
Fig. 4a is a schematic diagram of the subdivision of a three-dimensional space provided in an embodiment of the present application. As shown in fig. 4a, a geometric object in three-dimensional space is subdivided, and the subdivision result is reflected in the octree structure shown in fig. 4b. For example, cube A in fig. 4a is divided into eight equally sized subcubes B1, B2, B3, …, B8, reflected as the eight child nodes b1, b2, b3, …, b8 connected to the root node a in fig. 4b. Subcube B3 in fig. 4a is further subdivided into eight equally sized subcubes C1, C2, C3, …, C8, reflected as the eight child nodes c1, c2, c3, …, c8 connected to child node b3 in fig. 4b. Subcube B8 in fig. 4a is further subdivided into eight equally sized subcubes D1, D2, D3, …, D8, reflected as the eight child nodes d1, d2, d3, …, d8 connected to child node b8 in fig. 4b.
Each child node in the octree may represent a collection of points in a voxel, which may include one or more points.
The octree structure constructed in process one is obtained by segmenting the point cloud corresponding to the depth image from coarse to fine, yielding octree child nodes at each level from top to bottom. As shown in fig. 4b, the first level includes node a, the second level includes child nodes b1, b2, b3, …, b8, and the third level includes child nodes c1, c2, c3, …, c8 and d1, d2, d3, …, d8. For each child node in each level, whether to continue segmentation depends on whether the points in the point cloud corresponding to that child node lie on the same plane. For example, in fig. 4b the point cloud corresponding to child node b1 lies on one plane, so child node b1 need not be subdivided further, whereas the points in the voxel corresponding to child node b3 do not lie on one plane, so child node b3 must be segmented further, yielding child nodes c1, c2, c3, …, c8 at the next level. Segmentation stops when the points in the point cloud corresponding to every child node of the last level lie on the same plane, or when the number of points in the point cloud is smaller than the number threshold.
Whether each subcube at each level needs further division can be judged by determining whether the point cloud corresponding to the node lies on one plane: if the points in the subcube are not on one plane, the subcube needs further division; if they are on one plane, no further division is needed. This judgment may be realized by the following process A. It should be noted that when the number of points in a subcube is smaller than the number threshold, the subcube is not divided further even if its points are not on one plane.
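The coarse-to-fine subdivision with its two stopping conditions (coplanar points, or too few points) can be sketched as follows. This is a minimal illustrative reading: the planarity-ratio threshold, `min_points`, and the extra `max_depth` safety bound are assumptions, not values from the patent.

```python
import numpy as np

def is_planar(points, ratio_threshold=0.01):
    """Planarity test used as the stopping rule: smallest PCA eigenvalue
    divided by the sum of the three (threshold value illustrative)."""
    centered = points - points.mean(axis=0)
    w = np.linalg.eigvalsh(centered.T @ centered / len(points))
    total = w.sum()
    return total <= 0 or w[0] / total <= ratio_threshold

def build_octree(points, center, half, min_points=10, depth=0, max_depth=8):
    """Coarse-to-fine subdivision sketch: a node stops splitting when its
    points are coplanar or too few. Returns the point sets of the leaf
    voxels; `half` is the half-width of the current cube."""
    if len(points) < min_points or depth >= max_depth or is_planar(points):
        return [points]
    leaves = []
    octant = (points >= center).astype(int)   # which side of center, per axis
    for code in range(8):
        bits = np.array([(code >> 2) & 1, (code >> 1) & 1, code & 1])
        mask = np.all(octant == bits, axis=1)
        if mask.any():
            child_center = center + (bits - 0.5) * half
            leaves += build_octree(points[mask], child_center, half / 2,
                                   min_points, depth + 1, max_depth)
    return leaves
```

Each leaf's point set then carries one normal vector, as computed in process A below.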
Process A: quickly extract the normal vector of each voxel and judge whether the voxel needs further segmentation.
Starting from the coarsest level of the octree structure, the normal vector of the voxel corresponding to each child node is calculated; then, proceeding level by level from coarse to fine, the normal vectors of the voxels corresponding to the eight octree nodes of each level are calculated.
In one example, for the voxel corresponding to each octree node, the PCA algorithm may be used to calculate the normal vector of the voxel. Take the voxel corresponding to node b1 as an example: the three-dimensional point cloud contained in the voxel corresponds to 3 eigenvalues, such as r1, r2, and r3, each with a corresponding eigenvector, i.e., the point cloud corresponds to 3 eigenvectors. The eigenvalue with the smallest value, say r2, is determined from the three eigenvalues r1, r2, and r3; the eigenvector corresponding to r2 is then the normal vector of the three-dimensional point cloud contained in the voxel.
After the normal vector of the voxel corresponding to node b1 (i.e., the eigenvector corresponding to the minimum eigenvalue) is calculated, the quotient of the eigenvalue r2 and the sum of the three eigenvalues (i.e., r1 + r2 + r3) is calculated and compared with the third threshold. If the quotient is less than or equal to the third threshold, the three-dimensional point cloud in the voxel corresponding to node b1 lies on the same plane and can be regarded as one plane. If the quotient is greater than the third threshold, the point cloud does not lie on the same plane and may form multiple planes; in that case node b1 must be divided into eight child nodes of the next level, each of which decides whether to subdivide further in the same way. For each child node, subdivision stops once a convergence condition is satisfied, where the convergence condition may be either of the following: first, the number of points in the voxel corresponding to the node is smaller than the number threshold; second, the quotient of the eigenvalue corresponding to the normal vector of the voxel and the sum of the three eigenvalues is less than or equal to the third threshold.
Since the minimum eigenvalue (i.e., the mean square error) reflects the degree of fluctuation of the points in the normal vector direction, a minimum eigenvalue less than or equal to the second threshold indicates that the fluctuation in the normal vector direction is small, which in turn indicates that node b1 is a plane.
Through process A, the normal vectors of the eight child nodes of a level can be calculated simultaneously, and whether the point cloud in the voxel corresponding to each node needs further segmentation is determined, so that the segmentation of the eight child nodes of the level can be completed quickly.
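The per-voxel computation of process A can be sketched as one function: PCA on the voxel's points, the eigenvector of the smallest eigenvalue as the normal, and the eigenvalue-quotient test for planarity. The threshold value here is illustrative and plays the role of the "third threshold" in the text.

```python
import numpy as np

def voxel_normal_and_planarity(points, ratio_threshold=0.01):
    """PCA on a voxel's point cloud: returns (normal vector, is_planar).
    The normal is the eigenvector of the smallest covariance eigenvalue;
    the voxel counts as planar when that eigenvalue divided by the sum
    of all three is at most the threshold (value illustrative)."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    normal = eigvecs[:, 0]                   # eigenvector of the smallest
    total = eigvals.sum()
    quotient = 0.0 if total <= 0 else eigvals[0] / total
    return normal, quotient <= ratio_threshold
```

A flat 3 × 3 grid on z = 0 yields a normal along the z-axis and passes the test, while a full 3 × 3 × 3 cube of points fails it and would be subdivided.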
After the octree structure is obtained through process A, the points in the voxel corresponding to each octree node at every level lie on one plane, i.e., the voxel corresponding to each octree node corresponds to one normal vector. Voxels having a coplanar relationship are then determined from the voxels corresponding to the nodes, and the voxels having a coplanar relationship are fused; a specific way of determining the coplanar relationship is given in the following process two.
Process two: determine whether coplanar relationships exist among the voxels corresponding to the nodes at each level of the octree structure.
In one possible implementation, the Hough transform is adopted to calculate whether the voxels corresponding to the nodes in each layer have a coplanar relationship; this can be implemented specifically by using the dual relationship between the sampling space and the parameter space of the Hough transform.
The sampling space is composed of the normal vectors, obtained in process A, of the point clouds contained in the voxels corresponding to the octree nodes. A normal vector corresponds to a plane passing through the origin and thus to a point in the parameter space. Illustratively, the plane equation corresponding to one voxel in the sampling space is ax + by + cz = 1, which can be regarded as a dual relationship between the sampling point (x, y, z) and the normal vector (a, b, c).
If the normal vector of the plane equation corresponding to a voxel is converted into a point in the parameter space by the Hough transform, the points formed in the parameter space by the normal vectors of voxels belonging to the same plane converge together and tend toward a central point, i.e., the other points fluctuate around that central point. As the number of sampling points increases, that is, as the normal vectors of many voxels are collected, the result in the parameter space acquires a statistical property: the closer to the center, the more sampling points there are. Since the normal vectors of voxels having a coplanar relationship are parallel to each other, a set of coplanar voxels can be obtained from this statistical property of the parameter space.
In some embodiments, for the normal vectors of the voxels corresponding to child nodes of the same level in the octree structure, the sets of voxels corresponding to coplanar planes in the parameter space may be determined by the K-means algorithm as follows: first obtain the aggregation center points in the parameter space, for example 3 aggregation center points; then draw a sphere around each aggregation center point with a preset value as the radius; the points inside a sphere form a set of voxels having a coplanar relationship. The specific numerical value of the preset radius is not limited and can be set according to actual needs.
For each set of voxels having a coplanar relationship among the voxels of the child nodes of the same level, obtained by the Hough transform, the plane equation corresponding to that set can be obtained.
Multiple rough extraction planes can be obtained through the Hough transform process, each corresponding to one plane equation.
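The sphere-grouping step in parameter space can be sketched as follows. Each voxel's fitted plane ax + by + cz = 1 contributes the parameter point (a, b, c); aggregation centers (as K-means would supply) and a preset radius then carve out the coplanar sets. The radius value is illustrative.

```python
import numpy as np

def coplanar_sets(param_points, centers, radius=0.05):
    """Group voxels whose plane equations ax+by+cz = 1 map to parameter
    points (a, b, c) lying within a sphere of the given radius around an
    aggregation center. `radius` is the 'preset value' of the text;
    its numeric value here is an assumption. Returns, per center, the
    indices of the voxels in that coplanar set."""
    param_points = np.asarray(param_points, dtype=float)
    groups = []
    for c in np.asarray(centers, dtype=float):
        d = np.linalg.norm(param_points - c, axis=1)
        groups.append(np.nonzero(d <= radius)[0].tolist())
    return groups
```

Voxels whose normals cluster near (0, 0, 1) land in one set, those near (1, 0, 0) in another, matching the "points converge around a center" property described above.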
In another possible implementation, the voxels corresponding to the child nodes in each layer of the octree structure may be clustered. Taking the child nodes c1, c2, c3, …, c8 and d1, d2, d3, …, d8 of the third level in fig. 4b as an example, the voxels corresponding to the 16 child nodes of the third level are clustered; for the specific clustering process, refer to the clustering processing manner in the first embodiment.
Taking the clustering of the 16 child nodes of the third level of the octree structure as an example, child node c1 may be tentatively fused with each of the other 15 child nodes, and the child node whose fusion with c1 yields the smallest mean square error is found among them. For example, if the child node giving the smallest mean square error when fused with c1 is c3, and that minimum mean square error is less than or equal to the first threshold, the point cloud in the voxel corresponding to c1 and the point cloud in the voxel corresponding to c3 can be fitted to form one plane. This fusion process is repeated and iterated until no child nodes that can be fused pairwise can be found. After the fusion result at the third level is obtained, it is fused with the child nodes of the second level (child nodes b1, b2, b4, …, b7); the fusion result of the second level is then fused with the node of the first level, finally yielding nodes that cannot be fused with one another.
The rough extraction plane obtained in process two is, with high probability, an inaccurate fitting result; if it was calculated in a high-level voxel of the octree structure, it can be very inaccurate. For example, the rough extraction plane obtained in process two may contain outliers, i.e., points assigned to the rough extraction plane obtained by the Hough transform that do not actually lie on the plane; these outliers need to be removed.
Process three: refine, based on a random sample consensus (RANSAC) algorithm, the plane equation corresponding to the rough extraction plane obtained in process two.
In one possible implementation, based on the rough fitting result of the Hough transform (i.e., process two), take rough extraction plane B obtained by the Hough transform as an example. A group of points from the plane fitted by the Hough transform is randomly selected and fitted together with the points of rough extraction plane B to form a new plane, such as plane X1. If the mean square error of plane X1 is greater than that of rough extraction plane B, the group of points are not inliers of rough extraction plane B, and another group of points is selected to continue fitting with rough extraction plane B. If the mean square error of plane X1 is less than or equal to that of rough extraction plane B, the group of points are inliers of rough extraction plane B; then, based on plane X1, another group of points is added to plane X1 and a plane such as plane X2 is refitted. If the mean square error of plane X2 is less than or equal to that of plane X1, further points are added to plane X2 and the plane is refitted, and the optimized plane is finally obtained through this iterative processing.
In another possible implementation, based on the rough fitting result of the Hough transform (i.e., process two), take rough extraction plane C, whose corresponding equation is a plane equation, as an example. A point outside rough extraction plane C is substituted into the plane equation to verify whether it is an inlier, an inlier being a point that should belong to rough extraction plane C. If the point is an inlier, it is added to rough extraction plane C; if not, it is discarded. Other points outside rough extraction plane C are then substituted into the plane equation in turn to verify whether they are inliers, iterating until the optimized plane is finally obtained.
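The second refinement variant can be sketched as a simple inlier test against the rough plane's equation. The point-to-plane tolerance here is an illustrative assumption; the patent leaves the exact acceptance criterion to the RANSAC implementation.

```python
import numpy as np

def refine_plane(plane_points, candidates, n, d, dist_tol=0.02):
    """Substitute each candidate point into the plane equation n . x = d
    of the rough extraction plane; points with a small residual are
    inliers and are added, the rest are discarded (tolerance value
    illustrative)."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    kept = list(plane_points)
    for p in candidates:
        if abs(np.dot(n, p) - d) <= dist_tol:   # inlier test
            kept.append(p)
    return kept
```

After all candidates are processed, the enlarged inlier set can be refitted once (e.g., by PCA) to yield the optimized plane equation.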
Among the multiple planes obtained in process three, two planes may actually be one plane, so operations such as plane merging can be performed on the multiple planes obtained in process three.
Embodiment Three
In this embodiment of the application, the image data to be processed in step 201 may also be an RGB image.
In one example, left and right images of the same scene may be captured by a binocular camera, where the left image (hereinafter referred to as "image A") and the right image (hereinafter referred to as "image B") are both RGB images. A stereo matching algorithm (e.g., binocular stereo matching) may be applied to image A and image B to obtain a disparity map, and thereby a depth image. Plane fitting can then be performed on the point cloud according to the depth image, achieving the purpose of plane detection.
To realize rapid detection of multiple planes, in this embodiment of the application the left and right images captured by the binocular camera are each subjected to segmentation processing to obtain segmentation results. For example, semantic segmentation yields the face blocks corresponding to target objects, and image pre-segmentation yields multiple face blocks with similar properties.
Taking semantic segmentation of image A as an example, the segmentation result is a segmented image containing at least one target object with the set semantics. If the set semantics is "person", a labeling box labeled "person" frames the person, and the region corresponding to the labeling box contains the target object. That region is called a sub-image, and each sub-image obtained by the semantic segmentation is then processed subsequently. For example, the multiple sub-images obtained by segmenting image A are stereo-matched with the multiple sub-images obtained by segmenting image B to obtain a disparity map, and the depth map is then determined from the disparity map.
Taking image pre-segmentation of image A as an example, the segmentation result is a segmented image comprising multiple possibly planar regions, also called face blocks, each of which may also be called a sub-image; each sub-image is then processed subsequently. For example, the sub-images A1, A2, and A3 obtained by segmenting image A and the sub-images B1, B2, and B3 obtained by segmenting image B are processed by a stereo matching algorithm to obtain a disparity map, and the depth map is then determined from the disparity map. Sub-image A1 and sub-image B1 are images of the same position area C of the scene captured from different viewpoints, so a depth map corresponding to position area C, called a sub-depth image, can finally be obtained from sub-image A1 and sub-image B1.
After the multiple sub-depth images are obtained, each sub-depth image can be restored to a point cloud in three-dimensional space; the sub-depth images whose corresponding point clouds can be fitted to a plane are then screened out, clustering processing is performed on the screened sub-depth images, and multiple planes are finally obtained.
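Restoring a sub-depth image to a point cloud can be sketched with pinhole back-projection. The intrinsic parameters (fx, fy, cx, cy) are illustrative assumptions; the patent does not specify the camera model beyond the depth image itself.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a (sub-)depth image to a 3-D point cloud with the
    pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    Invalid (non-finite or non-positive) depths are dropped."""
    v, u = np.indices(depth.shape)        # v: row index, u: column index
    z = depth.astype(float)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[np.isfinite(pts[:, 2]) & (pts[:, 2] > 0)]
```

The resulting point cloud per sub-depth image is what the subsequent plane-fitting and clustering steps operate on.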
The following describes the plane detection method in the third embodiment with reference to a specific example.
Referring to fig. 5, a schematic process diagram of another plane detection method provided in the embodiments of the present application is shown. As shown in fig. 5, the plane detection method includes the steps of:
Step S1: acquire the left (image A) and right (image B) RGB images captured by the binocular camera.
Step S2: perform semantic segmentation or image pre-segmentation processing on image A and image B respectively to obtain the segmentation results corresponding to the left and right RGB images.
The semantic segmentation can be realized in various possible ways, and one possible way is to adopt a full convolution network to realize rough segmentation.
Before the full convolution network is used for rough segmentation, it must be trained to obtain a full convolution network model usable for segmenting RGB images. During end-to-end training, the input data set consists of images captured by a binocular camera with planes labeled; the labeling must be at the pixel level, and multiple plane contents are represented by multiple labels. The specific training process is as follows (S21–S23):
S21, label the RGB image captured by the left camera (image A) and the RGB image captured by the right camera (image B) at the pixel level. Fig. 6 is a schematic illustration of the labeling, in which the pixel-level label indicates the cyclist.
S22, input the labeled image A and the labeled image B into the full convolution network respectively.
S23, train in the general full-convolution-network training manner to obtain a usable group of parameters of the full convolution network, i.e., the model structure parameters of the full convolution network, thereby obtaining the trained full convolution network model.
It should be noted that training processes S21–S23 may be performed before the electronic device leaves the factory, i.e., the trained full convolution network model is pre-configured in the electronic device before it leaves the factory, so that when a user needs to detect a plane with the plane detection method, the processor in the electronic device can directly call the pre-configured full convolution network model to segment the RGB image to be processed. Alternatively, the full convolution network model may be obtained by performing training processes S21–S23 after the electronic device leaves the factory.
When the full convolution network model needs to be used to segment the RGB image to be processed acquired by the binocular camera, the RGB image to be processed is input into the full convolution network model, and the output result is a semantic segmentation image including pixel-level labels.
In another possible segmentation method, a fast semantic segmentation approach such as a region-based convolutional neural network (R-CNN), YOLO (You Only Look Once), or SSD (Single Shot Detector) is used to segment the RGB image to be processed.
Before the RGB image to be processed is roughly segmented using a fast semantic segmentation approach such as R-CNN, YOLO, or SSD, the corresponding R-CNN, YOLO, or SSD model must be trained to obtain a model usable for segmenting RGB images.
During end-to-end training, the data set input to the R-CNN, YOLO, or SSD model consists of RGB images with planes labeled; the labeling is done with bounding boxes, so the obtained result is coarse relative to pixel-level labeling. The network is then trained in the general VGG-16 or AlexNet manner. The specific training process is as follows (S24–S25):
S24, label the RGB image captured by the left camera (image A) and the RGB image captured by the right camera (image B) at the bounding-box level. Fig. 7 is a schematic illustration of the labeling, in which the bounding box is the image area framed by the rectangular frame; the object framed by the rectangular frame in fig. 7 is a vehicle.
S25, input the data set (multiple groups of labeled images A and labeled images B) into any one of the R-CNN, YOLO, and SSD neural networks and train it to obtain the corresponding network model; for example, inputting the data set into R-CNN yields the R-CNN model, into YOLO the YOLO model, and into SSD the SSD model.
It should be noted that the training process S24 to S25 may be performed before the electronic device leaves the factory; that is, the trained network model is pre-configured in the electronic device before it leaves the factory, so that when the user needs to detect a plane with the plane detection method, the processor in the electronic device can directly call the pre-configured model to segment the RGB image to be processed. Alternatively, the network model may be obtained by performing the training process S24 to S25 after the electronic device is shipped.
The regions corresponding to the labeling boxes obtained by the two semantic segmentation methods above are segmented at the granularity of a target object. A target object may, however, contain multiple planes (a vehicle, for example), so its corresponding point cloud in three-dimensional space does not lie on a single plane, and the obtained result is not accurate enough. Therefore, the region corresponding to each labeling frame obtained by the two semantic segmentation methods is segmented further; the segmentation granularity can be set according to actual requirements and is not limited here.
For the further segmentation of the region corresponding to each labeling frame, reference may be made to the segmentation method in the first embodiment, and the subsequent processing (such as clustering, erosion, and region growing) may likewise refer to the relevant contents of the first embodiment, which are not repeated here.
In another segmentation mode, the image pre-segmentation processing divides the RGB image to be processed into similar face blocks using an image pre-segmentation algorithm. An image pre-segmentation algorithm clusters pixels based on their similarity to obtain a segmentation result; for example, the RGB image shown in fig. 8a yields the segmented image shown in fig. 8b after the image pre-segmentation algorithm is applied. Compared with the CNN segmentation mode, the segmentation result of an image pre-segmentation algorithm is finer.
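As a concrete illustration of clustering pixels by similarity, the sketch below grows 4-connected regions whose color stays close to a seed pixel. The tolerance value and the region-growing strategy are illustrative assumptions, not the specific algorithm of this application (classical alternatives include Felzenszwalb's graph-based segmentation and SLIC superpixels):

```python
import numpy as np
from collections import deque

def presegment(rgb, tol=30.0):
    """Cluster pixels into face blocks by color similarity using
    4-connected region growing. Returns an integer label map.
    'tol' (max color distance to the region seed) is illustrative."""
    rgb = rgb.astype(np.float64)
    h, w, _ = rgb.shape
    labels = -np.ones((h, w), dtype=np.int32)
    current = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue
            seed = rgb[sy, sx]
            q = deque([(sy, sx)])
            labels[sy, sx] = current
            while q:
                y, x = q.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == -1:
                        # grow while the neighbor's color stays close to the seed
                        if np.linalg.norm(rgb[ny, nx] - seed) < tol:
                            labels[ny, nx] = current
                            q.append((ny, nx))
            current += 1
    return labels
```

Each connected component of similar color becomes one face block, which is then processed independently in the later steps.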
And step S3, obtaining a disparity map from picture A and picture B through a binocular disparity matching algorithm, and converting the obtained disparity map into a depth image, wherein the depth image comprises a plurality of sub-images.
In one example, based on the segmentation result of step S2, a binocular disparity matching algorithm may be applied to picture A and picture B to restore the positions of the segmented face blocks in three-dimensional space. The basic principle of a binocular disparity matching algorithm is to search, for a pixel in one camera's image, its position in the other camera's image, and then restore the position of that pixel in space from the image coordinates and the intrinsic and extrinsic parameters of the binocular camera. Thus, when computing binocular disparity, a pixel point can first be taken from a segmented face block and searched for along the epipolar line; as more and more points are matched, a set of three-dimensional space points is obtained. Commonly used binocular disparity matching algorithms include block matching (BM) and semi-global block matching (SGBM).
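The epipolar search described above can be illustrated with a deliberately naive block-matching sketch. Real BM/SGBM implementations (for example OpenCV's StereoBM and StereoSGBM) add cost aggregation, sub-pixel refinement, and consistency checks; the window size and disparity range below are illustrative:

```python
import numpy as np

def block_match_disparity(left, right, max_disp=16, win=2):
    """Naive block matching on a rectified grayscale pair: for each pixel
    in the left image, search along the same scan line (the epipolar line)
    in the right image for the patch minimizing the sum of absolute
    differences (SAD). A sketch of the BM principle only."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.float64)
    for y in range(win, h - win):
        for x in range(win, w - win):
            patch = left[y - win:y + win + 1, x - win:x + win + 1]
            best, best_d = np.inf, 0
            # a left-image pixel at column x appears at column x - d on the right
            for d in range(0, min(max_disp, x - win) + 1):
                cand = right[y - win:y + win + 1, x - d - win:x - d + win + 1]
                sad = np.abs(patch - cand).sum()
                if sad < best:
                    best, best_d = sad, d
            disp[y, x] = best_d
    return disp
```

For a textured, rectified pair shifted by a constant disparity, the minimum-SAD match recovers that shift at interior pixels.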
It should be understood that, since the segmented sub-images are already obtained in step S2, binocular disparity can be computed directly on the sub-images that may be planar according to the pre-segmentation result, and the three-dimensional spatial positions of those sub-images can be restored directly. This reduces the amount of calculation and speeds up plane detection.
When the sub-images are obtained by semantic segmentation, whether a segmented target object contains a plane can be determined from its label; then, when the disparity map is determined in step S3, it may be computed only for the sub-images that contain a face block, and the obtained disparity map is converted into a depth image. This reduces the amount of subsequent clustering and saves time in the plane detection process.
In step S4, sub-images whose corresponding point clouds lie on the same plane are determined from the sub-images included in the depth image.
Here, a plane may be fitted to each sub-image through the PCA algorithm and the mean square error of the fitted plane computed; if the mean square error is smaller than a first threshold, the point cloud corresponding to that sub-image is taken to lie on one plane.
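The PCA plane fit and its mean square error can both be read off the point cloud's covariance matrix: the fitted normal is the eigenvector of the smallest eigenvalue, and that eigenvalue equals the mean squared distance of the points to the fitted plane. A minimal sketch (the first-threshold value is illustrative):

```python
import numpy as np

def plane_fit_mse(points):
    """Fit a plane to an (N, 3) point cloud by PCA. The normal is the
    eigenvector of the smallest covariance eigenvalue; that eigenvalue is
    the mean squared distance of the points to the fitted plane."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues ascending
    normal = eigvecs[:, 0]
    mse = eigvals[0]                         # variance along the normal
    return normal, mse

def is_planar(points, first_threshold=1e-3):
    # the sub-image's point cloud counts as one plane when the fit MSE is small
    _, mse = plane_fit_mse(points)
    return mse < first_threshold
```

Sub-images passing this test feed the clustering in step S5; the rest are discarded from plane extraction.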
And step S5, clustering the sub-images whose corresponding point clouds have been determined to lie on the same plane, to obtain a plurality of crude extraction planes.
In step S5, the clustering process may refer to the relevant contents of the clustering process in the first embodiment, and will not be described herein again.
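For orientation, the min-heap clustering that this application describes elsewhere (order the candidate face blocks by plane-fit mean square error, then repeatedly fuse the best-fitting block with the partner that keeps the fused error below the first threshold) might be sketched as follows. Adjacency between sub-images is deliberately ignored here for brevity; a real implementation would only try neighbouring sub-images:

```python
import heapq
import numpy as np

def cluster_min_heap(blocks, threshold=1e-3):
    """Greedy coarse-plane clustering with a min-heap keyed by plane-fit
    MSE. 'blocks' is a list of (N, 3) point clouds; returns the fused
    point clouds. Threshold is illustrative."""
    def mse(pts):
        c = pts - pts.mean(axis=0)
        return np.linalg.eigvalsh(c.T @ c / len(pts))[0]

    heap = [(mse(b), i) for i, b in enumerate(blocks)]
    heapq.heapify(heap)
    live = {i: b for i, b in enumerate(blocks)}
    next_id = len(blocks)
    while len(heap) > 1:
        e, i = heapq.heappop(heap)
        if i not in live:
            continue                      # stale entry from an earlier fusion
        best = None
        for j, b in live.items():
            if j == i:
                continue
            fused = np.vstack([live[i], b])
            fe = mse(fused)
            if fe < threshold and (best is None or fe < best[0]):
                best = (fe, j, fused)
        if best is None:
            heapq.heappush(heap, (e, i))  # nothing fusable: keep it and stop
            break
        fe, j, fused = best
        del live[i], live[j]
        live[next_id] = fused
        heapq.heappush(heap, (fe, next_id))
        next_id += 1
    return list(live.values())
```

Popping from the heap top always processes the block whose current fit error is smallest, which is the "preferential clustering" behavior described for the min-heap data structure.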
And step S6, performing refinement processing on each crude extraction plane to obtain an optimized plane.
In step S6, each crude extraction plane obtained in step S5 may be refined by one or more of an erosion algorithm, a region growing algorithm, and a plane merging operation; for the related processes, refer to the relevant contents of the first embodiment.
It should be understood that, in the embodiment of the present application, after the depth image is obtained in step S3, the detection of the multiple planes may also be implemented by using the processing method of the second embodiment, and relevant contents may refer to relevant contents of the second embodiment, which is not described herein again.
Through the three embodiments above, on one hand, simultaneous detection of multiple planes in a scene can be achieved, while the image sensor is not limited to a depth camera and the algorithm is not limited to a traditional clustering algorithm. On the other hand, pre-segmenting the image and then processing the pre-segmented parts separately ensures that the plane detection algorithm executes quickly, so that multiple planes in the scene are detected rapidly.
In some other embodiments, the present application further provides a plane detection method applied to a computing device, and referring to fig. 9, the method includes the following steps:
step 901, acquiring image data to be processed.
Here, the image data to be processed may be a two-dimensional image, such as the depth image in the foregoing first embodiment or the first RGB image and second RGB image in the foregoing third embodiment, or may be three-dimensional point cloud data, such as the point cloud in the foregoing second embodiment; it is not specifically limited here.
It should be understood that the image data to be processed may be captured by the electronic device, may be obtained from a gallery in the electronic device for storing images, or may be transmitted by other devices.
Step 902, performing semantic segmentation on image data to be processed to obtain N sub-image data with labeling information, wherein N is an integer greater than 1; the labeling information is used for labeling the target object in the sub-image data.
Step 903, determining Q pieces of sub-image data with planes from the N pieces of sub-image data with labeling information according to the labeling information of each piece of sub-image data; q is an integer greater than 0 and less than or equal to N.
Step 904, determining the point cloud information corresponding to each planar sub-image data in the Q planar sub-image data.
Step 905, determining K crude extraction planes from Q sub-image data with planes according to point cloud information corresponding to each sub-image data with planes in the Q sub-image data with planes; wherein K is an integer greater than or equal to Q.
Step 906, performing optimization processing on the K crude extraction planes to obtain L optimized planes; l is a positive integer not greater than K.
Based on this scheme, semantic segmentation is performed on the image data to be processed to obtain N sub-image data with labeling information, and Q sub-image data having planes are then determined from the N sub-image data with labeling information. Plane detection therefore needs to be performed only on the Q sub-image data having planes, not on the sub-image data without planes, which reduces the processing amount; moreover, processing the Q sub-image data having planes allows more than one plane to be detected.
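Steps 903 to 905 above can be sketched in miniature. Assume each sub-image from step 902 has already been lifted to a labeled point cloud; the label set, the "label#index" key convention, and the error threshold are all hypothetical stand-ins for whatever the deployed segmentation model provides:

```python
import numpy as np

# hypothetical set of semantic classes that can contain a plane (step 903)
PLANAR_LABELS = {"floor", "table", "wall"}

def fit_mse(pts):
    # smallest covariance eigenvalue = mean squared plane-fit error
    c = pts - pts.mean(axis=0)
    return np.linalg.eigvalsh(c.T @ c / len(pts))[0]

def detect_planes(labeled_clouds, mse_threshold=1e-6):
    """'labeled_clouds' maps 'label#index' keys to (N, 3) point clouds and
    plays the role of the N segmented sub-images. Step 903: keep only the
    label classes that can contain a plane. Step 905: keep the sub-clouds
    whose plane-fit error is below the threshold (the crude planes)."""
    planar = {k: v for k, v in labeled_clouds.items()
              if k.split("#")[0] in PLANAR_LABELS}
    return {k: v for k, v in planar.items() if fit_mse(v) < mse_threshold}
```

Step 906's optimization (for example, merging crude planes with parallel normals) would then run on the returned dictionary.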
In one possible implementation, the image data to be processed in step 901 includes a first RGB image and a second RGB image captured by a binocular camera.
The above step 902 can be implemented by: performing semantic segmentation on the first RGB image to obtain N first sub-images with labeling information; performing semantic segmentation on the second RGB image to obtain N second sub-images with labeling information; and each first sub-image with the labeling information and each second sub-image with the labeling information and having a position corresponding relation with the first sub-image form sub-image data with the labeling information.
For example, the first RGB image may be the a diagram in the third embodiment, and the second RGB image may be the B diagram in the third embodiment.
The above step 904 can be implemented by: for each of the Q sub-image data having a plane, performing: determining a disparity map according to a first sub-image with labeling information included in the sub-image data with the plane and a second sub-image with the labeling information and having a position corresponding relation with the first sub-image; determining a sub-depth image according to the disparity map; and determining point cloud information corresponding to the sub-image data with the plane according to the sub-depth image.
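The disparity-to-depth and depth-to-point-cloud conversions in this step follow the standard pinhole stereo relations Z = f·B/d and X = (u - cx)·Z/f, Y = (v - cy)·Z/f. A sketch with illustrative intrinsics (a calibrated binocular camera supplies the real focal length, baseline, and principal point):

```python
import numpy as np

def disparity_to_depth(disp, focal_px, baseline_m):
    """Pinhole stereo: depth Z = f * B / d. Zero disparities (no match)
    map to depth 0 rather than infinity."""
    depth = np.zeros_like(disp, dtype=np.float64)
    valid = disp > 0
    depth[valid] = focal_px * baseline_m / disp[valid]
    return depth

def depth_to_point_cloud(depth, focal_px, cx, cy):
    """Back-project every valid depth pixel (u, v) to a 3-D point:
    X = (u - cx) * Z / f, Y = (v - cy) * Z / f, Z = depth."""
    v, u = np.nonzero(depth > 0)
    z = depth[v, u]
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return np.stack([x, y, z], axis=1)    # (N, 3) point cloud
```

The resulting (N, 3) array is the "point cloud information corresponding to the sub-image data" consumed by step 905.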
The above step 905 can be implemented by: determining K crude extraction planes from the Q sub-depth images according to the point cloud information corresponding to the sub-image data with the planes.
It should be noted that, for the specific implementation of determining the K crude extraction planes, performing the clustering processing, and optimizing the crude extraction planes, reference may be made to the description of the first embodiment, and details are not repeated here.
Besides the field of augmented reality, the plane detection method can be applied to fields such as intelligent robot navigation and automatic driving, enabling a computer to judge obstacles automatically and avoid them.
In other embodiments of the present application, a computing device is further provided. As shown in fig. 10, the computing device may include: a processor 1001; a memory 1002; and one or more computer programs 1003. These components may be connected by one or more communication buses 1004.
The one or more computer programs 1003 are stored in the memory 1002 and configured to be executed by the processor 1001. The one or more computer programs 1003 comprise instructions which may be used, for example, to perform the steps of the embodiments of fig. 2 and fig. 5. Specifically, the processor 1001 may be configured to perform steps 201 to 205 in fig. 2, or steps S1 to S6 in fig. 5.
Through the description of the foregoing embodiments, it will be clear to those skilled in the art that, for convenience and simplicity of description, only the division of the functional modules is illustrated, and in practical applications, the above function distribution may be completed by different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the above described functions. For the specific working processes of the system, the apparatus and the unit described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described here again.
Each functional unit in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or make a contribution to the prior art, or all or part of the technical solutions may be implemented in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a first electronic device, or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: flash memory, removable hard drive, read only memory, random access memory, magnetic or optical disk, and the like.
The above description is only a specific implementation of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by a person skilled in the art within the technical scope of the embodiments of the present application shall be covered by the scope of the embodiments of the present application, and therefore the scope of the embodiments of the present application shall be subject to the scope of the claims.

Claims (28)

1. A plane detection method applied to a computing device is characterized by comprising the following steps:
acquiring image data to be processed;
segmenting the image data to be processed to obtain N pieces of sub-image data, wherein N is an integer greater than 1;
determining point cloud information corresponding to at least one sub-image data in the N sub-image data;
performing clustering processing on the point clouds corresponding to the N sub-image data according to the point cloud information corresponding to the at least one sub-image data in the N sub-image data to obtain K crude extraction planes; k is a positive integer not greater than N;
optimizing the K crude extraction planes to obtain L optimized planes; l is a positive integer not greater than K;
when the image data to be processed is a depth image, the depth image comprises image coordinates and depth values of each pixel point; the clustering processing is performed on the point clouds corresponding to the N sub-image data according to the point cloud information corresponding to the at least one sub-image data in the N sub-image data to obtain K crude extraction planes, and the clustering processing comprises the following steps:
establishing a minimum heap data structure according to point cloud information corresponding to the at least one sub-image data in the N sub-image data, and preferentially clustering point clouds corresponding to at least two sub-image data with the minimum mean square error in the minimum heap data structure to obtain K crude extraction planes; the minimum heap data structure is used for sequencing the sub-depth images in the sub-depth image set to be processed according to the mean square error of the point cloud corresponding to each sub-depth image, and the mean square error of the point cloud corresponding to the sub-depth image positioned at the top of the heap is minimum.
2. The method of claim 1, wherein the image data to be processed is a depth image; the depth image comprises image coordinates and depth values of each pixel point;
the segmenting the image data to be processed to obtain N sub-image data includes:
segmenting the depth image to obtain N sub-depth images;
the determining point cloud information corresponding to at least one sub-image data of the N sub-image data includes:
determining point cloud information corresponding to at least one sub-depth image in the N sub-depth images;
the clustering processing is performed on the point clouds corresponding to the N sub-image data according to the point cloud information corresponding to at least one sub-image data in the N sub-image data to obtain K crude extraction planes, and the clustering processing comprises the following steps:
determining the mean square error of a fitted plane of the point cloud corresponding to each sub-depth image according to the point cloud information corresponding to each sub-depth image in the N sub-depth images;
determining sub-depth images meeting a first condition from the N sub-depth images to form a sub-image set to be processed; the first condition comprises that the mean square error of a fitted plane of a point cloud corresponding to the sub-depth image is less than or equal to a first threshold value;
and clustering point clouds corresponding to the sub-depth images in the sub-image set to be processed to obtain K crude extraction planes.
3. The method of claim 1, wherein the image data to be processed is a depth image;
the segmenting the image data to be processed to obtain N sub-image data includes:
segmenting the depth image to obtain N sub-depth images;
the determining point cloud information corresponding to at least one sub-image data of the N sub-image data includes:
determining point cloud information corresponding to at least one sub-depth image in the N sub-depth images;
the clustering processing is performed on the point clouds corresponding to the N sub-image data according to the point cloud information corresponding to at least one sub-image data in the N sub-image data to obtain K crude extraction planes, and the clustering processing comprises the following steps:
taking the point cloud corresponding to each sub-depth image in the N sub-depth images as a node to construct a graph structure; each node in the graph structure stores point cloud information corresponding to the node;
traversing each node in the graph structure, determining two nodes meeting a second condition in the graph structure, and constructing an edge between the two nodes meeting the second condition; the second condition comprises that the depth values of the point clouds corresponding to any one node of the two nodes are continuous, and the included angle between the normal vectors of the point clouds corresponding to the two nodes is smaller than an included angle threshold value;
determining sub-depth images corresponding to nodes with at least one edge in the graph structure in the N sub-depth images to form a sub-image set to be processed;
and clustering point clouds corresponding to the sub-depth images in the sub-image set to be processed to obtain K crude extraction planes.
4. The method of claim 1, wherein the image data to be processed comprises a first RGB image and a second RGB image taken by a binocular camera;
the segmenting the image data to be processed to obtain N sub-image data includes:
performing image pre-segmentation on the first RGB image to obtain N first face blocks; performing image pre-segmentation on the second RGB image to obtain N second face blocks; the first face block and the second face block have a position corresponding relation;
for each first face block of the N first face blocks, performing:
determining the second face blocks which have position corresponding relation with the first face blocks from the N second face blocks;
determining a disparity map according to the first face block and the second face block which has a position corresponding relation with the first face block; determining a sub-depth image according to the disparity map;
forming a sub-image set to be processed according to the determined N sub-depth images;
the determining point cloud information corresponding to at least one sub-image data of the N sub-image data includes:
determining point cloud information corresponding to at least one sub-depth image in the N sub-depth images;
the clustering processing is performed on the point clouds corresponding to the N sub-image data according to the point cloud information corresponding to the at least one sub-image data in the N sub-image data to obtain K crude extraction planes, and the clustering processing comprises the following steps:
and clustering point clouds corresponding to the N sub-depth images in the sub-image set to be processed according to the point cloud information corresponding to the at least one sub-image data in the N sub-image data to obtain K crude extraction planes.
5. The method according to any one of claims 2 to 4, wherein the clustering point clouds corresponding to sub-depth images included in the sub-image set to be processed to obtain K crude extraction planes comprises:
establishing a minimum heap data structure according to point cloud information corresponding to each sub-depth image in the sub-image set to be processed; the minimum heap data structure is used for sequencing the sub-depth images in the sub-depth image set to be processed according to the mean square error of the point cloud corresponding to each sub-depth image, and the mean square error of the point cloud corresponding to the sub-depth image positioned at the top of the heap is minimum;
executing a preset operation on the minimum heap data structure until the mean square error of a fitted plane of point clouds corresponding to any two nodes in the minimum heap data structure is larger than a first threshold value, and obtaining K crude extraction planes;
wherein the preset operation comprises: taking out a sub-depth image from the heap top in the minimum heap data structure, and if a sub-depth image meeting a third condition is determined from sub-depth images adjacent to the sub-depth image, fusing the sub-depth image and the sub-depth image meeting the third condition to obtain a fused sub-depth image, wherein the third condition comprises that the mean square error of a point cloud corresponding to the sub-depth image after being fitted with a plane is smaller than a first threshold value and the mean square error is minimum; adding the fused sub-depth image to the minimum heap data structure.
6. The method of claim 1, wherein the image data to be processed is a point cloud included in a three-dimensional space;
the segmenting the image data to be processed to obtain N sub-image data includes:
taking the three-dimensional space as a node of a first level of an octree structure;
for each child node included in the first to the ith levels of the octree structure, performing: if the child node meets a fourth condition, dividing the child node into eight equal parts to obtain eight child nodes of the (i + 1)th level; wherein the fourth condition comprises that the mean square error of the point cloud corresponding to the child node is greater than a first threshold; i is an integer greater than 1, and the ith level comprises 8^(i-1) child nodes;
until all child nodes included in the last level meet a fifth condition, whereby an octree structure including M levels of child nodes is constructed; the fifth condition comprises that the mean square error of the point cloud corresponding to the child node is not greater than the first threshold, or the point cloud corresponding to the child node comprises fewer points than a number threshold;
determining N unsegmented child nodes in the octree structure;
the determining point cloud information corresponding to at least one sub-image data of the N sub-image data includes:
determining point cloud information corresponding to at least one unsegmented child node of the N unsegmented child nodes;
the clustering processing is performed on the point clouds corresponding to the N sub-image data according to the point cloud information corresponding to the at least one sub-image data in the N sub-image data to obtain K crude extraction planes, and the method comprises the following steps:
and according to the point cloud information corresponding to at least one unsegmented sub-node in the N unsegmented sub-nodes, carrying out clustering processing on the point clouds corresponding to the N unsegmented sub-nodes in the octree structure to obtain K crude extraction planes.
7. The method of claim 6, wherein said clustering point clouds corresponding to said N unsegmented sub-nodes in said octree structure according to point cloud information corresponding to at least one of said N unsegmented sub-nodes to obtain K coarse extraction planes comprises:
determining a normal vector of the point cloud corresponding to each unsegmented child node in the N unsegmented child nodes according to point cloud information corresponding to at least one unsegmented child node in the N unsegmented child nodes, and converting each normal vector into a point in a parameter space through a Hough transform;
k point sets formed by normal vectors of the point cloud corresponding to the N unsegmented child nodes in a parameter space are determined, and each point set is provided with an aggregation center;
for each point set, determining points which fall within a preset range around the gathering center of the point set;
and fusing point clouds corresponding to the non-segmented sub-nodes corresponding to the points falling in the preset range into a rough extraction plane.
8. The method of any one of claims 1-4 and 6-7, wherein said optimizing said K crude extraction planes to obtain L optimized planes comprises:
determining a normal vector of each of the K crude extraction planes;
traversing any one of the K rough extraction planes, and if a rough extraction plane meeting a sixth condition exists, fusing the rough extraction plane and the rough extraction plane meeting the sixth condition into one plane to obtain L optimized planes;
wherein the sixth condition comprises: and the normal vector is parallel to the normal vector of the rough extraction plane, and the variance after the plane is fitted with the rough extraction plane is smaller than a variance threshold value.
9. A plane detection method applied to a computing device is characterized by comprising the following steps:
acquiring image data to be processed;
performing semantic segmentation on the image data to be processed to obtain N sub-image data with labeling information, wherein N is an integer greater than 1; the labeling information is used for labeling the target object in the sub-image data;
according to the labeling information of each sub-image data, Q sub-image data with planes are determined from the N sub-image data with the labeling information; q is an integer greater than 0 and less than or equal to N;
determining point cloud information corresponding to each sub-image data with the plane in the Q sub-image data with the plane;
determining K crude extraction planes from the Q sub-image data with planes according to the point cloud information corresponding to each sub-image data with planes in the Q sub-image data with planes; k is an integer greater than or equal to Q;
optimizing the K crude extraction planes to obtain L optimized planes; and L is a positive integer not greater than K.
10. The method of claim 9, wherein the image data to be processed comprises a first RGB image and a second RGB image taken by a binocular camera;
the semantic segmentation is performed on the image data to be processed to obtain N sub-image data with labeling information, and the method comprises the following steps:
performing semantic segmentation on the first RGB image to obtain N first sub-images with labeling information; performing semantic segmentation on the second RGB image to obtain N second sub-images with labeling information; each first sub-image with the labeling information and each second sub-image with the labeling information and having a position corresponding relation with the first sub-image form sub-image data with the labeling information;
the determining point cloud information corresponding to each sub-image data with a plane in the Q sub-image data with a plane includes:
for each of the Q sub-image data having a plane, performing:
determining a disparity map according to a first sub-image with labeling information included in the sub-image data with the plane and a second sub-image with the labeling information and having a position corresponding relation with the first sub-image;
determining a sub-depth image according to the disparity map;
according to the sub-depth image, point cloud information corresponding to the sub-image data with the plane is determined;
determining K crude extraction planes from the Q sub-image data with planes according to the point cloud information corresponding to the sub-image data with planes in the Q sub-image data with planes, wherein the method comprises the following steps:
and determining K crude extraction planes from the Q sub-depth images according to the point cloud information corresponding to the sub-image data with the planes.
11. The method of claim 9 or 10, wherein the determining K crude extraction planes from Q sub-depth images according to the point cloud information corresponding to the sub-image data with planes comprises:
taking the point cloud corresponding to each sub-depth image in the Q sub-depth images as a node to construct a graph structure; each node in the graph structure stores point cloud information corresponding to the node;
traversing each node in the graph structure, determining two nodes meeting a second condition in the graph structure, and constructing an edge between the two nodes meeting the second condition; the second condition comprises that the depth values of the point clouds corresponding to any one node of the two nodes are continuous, and the included angle between the normal vectors of the point clouds corresponding to the two nodes is smaller than an included angle threshold value;
determining sub-depth images corresponding to nodes with at least one edge in the graph structure in the Q sub-depth images to form a sub-image set to be processed;
and clustering point clouds corresponding to the sub-depth images in the sub-image set to be processed to obtain K crude extraction planes.
12. The method of claim 11, wherein the clustering the point clouds corresponding to the sub-depth images included in the sub-image set to be processed to obtain the K coarse extraction planes comprises:
establishing a minimum-heap data structure according to the point cloud information corresponding to each sub-depth image in the sub-image set to be processed, wherein the minimum-heap data structure orders the sub-depth images in the sub-image set to be processed by the mean square error of the point cloud corresponding to each sub-depth image, such that the sub-depth image at the top of the heap has the smallest mean square error;
performing a preset operation on the minimum-heap data structure, until the mean square error of a plane fitted to the point clouds corresponding to any two nodes in the minimum-heap data structure is larger than a first threshold, to obtain the K coarse extraction planes;
wherein the preset operation comprises: taking a sub-depth image from the top of the minimum-heap data structure; if a sub-depth image meeting a third condition is determined among the sub-depth images adjacent to the taken sub-depth image, fusing the two sub-depth images to obtain a fused sub-depth image, wherein the third condition comprises that the mean square error of a plane fitted to the point clouds of the two sub-depth images together is smaller than the first threshold and is the smallest among the adjacent sub-depth images; and adding the fused sub-depth image back to the minimum-heap data structure.
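The preset operation above can be sketched with Python's `heapq`. This is a simplified illustration under our own assumptions, not the patented implementation: adjacency is relaxed to "any other cloud still in the heap", a cloud with no mergeable partner is set aside as a final coarse plane, and the PCA mean-square-error fit, names, and threshold are ours.

```python
import heapq
import numpy as np

def plane_mse(cloud):
    """Mean square error of a cloud about its best-fit (PCA) plane."""
    centered = cloud - cloud.mean(axis=0)
    s = np.linalg.svd(centered, full_matrices=False, compute_uv=False)
    return float(s[-1] ** 2) / len(cloud)

def heap_cluster(clouds, mse_thresh=1e-3):
    # Heap entries are (mse, unique_id, cloud); the unique id breaks ties
    # so tuple comparison never reaches the ndarray itself.
    heap = [(plane_mse(np.asarray(c, float)), i, np.asarray(c, float))
            for i, c in enumerate(clouds)]
    heapq.heapify(heap)
    counter = len(clouds)
    done = []
    while heap:
        mse, _, top = heapq.heappop(heap)  # cloud with the smallest MSE
        # Find the partner whose merged plane fit stays under the threshold
        # and is smallest (the third condition, with adjacency simplified).
        best = None
        for k, (_, _, other) in enumerate(heap):
            m = plane_mse(np.vstack([top, other]))
            if m < mse_thresh and (best is None or m < best[0]):
                best = (m, k)
        if best is None:
            done.append(top)  # no mergeable neighbour: emit as a coarse plane
            continue
        m, k = best
        _, _, other = heap.pop(k)
        heapq.heapify(heap)  # restore heap order after the arbitrary removal
        heapq.heappush(heap, (m, counter, np.vstack([top, other])))
        counter += 1
    return done
```

Each returned cloud plays the role of one coarse extraction plane.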
13. The method of claim 9 or 10, wherein the optimizing the K coarse extraction planes to obtain the L optimized planes comprises:
determining a normal vector of each of the K coarse extraction planes;
traversing the K coarse extraction planes, and, for any traversed coarse extraction plane, if another coarse extraction plane meeting a sixth condition exists, fusing the two coarse extraction planes into one plane;
wherein the sixth condition comprises that the normal vector of the other coarse extraction plane is parallel to the normal vector of the traversed coarse extraction plane, and that the variance after a plane is fitted to the two coarse extraction planes together is smaller than a variance threshold.
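The plane-fusion step of this claim can be sketched as follows. This is an illustrative reading, not the patented implementation: the claim says "parallel", so the small angular tolerance, the PCA fit, and all names and thresholds are our additions.

```python
import numpy as np

def fuse_parallel_planes(planes, parallel_deg=5.0, var_thresh=1e-3):
    """Sketch of the sixth-condition merge: fuse a pair of coarse planes
    when their normals are (near-)parallel and the merged cloud's variance
    about its best-fit plane stays under the variance threshold."""
    def fit(cloud):
        # PCA plane fit: unit normal plus residual variance about the plane.
        centered = cloud - cloud.mean(axis=0)
        _, s, vt = np.linalg.svd(centered, full_matrices=False)
        n = vt[-1] / np.linalg.norm(vt[-1])
        return n, float(s[-1] ** 2) / len(cloud)

    merged = [np.asarray(p, float) for p in planes]
    changed = True
    while changed:  # repeat until no pair satisfies the sixth condition
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                ni, _ = fit(merged[i])
                nj, _ = fit(merged[j])
                cos_a = np.clip(abs(float(np.dot(ni, nj))), -1.0, 1.0)
                angle = np.degrees(np.arccos(cos_a))
                _, var = fit(np.vstack([merged[i], merged[j]]))
                if angle < parallel_deg and var < var_thresh:
                    merged[i] = np.vstack([merged[i], merged[j]])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged
```

Note that two parallel but offset planes are kept separate, because the merged variance check fails even though the normals agree.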
14. A computing device comprising at least one processor;
the at least one processor configured to perform the following operations:
acquiring image data to be processed;
segmenting the image data to be processed to obtain N pieces of sub-image data, wherein N is an integer greater than 1;
determining point cloud information corresponding to at least one sub-image data in the N sub-image data;
performing clustering processing on the point clouds corresponding to the N sub-image data according to the point cloud information corresponding to the at least one sub-image data in the N sub-image data to obtain K coarse extraction planes, wherein K is a positive integer not greater than N;
optimizing the K coarse extraction planes to obtain L optimized planes, wherein L is a positive integer not greater than K;
wherein, when the image data to be processed is a depth image comprising image coordinates and a depth value of each pixel point, the performing clustering processing on the point clouds corresponding to the N sub-image data according to the point cloud information corresponding to the at least one sub-image data in the N sub-image data to obtain K coarse extraction planes comprises:
establishing a minimum-heap data structure according to the point cloud information corresponding to the at least one sub-image data in the N sub-image data, and preferentially clustering the point clouds corresponding to the at least two sub-image data with the smallest mean square error in the minimum-heap data structure to obtain the K coarse extraction planes, wherein the minimum-heap data structure orders the sub-depth images by the mean square error of the point cloud corresponding to each sub-depth image, such that the sub-depth image at the top of the heap has the smallest mean square error.
15. The computing device of claim 14, wherein the image data to be processed is a depth image; the depth image comprises image coordinates and depth values of each pixel point;
the segmenting the image data to be processed to obtain N sub-image data includes:
segmenting the depth image to obtain N sub-depth images;
the determining point cloud information corresponding to at least one sub-image data of the N sub-image data includes:
determining point cloud information corresponding to at least one sub-depth image in the N sub-depth images;
the performing clustering processing on the point clouds corresponding to the N sub-image data according to the point cloud information corresponding to the at least one sub-image data in the N sub-image data to obtain K coarse extraction planes comprises:
determining the mean square error of a fitted plane of the point cloud corresponding to each sub-depth image according to the point cloud information corresponding to each sub-depth image in the N sub-depth images;
determining the sub-depth images meeting a first condition among the N sub-depth images to form a sub-image set to be processed, wherein the first condition comprises that the mean square error of a fitted plane of the point cloud corresponding to the sub-depth image is less than or equal to a first threshold; and
clustering the point clouds corresponding to the sub-depth images in the sub-image set to be processed to obtain the K coarse extraction planes.
16. The computing device of claim 14, wherein the image data to be processed is a depth image;
the segmenting the image data to be processed to obtain N sub-image data includes:
segmenting the depth image to obtain N sub-depth images;
the determining point cloud information corresponding to at least one sub-image data of the N sub-image data includes:
determining point cloud information corresponding to at least one sub-depth image in the N sub-depth images;
the performing clustering processing on the point clouds corresponding to the N sub-image data according to the point cloud information corresponding to the at least one sub-image data in the N sub-image data to obtain K coarse extraction planes comprises:
taking the point cloud corresponding to each sub-depth image in the N sub-depth images as a node to construct a graph structure, wherein each node in the graph structure stores the point cloud information corresponding to that node;
traversing each node in the graph structure, determining two nodes meeting a second condition, and constructing an edge between the two nodes meeting the second condition, wherein the second condition comprises that the depth values of the point clouds corresponding to the two nodes are continuous with each other and that the included angle between the normal vectors of the point clouds corresponding to the two nodes is smaller than an included-angle threshold;
determining the sub-depth images, among the N sub-depth images, that correspond to nodes having at least one edge in the graph structure, to form a sub-image set to be processed; and
clustering the point clouds corresponding to the sub-depth images in the sub-image set to be processed to obtain the K coarse extraction planes.
17. The computing device of claim 14, wherein the image data to be processed comprises a first RGB image and a second RGB image captured by a binocular camera;
the segmenting the image data to be processed to obtain N sub-image data includes:
performing image pre-segmentation on the first RGB image to obtain N first surface patches, and performing image pre-segmentation on the second RGB image to obtain N second surface patches, wherein the first surface patches and the second surface patches have a positional correspondence;
for each first surface patch of the N first surface patches, performing:
determining, among the N second surface patches, the second surface patch having a positional correspondence with the first surface patch;
determining a disparity map according to the first surface patch and the second surface patch having a positional correspondence with the first surface patch, and determining a sub-depth image according to the disparity map;
forming a sub-image set to be processed from the N determined sub-depth images;
the determining point cloud information corresponding to at least one sub-image data of the N sub-image data includes:
determining point cloud information corresponding to at least one sub-depth image in the N sub-depth images;
the performing clustering processing on the point clouds corresponding to the N sub-image data according to the point cloud information corresponding to the at least one sub-image data in the N sub-image data to obtain K coarse extraction planes comprises:
clustering the point clouds corresponding to the N sub-depth images in the sub-image set to be processed according to the point cloud information corresponding to the at least one sub-image data in the N sub-image data to obtain the K coarse extraction planes.
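For the binocular branch above, the step from disparity map to sub-depth image rests on the standard rectified-stereo relation Z = f·B/d. A minimal sketch (the function and parameter names are ours; the patent does not spell out this formula):

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth for a rectified stereo pair: Z = f * B / d, with f the focal
    length in pixels, B the camera baseline in metres and d the disparity
    in pixels. Non-positive disparities are mapped to an infinite depth."""
    d = np.asarray(disparity_px, float)
    return np.where(d > 0, focal_px * baseline_m / np.maximum(d, 1e-12), np.inf)
```

Applied per surface patch, this yields the sub-depth images whose point clouds are then clustered.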
18. The computing device of any one of claims 15 to 17, wherein the clustering the point clouds corresponding to the sub-depth images included in the sub-image set to be processed to obtain the K coarse extraction planes comprises:
establishing a minimum-heap data structure according to the point cloud information corresponding to each sub-depth image in the sub-image set to be processed, wherein the minimum-heap data structure orders the sub-depth images in the sub-image set to be processed by the mean square error of the point cloud corresponding to each sub-depth image, such that the sub-depth image at the top of the heap has the smallest mean square error;
performing a preset operation on the minimum-heap data structure, until the mean square error of a plane fitted to the point clouds corresponding to any two nodes in the minimum-heap data structure is larger than a first threshold, to obtain the K coarse extraction planes;
wherein the preset operation comprises: taking a sub-depth image from the top of the minimum-heap data structure; if a sub-depth image meeting a third condition is determined among the sub-depth images adjacent to the taken sub-depth image, fusing the two sub-depth images to obtain a fused sub-depth image, wherein the third condition comprises that the mean square error of a plane fitted to the point clouds of the two sub-depth images together is smaller than the first threshold and is the smallest among the adjacent sub-depth images; and adding the fused sub-depth image back to the minimum-heap data structure.
19. The computing device of claim 14, wherein the image data to be processed is a point cloud included in a three-dimensional space;
the segmenting the image data to be processed to obtain N sub-image data includes:
taking the three-dimensional space as a node of a first level of an octree structure;
for each child node included in the first level through the ith level in the octree structure, performing: if the child node meets a fourth condition, dividing the child node into eight equal parts to obtain eight child nodes of the (i+1)th level; wherein the fourth condition comprises that the mean square error of the point cloud corresponding to the child node is greater than a first threshold, i is an integer greater than 1, and the ith level comprises 8^i child nodes;
until all child nodes included in the last level meet the fifth condition, constructing an octree structure including M levels of child nodes; the fifth condition comprises that the mean square error of the point clouds corresponding to the sub-nodes is not more than the first threshold value, or the point clouds corresponding to the sub-nodes comprise the number of points less than the number threshold value;
determining N unsegmented child nodes in the octree structure;
the determining point cloud information corresponding to at least one sub-image data of the N sub-image data includes:
determining point cloud information corresponding to at least one unsegmented child node of the N unsegmented child nodes;
the performing clustering processing on the point clouds corresponding to the N sub-image data according to the point cloud information corresponding to the at least one sub-image data in the N sub-image data to obtain K coarse extraction planes comprises:
clustering the point clouds corresponding to the N unsegmented child nodes in the octree structure according to the point cloud information corresponding to at least one unsegmented child node of the N unsegmented child nodes, to obtain the K coarse extraction planes.
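The octree split of claim 19 (subdivide a cell while its points fit a plane poorly, stop when the cell is flat enough or too sparse) can be sketched as follows. This is a simplified illustration: the recursion-depth cap, the cubic-cell membership test, and all names and thresholds are our assumptions, and points lying exactly on a child boundary are not specially handled.

```python
import numpy as np

def plane_mse(cloud):
    """Mean square error of a cloud about its best-fit (PCA) plane."""
    centered = cloud - cloud.mean(axis=0)
    s = np.linalg.svd(centered, full_matrices=False, compute_uv=False)
    return float(s[-1] ** 2) / len(cloud)

def octree_leaves(points, center, half, mse_thresh=1e-3, min_pts=10,
                  depth=0, max_depth=6):
    """Return the point sets of the unsplit (leaf) octree cells."""
    pts = np.asarray(points, float)
    # Fifth condition (leaf): flat enough, or too few points to split further.
    if len(pts) < min_pts or depth >= max_depth or plane_mse(pts) <= mse_thresh:
        return [pts]
    leaves = []
    # Fourth condition held: split the cubic cell into eight equal octants.
    for dx in (-1.0, 1.0):
        for dy in (-1.0, 1.0):
            for dz in (-1.0, 1.0):
                child_c = np.asarray(center, float) + (half / 2) * np.array([dx, dy, dz])
                mask = np.all(np.abs(pts - child_c) <= half / 2, axis=1)
                if mask.any():
                    leaves += octree_leaves(pts[mask], child_c, half / 2,
                                            mse_thresh, min_pts, depth + 1, max_depth)
    return leaves
```

The returned leaves correspond to the claim's N unsegmented child nodes.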
20. The computing device of claim 19, wherein said clustering point clouds corresponding to said N unsegmented sub-nodes in said octree structure according to point cloud information corresponding to at least one of said N unsegmented sub-nodes to obtain K coarse extraction planes comprises:
determining a normal vector of the point cloud corresponding to each unsegmented child node of the N unsegmented child nodes according to the point cloud information corresponding to the at least one unsegmented child node, and converting each normal vector into a point in a parameter space through a Hough transform;
determining K point sets formed in the parameter space by the normal vectors of the point clouds corresponding to the N unsegmented child nodes, each point set having an aggregation center;
for each point set, determining the points falling within a preset range around the aggregation center of the point set; and
fusing the point clouds corresponding to the unsegmented child nodes corresponding to the points falling within the preset range into one coarse extraction plane.
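The parameter-space grouping of this claim can be sketched by mapping each leaf cloud's fitted normal to spherical angles and binning them, a simple stand-in for a Hough accumulator. All names, the bin width, and the degenerate-azimuth handling are our assumptions, not the patent's.

```python
import numpy as np
from collections import defaultdict

def cluster_by_normal(leaf_clouds, bin_deg=15.0):
    """Fuse leaf point clouds whose fitted normals land in the same
    (theta, phi) angular bin of the parameter space."""
    def unit_normal(cloud):
        centered = cloud - cloud.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        n = vt[-1] / np.linalg.norm(vt[-1])
        return n if n[2] >= 0 else -n  # fix sign so antiparallel normals agree

    bins = defaultdict(list)
    for idx, cloud in enumerate(leaf_clouds):
        n = unit_normal(np.asarray(cloud, float))
        theta = np.degrees(np.arccos(np.clip(n[2], -1.0, 1.0)))  # polar angle
        if theta < 1e-3:
            phi = 0.0  # azimuth is undefined for a +z normal; pin it to zero
        else:
            phi = np.degrees(np.arctan2(n[1], n[0])) % 360.0
        # One accumulator cell per angular bin plays the role of an
        # aggregation center; clouds voting into the same cell are fused.
        bins[(round(theta / bin_deg), round(phi / bin_deg))].append(idx)
    return [np.vstack([leaf_clouds[i] for i in ids]) for ids in bins.values()]
```

Each resulting group corresponds to one coarse extraction plane.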
21. The computing device of any one of claims 14 to 17 and 19 to 20, wherein the optimizing the K coarse extraction planes to obtain the L optimized planes comprises:
determining a normal vector of each of the K coarse extraction planes;
traversing the K coarse extraction planes, and, for any traversed coarse extraction plane, if another coarse extraction plane meeting a sixth condition exists, fusing the two coarse extraction planes into one plane to obtain the L optimized planes;
wherein the sixth condition comprises that the normal vector of the other coarse extraction plane is parallel to the normal vector of the traversed coarse extraction plane, and that the variance after a plane is fitted to the two coarse extraction planes together is smaller than a variance threshold.
22. A computing device comprising at least one processor;
the at least one processor configured to perform operations comprising:
acquiring image data to be processed;
performing semantic segmentation on the image data to be processed to obtain N sub-image data with labeling information, wherein N is an integer greater than 1; the labeling information is used for labeling the target object in the sub-image data;
according to the labeling information of each sub-image data, Q sub-image data with planes are determined from the N sub-image data with the labeling information; q is an integer greater than 0 and less than or equal to N;
determining point cloud information corresponding to each sub-image data with the plane in the Q sub-image data with the plane;
determining K coarse extraction planes from the Q sub-image data with planes according to the point cloud information corresponding to each sub-image data with planes in the Q sub-image data with planes, wherein K is an integer greater than or equal to Q;
optimizing the K coarse extraction planes to obtain L optimized planes, wherein L is a positive integer not greater than K.
23. The computing device of claim 22, wherein the image data to be processed comprises a first RGB image and a second RGB image captured by a binocular camera;
performing semantic segmentation on the image data to be processed to obtain N sub-image data with labeling information, where N is an integer greater than 1, and the method includes:
performing semantic segmentation on the first RGB image to obtain N first sub-images with labeling information; performing semantic segmentation on the second RGB image to obtain N second sub-images with labeling information; each first sub-image with the labeling information and each second sub-image with the labeling information and having a position corresponding relation with the first sub-image form sub-image data with the labeling information;
the determining point cloud information corresponding to each sub-image data with a plane in the Q sub-image data with a plane includes:
for each of the Q sub-image data having a plane, performing:
determining a disparity map according to a first sub-image with labeling information included in the sub-image data with the plane and a second sub-image with labeling information and having a position corresponding relation with the first sub-image;
determining a sub-depth image according to the disparity map;
determining point cloud information corresponding to the sub-image data with the plane according to the sub-depth image;
the determining K coarse extraction planes from the Q sub-image data with planes according to the point cloud information corresponding to the sub-image data with planes in the Q sub-image data with planes comprises:
determining the K coarse extraction planes from the Q sub-depth images according to the point cloud information corresponding to the sub-image data with planes.
24. The computing device of claim 22 or 23, wherein the determining K coarse extraction planes from the Q sub-depth images according to the point cloud information corresponding to the sub-image data with planes comprises:
taking the point cloud corresponding to each sub-depth image in the Q sub-depth images as a node to construct a graph structure, wherein each node in the graph structure stores the point cloud information corresponding to that node;
traversing each node in the graph structure, determining two nodes meeting a second condition, and constructing an edge between the two nodes meeting the second condition, wherein the second condition comprises that the depth values of the point clouds corresponding to the two nodes are continuous with each other and that the included angle between the normal vectors of the point clouds corresponding to the two nodes is smaller than an included-angle threshold;
determining the sub-depth images, among the Q sub-depth images, that correspond to nodes having at least one edge in the graph structure, to form a sub-image set to be processed; and
clustering the point clouds corresponding to the sub-depth images in the sub-image set to be processed to obtain the K coarse extraction planes.
25. The computing device of claim 24, wherein the clustering the point clouds corresponding to the sub-depth images included in the sub-image set to be processed to obtain the K coarse extraction planes comprises:
establishing a minimum-heap data structure according to the point cloud information corresponding to each sub-depth image in the sub-image set to be processed, wherein the minimum-heap data structure orders the sub-depth images in the sub-image set to be processed by the mean square error of the point cloud corresponding to each sub-depth image, such that the sub-depth image at the top of the heap has the smallest mean square error;
performing a preset operation on the minimum-heap data structure, until the mean square error of a plane fitted to the point clouds corresponding to any two nodes in the minimum-heap data structure is larger than a first threshold, to obtain the K coarse extraction planes;
wherein the preset operation comprises: taking a sub-depth image from the top of the minimum-heap data structure; if a sub-depth image meeting a third condition is determined among the sub-depth images adjacent to the taken sub-depth image, fusing the two sub-depth images to obtain a fused sub-depth image, wherein the third condition comprises that the mean square error of a plane fitted to the point clouds of the two sub-depth images together is smaller than the first threshold and is the smallest among the adjacent sub-depth images; and adding the fused sub-depth image back to the minimum-heap data structure.
26. The computing device of claim 22 or 23, wherein the optimizing the K coarse extraction planes to obtain the L optimized planes comprises:
determining a normal vector of each of the K coarse extraction planes;
traversing the K coarse extraction planes, and, for any traversed coarse extraction plane, if another coarse extraction plane meeting a sixth condition exists, fusing the two coarse extraction planes into one plane to obtain the L optimized planes;
wherein the sixth condition comprises that the normal vector of the other coarse extraction plane is parallel to the normal vector of the traversed coarse extraction plane, and that the variance after a plane is fitted to the two coarse extraction planes together is smaller than a variance threshold.
27. Circuitry comprising at least one processing circuit configured to perform the method of any of claims 1 to 8 or to perform the method of any of claims 9 to 13.
28. A computer storage medium comprising a computer program which, when run on a computing device, causes the computing device to perform the method of any of claims 1 to 8 or the method of any of claims 9 to 13.
CN201910605510.4A 2019-03-26 2019-07-05 Plane detection method, computing device and circuit system Active CN110458805B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2019102345377 2019-03-26
CN201910234537 2019-03-26

Publications (2)

Publication Number Publication Date
CN110458805A CN110458805A (en) 2019-11-15
CN110458805B true CN110458805B (en) 2022-05-13

Family

ID=68482304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910605510.4A Active CN110458805B (en) 2019-03-26 2019-07-05 Plane detection method, computing device and circuit system

Country Status (1)

Country Link
CN (1) CN110458805B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027601B (en) * 2019-11-25 2023-10-17 歌尔股份有限公司 Plane detection method and device based on laser sensor
CN113129249B (en) * 2019-12-26 2023-01-31 舜宇光学(浙江)研究院有限公司 Depth video-based space plane detection method and system and electronic equipment
WO2021147113A1 (en) * 2020-01-23 2021-07-29 华为技术有限公司 Plane semantic category identification method and image data processing apparatus
CN111275724B (en) * 2020-02-26 2022-06-07 武汉大学 Airborne point cloud roof plane segmentation method based on octree and boundary optimization
CN113435465A (en) * 2020-03-20 2021-09-24 阿里巴巴集团控股有限公司 Image processing and intelligent control method and equipment
CN111583317B (en) * 2020-04-29 2024-02-09 深圳市优必选科技股份有限公司 Image alignment method and device and terminal equipment
CN111783557B (en) * 2020-06-11 2023-08-15 北京科技大学 Wearable blind guiding equipment based on depth vision and server
CN111950426A (en) * 2020-08-06 2020-11-17 东软睿驰汽车技术(沈阳)有限公司 Target detection method and device and delivery vehicle
CN112766061A (en) * 2020-12-30 2021-05-07 罗普特科技集团股份有限公司 Multi-mode unsupervised pedestrian pixel-level semantic annotation method and system
CN112927323B (en) * 2021-02-23 2023-08-22 中国联合网络通信集团有限公司 Drawing generation method and device
US11741621B2 (en) 2021-05-10 2023-08-29 Qingdao Pico Technology Co., Ltd. Method and system for detecting plane information
CN113240678B (en) * 2021-05-10 2023-05-30 青岛小鸟看看科技有限公司 Plane information detection method and system
CN113313701B (en) * 2021-06-10 2022-06-03 兰州智悦信息科技有限公司 Electric vehicle charging port two-stage visual detection positioning method based on shape prior
CN114281285B (en) * 2021-07-14 2024-05-28 海信视像科技股份有限公司 Display device and display method for stably presenting depth data
CN113780291B (en) * 2021-08-25 2024-07-12 北京达佳互联信息技术有限公司 Image processing method and device, electronic equipment and storage medium
CN115661552B (en) * 2022-12-12 2023-07-04 高德软件有限公司 Point cloud processing method, point cloud anomaly detection method, medium and computing equipment
CN117635888B (en) * 2023-12-07 2024-04-26 腾讯科技(深圳)有限公司 Data processing method and related device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130085199A (en) * 2012-01-19 2013-07-29 삼성전자주식회사 Apparatus and method for plane detection
WO2014020529A1 (en) * 2012-08-02 2014-02-06 Earthmine, Inc. Three-dimensional plane panorama creation through hough-based line detection
CN106204705A (en) * 2016-07-05 2016-12-07 长安大学 A kind of 3D point cloud segmentation method based on multi-line laser radar
CN107341804A (en) * 2016-04-29 2017-11-10 成都理想境界科技有限公司 Determination method and device, image superimposing method and the equipment of cloud data midplane
CN107358609A (en) * 2016-04-29 2017-11-17 成都理想境界科技有限公司 A kind of image superimposing method and device for augmented reality
CN108665472A (en) * 2017-04-01 2018-10-16 华为技术有限公司 The method and apparatus of point cloud segmentation
CN109359614A (en) * 2018-10-30 2019-02-19 百度在线网络技术(北京)有限公司 A kind of plane recognition methods, device, equipment and the medium of laser point cloud

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331699B (en) * 2014-11-19 2017-11-14 重庆大学 A kind of method that three-dimensional point cloud planarization fast search compares
US10444759B2 (en) * 2017-06-14 2019-10-15 Zoox, Inc. Voxel based ground plane estimation and object segmentation
CN108022307B (en) * 2017-11-26 2021-06-18 中国人民解放军陆军装甲兵学院 Self-adaptive plane layering method based on additive remanufacturing point cloud model


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Split and Merge for Accurate Plane Segmentation in RGB-D Images; Yigong Zhang; 2017 4th IAPR Asian Conference on Pattern Recognition; 2018-12-17; pp. 49-54 *
Dense Reconstruction of Multi-planar Scenes Based on Sparse Point Clouds; Miao Jun; Acta Automatica Sinica; 2015-04-30; Vol. 41, No. 4; pp. 813-822 *

Also Published As

Publication number Publication date
CN110458805A (en) 2019-11-15

Similar Documents

Publication Publication Date Title
CN110458805B (en) Plane detection method, computing device and circuit system
US10198823B1 (en) Segmentation of object image data from background image data
US9965865B1 (en) Image data segmentation using depth data
CN111243093B (en) Three-dimensional face grid generation method, device, equipment and storage medium
Wang et al. RGB-D salient object detection via minimum barrier distance transform and saliency fusion
Zhang et al. Deep hierarchical guidance and regularization learning for end-to-end depth estimation
US10585937B2 (en) Method for visual search, corresponding system, apparatus and computer program product
CN107430776B (en) Template manufacturing device and template manufacturing method
US8610712B2 (en) Object selection in stereo image pairs
CN108701376A (en) The Object Segmentation based on identification of 3-D view
US20120170804A1 (en) Method and apparatus for tracking target object
Geng et al. Using deep learning in infrared images to enable human gesture recognition for autonomous vehicles
CN110827398A (en) Indoor three-dimensional point cloud automatic semantic segmentation algorithm based on deep neural network
CN103810708A (en) Method and device for perceiving depth of laser speckle image
CN111598149B (en) Loop detection method based on attention mechanism
CN114766042A (en) Target detection method, device, terminal equipment and medium
WO2024088445A1 (en) Vehicle guidance method and system based on visual semantic vector, and device and medium
Li et al. Deep learning based monocular depth prediction: Datasets, methods and applications
Drobnitzky et al. Survey and systematization of 3D object detection models and methods
CN110069126A (en) The control method and device of virtual objects
CN116843754A (en) Visual positioning method and system based on multi-feature fusion
CN107452003A Method and device for segmenting images containing depth information
CN205692214U Monocular vision pose measurement system
WO2023272495A1 (en) Badging method and apparatus, badge detection model update method and system, and storage medium
US20220068024A1 (en) Determining a three-dimensional representation of a scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant