WO2023226654A1 - Target object separation method and apparatus, device, and storage medium - Google Patents


Info

Publication number
WO2023226654A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
target object
cluster
area
mask
Application number
PCT/CN2023/089948
Other languages
French (fr)
Chinese (zh)
Inventor
王孙平
Original Assignee
深圳市其域创新科技有限公司
Application filed by 深圳市其域创新科技有限公司
Publication of WO2023226654A1 publication Critical patent/WO2023226654A1/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/047: Probabilistic or stochastic networks

Definitions

  • the present invention relates to the field of information processing technology, and specifically to a target object singulation method, device, equipment and storage medium.
  • Real-life 3D is an important part of new infrastructure construction. Buildings or other targets are constructed as independent objects through processing on the 3D geographical scene, so that they can be selected individually. Since the amount of 3D scene information is huge and disordered, the processing of 3D scene information is complex and time-consuming.
  • the present invention provides a target object singulation method, device, equipment and storage medium, which can improve the accuracy and speed of target object singulation.
  • a target object singulation method, which includes: acquiring three-dimensional scene data; converting the three-dimensional scene data into an optical image and a depth image from a top view; performing semantic segmentation processing on the optical image to obtain target object masks; performing clustering segmentation processing on the depth image to obtain target object clusters; calculating the ratio of the intersection area to the union area between the target object masks and the target object clusters; determining the target clusters or target masks corresponding to ratios greater than or equal to a first preset threshold; and determining the areas corresponding to the determined target clusters or target masks in the three-dimensional scene data as single target objects.
  • the method provided by the present invention processes a smaller amount of data and has a faster processing speed.
  • this invention uses the depth information of the point cloud data: by fusing the masks obtained from semantic segmentation of the optical image with the results of depth image clustering segmentation, the segmented areas are complete and the edges of the final singulated target objects are accurate, effectively solving the problem of segmented areas being incomplete or extending beyond the real target object.
  • the present invention makes full use of the prior information that the distance between the top of a building and the ground is large and that the roof is close to a plane, accurately clusters and segments the depth image to obtain target object clusters, and combines them with the target masks obtained by semantic segmentation of the optical image. By calculating the ratio of the intersection area to the union area between the target masks and the target clusters, precise and efficient filtering of target clusters is achieved. The method can be adapted to geographical areas of various sizes and architectural styles, and the data set required by the singulation method provided by the present invention is easy to obtain and label.
  • the number of target masks and target clusters is multiple, and there is a one-to-one correspondence between at least part of the target masks and at least part of the target clusters; calculating the ratio of the intersection area to the union area between the target masks and the target clusters includes: separately calculating the ratio of the intersection area to the union area between each target cluster and its corresponding target mask, and determining which ratios are greater than or equal to the first preset threshold.
  • the method further includes: performing interpolation processing and/or filtering processing on the optical image and the depth image.
  • by interpolating and/or filtering the hole points in the optical image or depth image, noise can be effectively reduced, which is beneficial to the subsequent segmentation of the optical image or depth image.
  • semantic segmentation processing is performed on the optical image to obtain the target mask, including: inputting the optical image into a convolutional neural network; outputting the classification confidence of each pixel in the optical image through the convolutional neural network; and obtaining the pixels whose classification confidence is greater than or equal to a second preset threshold to form the target mask.
  • the convolutional neural network is used to determine the classification confidence of each pixel in the optical image, and by retaining the pixels whose confidence is greater than or equal to the second preset threshold, the target mask is obtained, accurately segmenting the target area in the optical image.
  • clustering segmentation processing is performed on the depth image to obtain target object clusters, including: cluster labeling: dividing one pixel in the depth image into a seed area and marking it as a cluster; initial neighborhood pixel classification: calculating the absolute value of the difference between the depth value of that pixel and the depth values of its four adjacent pixels (above, below, left, and right), and dividing the adjacent pixels whose absolute difference is less than or equal to a third preset threshold into the seed area, marking them as the same cluster; repeated neighborhood pixel classification: performing initial neighborhood pixel classification on the other pixels in the seed area, stopping when, for every pixel on the inner edge of the seed area, the absolute difference between its depth value and the depth values of its adjacent pixels outside the seed area is greater than the third preset threshold, at which point all pixels in the seed area are marked as the same cluster; traversing the remaining pixels: repeating cluster labeling, initial neighborhood pixel classification, and repeated neighborhood pixel classification for the pixels in the depth image that have not yet been marked into a cluster.
  • the depth image is segmented through a clustering algorithm based on region growing, and the areas corresponding to the target and the areas corresponding to the background are marked as clusters to obtain the initial target cluster and achieve accurate division of target areas and non-target areas.
  • pixels at the outer edge of the seed area are prioritized for repeated neighborhood pixel classification.
  • the process of marking the pixels in the depth image into clusters is centered on the first marked pixel and gradually spreads outward, which is conducive to improving the efficiency of depth image clustering.
  • the method further includes: cluster filtering: obtaining initial target object clusters with an area greater than or equal to the fourth preset threshold to obtain the target object cluster.
  • when the target object is a building, the roof of the building is roughly flat and has a certain area, so the area where the building is located will be divided into a larger cluster, while the background area outside the building has uneven depth values and will be divided into multiple small-area clusters.
  • in this way, the cluster corresponding to the area where the building is located can be found more accurately, and the clusters corresponding to the background area can be filtered out.
  • a target object singulation device, including: an acquisition unit for acquiring three-dimensional scene data; an image conversion unit for converting the three-dimensional scene data into an optical image and a depth image from a bird's-eye view; a first image segmentation unit for performing semantic segmentation processing on the optical image to obtain the target object mask; a second image segmentation unit for performing clustering segmentation processing on the depth image to obtain the target object cluster; a computing unit for calculating the ratio of the intersection area to the union area between the target mask and the target cluster; and a determination unit for determining the target cluster or target mask corresponding to a ratio greater than or equal to the first preset threshold, and determining the area corresponding to the determined target cluster or target mask in the three-dimensional scene data as a single target object.
  • a computing device including: a processor, a memory, a communication interface, and a communication bus.
  • the processor, the memory, and the communication interface communicate with each other through the communication bus; the memory is used to store at least one executable instruction, and the executable instruction causes the processor to execute the target object singulation method in any of the above ways.
  • a computer storage medium is also provided. At least one executable instruction is stored in the storage medium. The executable instruction causes the processor to execute the target object singulation method in any of the above ways.
  • Figure 1 is a flow chart of a target object singulation method provided by an embodiment of the present invention.
  • Figure 2 is a flow chart of the sub-steps of step S130 in Figure 1;
  • Figure 3 is a flow chart of the sub-steps of step S140 in Figure 1;
  • Figure 4 is a schematic structural diagram of a target object singulation device provided by an embodiment of the present invention.
  • Figure 5 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
  • Real-life 3D is an important part of the country's new infrastructure construction.
  • buildings or other targets can be constructed into independent objects in three-dimensional form, so that the constructed independent objects can be selected individually; by assigning specific attributes to them, the attribute information of the target objects can be quickly selected and queried to achieve refined and dynamic management.
  • the target object is conventionally singulated by processing the three-dimensional point cloud data directly. Since three-dimensional point cloud data retains the original geometric information in three-dimensional space, the amount of information is huge and disordered, so processing it is very complex, time-consuming, and difficult to label, resulting in low efficiency.
  • the present invention proposes a target object singulation method.
  • the three-dimensional point cloud or three-dimensional grid model is converted into an optical image and a depth image from a top view to reduce the amount of information that needs to be processed.
  • the optical image and the depth image are processed to obtain the target masks and the target clusters respectively; then, by calculating the ratio of the intersection area to the union area between the target masks and the target clusters, the target clusters and target masks whose corresponding areas in the two-dimensional image may not be target objects (that is, those corresponding to ratios smaller than the first preset threshold) are filtered out, and the areas corresponding to the remaining target clusters in the 3D point cloud or 3D mesh model are determined as single target objects, achieving rapid singulation of the target objects.
  • a method for singulating a target object is provided. Please refer to FIG. 1 for details.
  • the figure shows the flow of a method for singulating a target object according to an embodiment of the present invention.
  • the method is executed by a computing device capable of performing target object singulation, such as a mobile phone, computer, or server.
  • the method includes:
  • the 3D scene data includes 3D point clouds, 3D mesh models, etc.
  • the 3D scene data can be collected through 3D imaging sensors, such as binocular cameras or RGB-D cameras, or collected through a combination of a 3D imaging sensor and a 3D laser scanning sensor or lidar, and the data is transmitted to the computing device so that the computing device can obtain the three-dimensional scene data.
  • three-dimensional scene data can be formed through drones, satellite photography or generated through oblique photogrammetry systems.
  • S120 Convert three-dimensional scene data into optical images and depth images from a top-down perspective.
  • the optical image can be a grayscale image or a color image.
  • the computing device can convert the three-dimensional scene data into an optical image from a top-down perspective by obtaining, for each position in the three-dimensional scene data, the x-axis and y-axis coordinate values and the pixel value of the highest point directly above that position; likewise, it can convert the three-dimensional scene data into a depth image from a top-down perspective by obtaining, for each position, the x-axis and y-axis coordinate values and the z-axis depth value of the highest point directly above that position.
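As an illustration of this conversion step, the following sketch projects a point cloud onto top-view optical and depth images by keeping, for each (x, y) grid cell, the highest point directly above it. The function name, the per-point gray values, and the grid size are our own assumptions, not part of the disclosure:

```python
import numpy as np

def point_cloud_to_top_views(points, colors, grid=1.0):
    """Project a 3D point cloud onto a top-view optical image and depth image.

    For every (x, y) grid cell, keep only the highest point (largest z):
    its color becomes the optical pixel and its z value the depth pixel.
    `points` is an (N, 3) array, `colors` an (N,) gray value per point.
    """
    xy = np.floor(points[:, :2] / grid).astype(int)
    xy -= xy.min(axis=0)                      # shift cell indices to start at 0
    h, w = xy[:, 1].max() + 1, xy[:, 0].max() + 1
    optical = np.zeros((h, w), dtype=colors.dtype)
    depth = np.full((h, w), -np.inf)
    for (cx, cy), z, c in zip(xy, points[:, 2], colors):
        if z > depth[cy, cx]:                 # keep the highest point per cell
            depth[cy, cx] = z
            optical[cy, cx] = c
    return optical, depth
```

With a 1-unit grid, two points in the same cell keep only the higher one, matching the "highest point directly above each position" rule described above.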
  • S130 Perform semantic segmentation processing on the optical image to obtain the target object mask.
  • the target object mask can be an area in the optical image corresponding to a building in the three-dimensional scene data. When singulating objects in other three-dimensional scenes, the target mask can also correspond to, for example, people, animals, or plants.
  • S140 Perform clustering and segmentation processing on the depth image to obtain target object clusters.
  • the target object cluster may be an area in the depth image corresponding to the building in the three-dimensional scene data.
  • Step S130 may be performed first, or step S140 may be performed first.
  • S150 Calculate the ratio of the intersection area and the union area between the target mask and the target cluster.
  • the intersection area between the target mask and the target cluster refers to the area of the intersection area between the target mask and the target cluster when the edges of the optical image and the depth image are aligned and overlapped.
  • the union area between the target mask and the target cluster refers to the size of the union area between the target mask and the target cluster when the edges of the optical image and the depth image are aligned and overlapped. It can be understood that the greater the ratio of the intersection area to the union area between the target mask and the target cluster, the more likely it is that the area corresponding to the target cluster or target mask in the three-dimensional scene data is an area that actually needs to be singulated.
  • S160 Determine the target cluster or target mask corresponding to the ratio greater than or equal to the first preset threshold, and determine the area corresponding to the determined target cluster or target mask in the three-dimensional scene data as a single unit Target.
  • the target cluster or target mask is an area in the bird's-eye-view optical image or depth image. Determining the area corresponding to the target cluster or target mask in the three-dimensional scene data as a single target object means mapping the area of the target cluster in the depth image, or of the target mask in the optical image, back to the three-dimensional scene data, and determining all data along the orthographic projection direction (i.e., the z-axis direction) corresponding to that area in the three-dimensional scene data as the singulated target object.
  • the first preset threshold can be set according to the actual situation of the target object that needs to be singulated. For example, when the target object is a building, the first preset threshold can be set to 0.6 on the computing device, thereby eliminating the target clusters and target masks whose intersection-to-union ratio is less than 0.6 and which may not correspond to buildings that actually need to be singulated, and retaining the rest.
  • when the quality of the obtained target masks and target clusters may be poor, the value of the first preset threshold can be set larger, to filter out the target masks and target clusters that may not correspond to the target objects that actually need to be singulated, making the final singulated targets more accurate.
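The ratio of step S150 and the thresholding of step S160 can be sketched as follows; the function names `iou_ratio` and `keep_singulated` and the boolean-image representation are illustrative assumptions:

```python
import numpy as np

def iou_ratio(mask, cluster):
    """Ratio of intersection area to union area between a target mask and a
    target cluster, both given as boolean images of equal shape (edges of the
    optical image and depth image aligned and overlapped)."""
    inter = np.logical_and(mask, cluster).sum()
    union = np.logical_or(mask, cluster).sum()
    return inter / union if union else 0.0

def keep_singulated(masks, clusters, threshold=0.6):
    """Keep only the mask/cluster pairs whose ratio reaches the first preset
    threshold (0.6 in the building example above)."""
    return [(m, c) for m, c in zip(masks, clusters)
            if iou_ratio(m, c) >= threshold]
```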
  • the area corresponding to the target object cluster in the three-dimensional scene data is determined as a single target object.
  • in the target object singulation method, converting the acquired three-dimensional scene data into a two-dimensional top-view optical image and depth image effectively reduces the amount of information that needs to be processed.
  • the segmentation process obtains the target masks and the target clusters respectively.
  • by calculating the ratio of the intersection area to the union area between the target masks and the target clusters, the target clusters and target masks whose corresponding areas in the two-dimensional image may not be target objects (that is, those corresponding to ratios smaller than the first preset threshold) are filtered out.
  • the method provided by the present invention processes a smaller amount of data and has a faster processing speed.
  • this invention uses the depth information of the point cloud data: by fusing the masks obtained from semantic segmentation of the optical image with the results of depth image clustering segmentation, the segmented areas are complete and the edges of the final singulated target objects are accurate, effectively solving the problem of segmented areas being incomplete or extending beyond the real target object.
  • the present invention makes full use of the prior information that the distance between the top of a building and the ground is large and that the roof is close to a plane, accurately clusters and segments the depth image to obtain target object clusters, and, combined with the target masks obtained by semantic segmentation of the optical image, accurately and efficiently filters the target masks and target clusters by calculating the ratio of the intersection area to the union area between them. The method can be adapted to geographical areas of multiple sizes and different architectural styles, and the data sets required by the singulation method provided by the present invention are easy to obtain and label.
  • the present invention further proposes an implementation.
  • there are multiple target object masks and multiple target object clusters, and there is a one-to-one correspondence between at least part of the target masks and at least part of the target clusters.
  • the correspondence relationship refers to the positional correspondence between the target mask and the target cluster, such as the correspondence between the (x, y) coordinate positions in the image.
  • the above step S150 includes:
  • the ratio of the intersection area and the union area between each target cluster and its corresponding target mask is calculated separately.
  • this step can also be replaced by calculating the ratio of the intersection area to the union area between each target mask and its corresponding target cluster. Similarly, if a certain target mask has no corresponding target cluster, that target mask may be skipped; it can of course also still be calculated, in which case the resulting ratio is 0.
  • the above step S160 includes:
  • the present invention further proposes an implementation. Specifically, after the above step S120, the method further includes: performing interpolation processing and/or filtering processing on the optical image and the depth image.
  • the interpolation process can use bilinear interpolation method.
  • the specific process is as follows:
  • the filtering process can use median filtering operation.
  • the specific process is as follows:
  • sort the pixel values or depth values of all pixels in the window, take the median, and replace the pixel value or depth value of the center pixel with that median.
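The median filtering operation described above can be sketched as a plain, unoptimized implementation; the function name and the edge-replication border handling are our assumptions:

```python
import numpy as np

def median_filter(img, window=3):
    """Median filtering: for each pixel, take the median of all values inside
    a window centered on it and replace the pixel with that median. Works on
    a 2D array of pixel values or depth values; borders use edge replication."""
    pad = window // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + window, x:x + window])
    return out
```

A single noisy hole point surrounded by consistent values is replaced by the neighborhood median, which is why this step reduces noise before segmentation.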
  • the pixel value is an RGB value.
  • the pixel value is a grayscale value. The same applies to the pixel values mentioned below.
  • for step S130, the present invention further proposes a specific implementation.
  • Figure 2 shows the sub-step process of step S130.
  • step S130 includes:
  • the optical image is input into the trained convolutional neural network.
  • S132 Output the classification confidence of each pixel in the optical image through the convolutional neural network.
  • the convolutional neural network calculates the softmax function for the output of the last layer and outputs the classification confidence of the classification result of each pixel in the optical image.
  • S133 Obtain the pixels corresponding to the classification confidence greater than or equal to the second preset threshold, and obtain the target object mask.
  • the classification confidence of each pixel is compared with the second preset threshold, and the pixels greater than or equal to the second preset threshold are retained to obtain the target mask of the corresponding area.
  • the second preset threshold can be set to 0.3 to ensure the accuracy of the obtained target object mask.
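The confidence computation and thresholding of steps S132 and S133 can be sketched as follows; the network itself is not reproduced, and the function names and the (H, W, C) logit-map layout are assumptions:

```python
import numpy as np

def softmax(logits):
    """Softmax over the channel axis of an (H, W, C) logit map, as applied
    to the output of the network's last layer (step S132)."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mask_from_confidence(class_confidence, threshold=0.3):
    """Step S133: keep the pixels whose classification confidence for the
    target class is at least the second preset threshold (0.3 above)."""
    return class_confidence >= threshold
```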
  • the convolutional neural network is used to determine the classification confidence of each pixel value in the optical image, and by obtaining the points corresponding to the confidence greater than or equal to the second preset threshold, the target mask is obtained, and the target mask is accurately realized in the optical image. Segmentation of the target area.
  • for step S140, the present invention further proposes a specific implementation.
  • Figure 3 shows the sub-step process of step S140.
  • step S140 can adopt a clustering algorithm based on region growing, specifically including:
  • Cluster labeling S141: Divide one pixel in the depth image into a seed area and mark it as a cluster.
  • a point can be manually selected on the computing device, divided into a seed area and marked as a cluster, or a point can be randomly selected by the computing device, divided into a seed area and marked as a cluster.
  • Initial neighborhood pixel classification S142: Calculate the absolute value of the difference between the depth value of the pixel and the depth values of its four adjacent pixels (above, below, left, and right), and divide the adjacent pixels whose absolute difference is less than or equal to the third preset threshold into the seed region, marking them as the cluster.
  • in the depth image, there is a certain difference between the depth value of the area where the target object is located and the depth values of other areas, and this difference is particularly obvious when the target object is a building. Therefore, if a pixel is located in the area of a target object that needs to be singulated, an adjacent pixel whose depth value differs from it by no more than the third preset threshold is, with maximum probability, also located in that area; if the pixel is not in the area of a target object that needs to be singulated, such an adjacent pixel is, with maximum probability, also not in that area. Therefore, by marking the pixel and the adjacent pixels whose absolute depth difference is less than or equal to the third preset threshold as the same cluster, the area where the target object is located in the depth image can be classified into one or several clusters, and the other areas into one or several other clusters.
  • the third preset threshold can be set to 10 to ensure the accuracy of pixel classification and labeling.
  • Repeated neighborhood pixel classification S143: perform the above step S142 for the other pixels in the seed area, stopping when, for all pixels on the inner edge of the seed area, the absolute value of the difference between their depth values and the depth values of the adjacent pixels located outside the seed area is greater than the third preset threshold; all pixels in the seed area are then marked as the same cluster.
  • this step may not be performed, may be performed only once, or may be performed in a loop. Specifically, when in step S142 the absolute value of the difference between the depth value of the pixel and the depth values of all its adjacent pixels is greater than the third preset threshold, no adjacent pixel is divided into the seed area and the pixel is individually marked as a cluster; in this case this step is skipped and the subsequent steps are performed directly.
  • when at least one adjacent pixel is divided into the seed area in step S142, each such pixel is regarded as a first adjacent pixel and step S142 is performed on it; any adjacent pixels newly divided into the seed area in this way are regarded as second adjacent pixels, step S142 is continued for them, and so on, until no new pixels are absorbed.
  • Traverse the remaining pixels S144: every time the above step S141 is repeated for a pixel in the depth image that is not yet marked as a cluster, that pixel is marked as a new cluster, until all pixels in the depth image have been marked into clusters.
  • the number of initial target clusters obtained is multiple. Among these different initial clusters, there are target clusters that need to be singulated and background clusters that do not.
  • the above step S143 is performed preferentially on the pixels at the outer edge of the seed area.
  • the process of marking the pixels in the depth image as clusters is centered on the first marked pixel and gradually spreads outward, which is beneficial to improving the efficiency of depth image clustering.
  • the depth image is segmented through a clustering algorithm based on region growing, and the areas corresponding to the target and the areas corresponding to the background are marked as clusters to obtain the initial target cluster and achieve accurate division of target areas and non-target areas.
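Steps S141 through S144 amount to a region-growing pass over the depth image, which can be sketched as below. The FIFO queue realizes the outward spread from the seed described above; the function name and label conventions are our assumptions, not the patented implementation:

```python
import numpy as np
from collections import deque

def region_grow_clusters(depth, threshold=10):
    """Region-growing clustering over a depth image (steps S141-S144):
    start a seed at an unlabeled pixel (S141), then repeatedly absorb
    4-neighbors whose depth differs from the current pixel by at most the
    third preset threshold (S142/S143), and move on to the next unlabeled
    pixel when the cluster stops growing (S144). Returns a label image."""
    h, w = depth.shape
    labels = np.full((h, w), -1, dtype=int)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue
            labels[sy, sx] = next_label           # S141: new seed / cluster
            queue = deque([(sy, sx)])
            while queue:                          # S142/S143: grow the seed area
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == -1 \
                            and abs(depth[ny, nx] - depth[y, x]) <= threshold:
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            next_label += 1                       # S144: next unlabeled pixel
    return labels
```

On a depth image of a flat roof over low ground, the roof pixels come out as one large cluster and the ground as separate clusters, matching the building prior discussed above.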
  • step S140 also includes:
  • Cluster filtering S145 Obtain initial target clusters with an area greater than or equal to the fourth preset threshold, and obtain target clusters.
  • when the target object is a building, the area where the building is located will be divided into a larger cluster, while the background area outside the building has uneven depth values and will be divided into multiple small-area clusters.
  • the fourth preset threshold can generally be set to 300 to more reliably filter clusters corresponding to the background area.
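The cluster filtering of step S145 can be sketched as follows; the function name and the label-image input format (as produced by a region-growing step) are assumptions:

```python
import numpy as np

def filter_clusters_by_area(labels, min_area=300):
    """Cluster filtering (S145): keep only initial clusters whose pixel area
    is at least the fourth preset threshold (300 above), discarding the many
    small clusters produced by the uneven background. `labels` is a per-pixel
    integer label image; returns the kept cluster label ids."""
    ids, counts = np.unique(labels, return_counts=True)
    return [int(i) for i, n in zip(ids, counts) if n >= min_area]
```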
  • a target singulation device is provided.
  • the object singulation device 200 includes an acquisition unit 210, an image conversion unit 220, a first image segmentation unit 230, a second image segmentation unit 240, a calculation unit 250 and a determination unit 260.
  • the acquisition unit 210 is used to acquire three-dimensional scene data.
  • the image conversion unit 220 is used to convert the three-dimensional scene data into an optical image and a depth image from a bird's-eye view.
  • the first image segmentation unit 230 is used to perform semantic segmentation processing on the optical image to obtain a target mask.
  • the second image segmentation unit 240 is used to perform cluster segmentation processing on the depth image to obtain target object clusters.
  • the calculation unit 250 is used to calculate the ratio of the intersection area and the union area between the target mask and the target cluster.
  • the determination unit 260 is used to determine the target cluster or target mask corresponding to a ratio greater than or equal to the first preset threshold, and determine the area corresponding to the determined target cluster or target mask in the three-dimensional scene data as Single target.
  • the calculation unit 250 is used to respectively calculate the ratio of the intersection area and the union area between each target cluster and its corresponding target mask.
  • the determination unit 260 is used to determine the target object clusters or target object masks corresponding to each ratio greater than or equal to the first preset threshold, and to determine the areas corresponding to each determined target object cluster or target object mask in the three-dimensional scene data as single target objects respectively.
  • the object singulation device 200 also includes a noise reduction unit 270.
  • the noise reduction unit 270 is used to perform interpolation processing and/or filtering processing on the optical image and the depth image.
  • the first image segmentation unit 230 is used to input the optical image into the convolutional neural network, to output the classification confidence of each pixel in the optical image through the convolutional neural network, and to obtain a value greater than or equal to The target object mask is obtained from the pixels corresponding to the classification confidence level equal to the second preset threshold.
  • the second image segmentation unit 240 is used for cluster labeling: dividing one of the pixels in the depth image into a seed area and marking it as a cluster; and for initial neighborhood pixel classification: calculating the absolute value of the difference between the depth value of that pixel and the depth values of its four adjacent pixels (above, below, left, and right), and dividing the adjacent pixels whose absolute difference is less than or equal to the third preset threshold into the seed area.
  • the second image segmentation unit 240 prioritizes the pixels at the outer edge of the seed area for repeated neighborhood pixel classification.
  • the second image segmentation unit 240 is also used for cluster filtering: obtaining initial target object clusters with an area greater than or equal to the fourth preset threshold to obtain target object clusters.
  • a computing device is also provided. Please refer to FIG. 5 for details.
  • the figure shows the structure of the computing device provided by an embodiment.
  • the specific embodiments of the present invention do not limit the specific implementation of the computing device.
  • the computing device may include: a processor (processor) 402, a communications interface (Communications Interface) 404, a memory (memory) 406, and a communication bus 408.
  • the processor 402, the communication interface 404, and the memory 406 communicate with each other through the communication bus 408.
  • the communication interface 404 is used to communicate with network elements of other devices such as clients or other servers.
  • the processor 402 is configured to execute the program 410. Specifically, the processor 402 can execute the above-mentioned related steps in the target object singulation method embodiment.
  • program 410 may include program code including computer-executable instructions.
  • the processor 402 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
  • the one or more processors included in the computing device may be the same type of processor, such as one or more CPUs; or they may be different types of processors, such as one or more CPUs and one or more ASICs.
  • Memory 406 is used to store programs 410.
  • Memory 406 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk storage.
  • Program 410 can specifically be called by processor 402 to cause the computing device to perform the following operations:
  • a computer-readable storage medium stores at least one executable instruction.
  • the executable instruction, when run on a computing device, causes the computing device to execute the target object singulation method in any of the above method embodiments.
  • the executable instructions may be used to cause the computing device to perform the following operations:
  • modules in the devices in the embodiment can be adaptively changed and arranged in one or more devices different from the embodiment.
  • the modules or units or components in the embodiments may be combined into one module or unit or component, and they may be divided into multiple sub-modules or sub-units or sub-components.
  • All features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination, except where at least some of such features and/or processes or units are mutually exclusive.
  • Each feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.


Abstract

The present invention relates to the technical field of information processing, in particular to a target object separation method and apparatus, a device, and a storage medium. The target object separation method comprises: acquiring three-dimensional scene data; converting the three-dimensional scene data into an optical image and a depth image of a top view; performing semantic segmentation processing on the optical image to obtain target object masks; performing clustering segmentation processing on the depth image to obtain target object clusters; calculating ratios of intersection areas to union areas between the target object masks and the target object clusters; and determining a target object cluster or a target object mask corresponding to a ratio greater than or equal to a first preset threshold value, and determining a region corresponding to the determined target object cluster or target object mask in the three-dimensional scene data as a separation target object. In this way, the present invention can improve the accuracy and speed of target object separation.

Description

A target object singulation method, apparatus, device and storage medium

Technical Field

The present invention relates to the field of information processing technology, and in particular to a target object singulation method, apparatus, device and storage medium.

Background Art

Real-scene 3D is an important part of new infrastructure construction. In a 3D geographical scene, buildings or other target objects are constructed as independent objects through processing, so that they can be selected individually. Since 3D scene information is huge in volume and unordered, processing it is complex and time-consuming.
Technical Solution

In view of the above problems, the present invention provides a target object singulation method, apparatus, device and storage medium, which can improve the accuracy and speed of target object singulation.

According to one aspect of the present invention, a target object singulation method is provided, including: acquiring three-dimensional scene data; converting the three-dimensional scene data into an optical image and a depth image from a top view; performing semantic segmentation processing on the optical image to obtain a target object mask; performing clustering segmentation processing on the depth image to obtain target object clusters; calculating the ratio of the intersection area to the union area between the target object mask and the target object cluster; and determining the target object cluster or target object mask corresponding to a ratio greater than or equal to a first preset threshold, and determining the area corresponding to the determined target object cluster or target object mask in the three-dimensional scene data as a singulated target object.

Compared with methods that directly singulate three-dimensional point cloud data, the method provided by the present invention processes a smaller amount of data and is faster. The present invention uses the depth information of the point cloud data, and fuses the clusters obtained by clustering segmentation of the depth image with the masks obtained by semantic segmentation of the optical image, so that the segmented regions are complete and the edges of the finally determined singulated target objects are accurate, effectively solving the problem of segmented regions being incomplete or extending beyond the true extent of the target object. For the singulation of buildings, the present invention can make full use of the prior information that there is a large distance between the top of a building and the ground and that roofs are close to planar, accurately clustering and segmenting the depth image to obtain target object clusters; combined with the target object masks obtained by semantic segmentation of the optical image, it achieves precise and efficient filtering of the target object clusters by calculating the ratio of the intersection area to the union area between the target object masks and the target object clusters, and can adapt to geographical areas of various sizes and different architectural styles. Moreover, the data sets required by the singulation method provided by the present invention are easy to obtain and label.
In an optional manner, there are multiple target object masks and multiple target object clusters, and there is a one-to-one correspondence between at least some of the target object masks and at least some of the target object clusters. Calculating the ratio of the intersection area to the union area between the target object masks and the target object clusters includes: respectively calculating the ratio of the intersection area to the union area between each target object cluster and its corresponding target object mask. Determining the target object cluster or target object mask corresponding to a ratio greater than or equal to the first preset threshold, and determining the area corresponding to the determined target object cluster or target object mask in the three-dimensional scene data as a singulated target object, includes: determining the target object cluster or target object mask corresponding to each ratio greater than or equal to the first preset threshold, and respectively determining the area corresponding to each determined target object cluster or target object mask in the three-dimensional scene data as one singulated target object. By respectively calculating the ratio of the intersection area to the union area between each target object cluster and its corresponding target object mask, and determining the target object cluster or target object mask corresponding to each ratio greater than or equal to the first preset threshold, when there are multiple target objects that actually need to be singulated in the three-dimensional point cloud or three-dimensional mesh model, the computing device can singulate each target object separately.
In an optional manner, after converting the three-dimensional scene data into an optical image and a depth image from a top view, the method further includes: performing interpolation processing and/or filtering processing on the optical image and the depth image. By interpolating and/or filtering the hole points in the optical image or depth image, noise can be effectively reduced, which benefits the subsequent segmentation of the optical image or depth image.

In an optional manner, performing semantic segmentation processing on the optical image to obtain the target object mask includes: inputting the optical image into a convolutional neural network; outputting the classification confidence of each pixel in the optical image through the convolutional neural network; and obtaining the pixels whose classification confidence is greater than or equal to a second preset threshold to obtain the target object mask. The convolutional neural network determines the classification confidence of each pixel in the optical image, and by collecting the pixels whose confidence is greater than or equal to the second preset threshold, the target object mask is obtained, accurately segmenting the region of the optical image where the target object is located.
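The thresholding of per-pixel confidences described above can be sketched in a few lines. This is a minimal illustration, assuming the confidence map is a plain 2D list already produced by the convolutional neural network; the function name is hypothetical, and the threshold argument stands in for the second preset threshold:

```python
def mask_from_confidence(conf_map, threshold):
    """Build a binary target-object mask from per-pixel classification
    confidences: pixels whose confidence is greater than or equal to the
    threshold (the 'second preset threshold') belong to the mask."""
    return [[1 if c >= threshold else 0 for c in row] for row in conf_map]
```

For example, with a 2x2 confidence map `[[0.9, 0.2], [0.6, 0.4]]` and a threshold of 0.5, the two left-hand pixels would be retained in the mask.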
In an optional manner, performing clustering segmentation processing on the depth image to obtain target object clusters includes: cluster marking: assigning one of the pixels in the depth image to a seed region and marking it as a cluster; initial neighborhood pixel classification: calculating the absolute value of the difference between the depth value of that pixel and the depth values of its four adjacent pixels (above, below, left and right), and assigning to the seed region the adjacent pixels whose absolute difference is less than or equal to a third preset threshold, marking them as the same cluster; repeated neighborhood pixel classification: performing the initial neighborhood pixel classification on the other pixels in the seed region, stopping when the absolute differences between the depth values of all pixels on the inner edge of the seed region and the depth values of the adjacent pixels outside the seed region are all greater than the third preset threshold, at which point all pixels in the seed region are marked as the same cluster; traversing the remaining pixels: repeating the cluster marking, initial neighborhood pixel classification and repeated neighborhood pixel classification on the pixels of the depth image that have not yet been marked as a cluster, until all pixels in the depth image are marked, obtaining initial target object clusters. By segmenting the depth image with a region-growing clustering algorithm, both the regions corresponding to the target objects and the regions corresponding to the background are marked as clusters, yielding the initial target object clusters and achieving an accurate division between target and non-target regions.
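The four steps above (cluster marking, initial neighborhood pixel classification, repeated neighborhood pixel classification, and traversal of the remaining pixels) can be sketched as a breadth-first region-growing pass. The function name, the list-of-lists image layout and the integer labels are illustrative assumptions; the source does not prescribe a particular implementation:

```python
from collections import deque

def region_grow_clusters(depth, diff_threshold):
    """Label a depth image into clusters by 4-neighbour region growing.

    A pixel joins the current seed region when the absolute depth difference
    to an already-labelled neighbour is <= diff_threshold (the 'third preset
    threshold'). Unlabelled pixels seed new clusters until every pixel of the
    image carries a label."""
    h, w = len(depth), len(depth[0])
    labels = [[None] * w for _ in range(h)]
    next_label = 0
    for si in range(h):
        for sj in range(w):
            if labels[si][sj] is not None:
                continue
            labels[si][sj] = next_label          # cluster marking
            frontier = deque([(si, sj)])         # BFS expands the outer edge first
            while frontier:
                i, j = frontier.popleft()
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if 0 <= ni < h and 0 <= nj < w and labels[ni][nj] is None \
                            and abs(depth[i][j] - depth[ni][nj]) <= diff_threshold:
                        labels[ni][nj] = next_label   # neighborhood pixel classification
                        frontier.append((ni, nj))
            next_label += 1                      # traverse remaining pixels
    return labels
```

Using a FIFO frontier means the region grows ring by ring outward from the first marked pixel, matching the preference for classifying the outer-edge pixels of the seed region first.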
In an optional manner, pixels at the outer edge of the seed region are given priority in the repeated neighborhood pixel classification. By preferentially classifying the neighborhood pixels at the outer edge of the seed region, the process of marking the pixels of the depth image into clusters spreads outward step by step from the first marked pixel, which helps improve the efficiency of clustering the depth image.

In an optional manner, after the repeated neighborhood pixel classification, the method further includes: cluster filtering: obtaining the initial target object clusters whose area is greater than or equal to a fourth preset threshold to obtain the target object clusters. When the target object is a building, since the roof of a building is roughly planar and has a certain area, the region where a building is located will be segmented into one large cluster, while the background region outside the building, whose depth values are uneven, will be segmented into many small clusters. On this basis, by obtaining the initial target object clusters whose area is greater than or equal to the fourth preset threshold, the clusters corresponding to building regions can be found fairly accurately and the clusters corresponding to the background regions can be filtered out.
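The cluster filtering step reduces to counting pixels per label and keeping only the large clusters. A minimal sketch, with a hypothetical helper name and a set of surviving labels as output; the area threshold stands in for the fourth preset threshold:

```python
def filter_clusters(labels, min_area):
    """Keep only the cluster labels whose pixel count is >= min_area
    (the 'fourth preset threshold'); small background fragments drop out."""
    counts = {}
    for row in labels:
        for lab in row:
            counts[lab] = counts.get(lab, 0) + 1
    return {lab for lab, n in counts.items() if n >= min_area}
```

For a label image where one cluster covers most of the grid and another covers a single pixel, only the large cluster's label survives the filter.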
According to another aspect of the present invention, a target object singulation apparatus is also provided, including: an acquisition unit, used to acquire three-dimensional scene data; an image conversion unit, used to convert the three-dimensional scene data into an optical image and a depth image from a top view; a first image segmentation unit, used to perform semantic segmentation processing on the optical image to obtain a target object mask; a second image segmentation unit, used to perform clustering segmentation processing on the depth image to obtain target object clusters; a calculation unit, used to calculate the ratio of the intersection area to the union area between the target object mask and the target object cluster; and a determination unit, used to determine the target object cluster or target object mask corresponding to a ratio greater than or equal to a first preset threshold, and to determine the area corresponding to the determined target object cluster or target object mask in the three-dimensional scene data as a singulated target object.

According to another aspect of the present invention, a computing device is also provided, including: a processor, a memory, a communication interface and a communication bus, where the processor, the memory and the communication interface communicate with each other through the communication bus; the memory is used to store at least one executable instruction, and the executable instruction causes the processor to execute the target object singulation method in any of the above manners.

According to another aspect of the present invention, a computer storage medium is also provided, in which at least one executable instruction is stored, and the executable instruction causes the processor to execute the target object singulation method in any of the above manners.

The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly, it can be implemented according to the content of the description; and in order to make the above and other objects, features and advantages of the present invention more obvious and understandable, specific embodiments of the present invention are set forth below.
Brief Description of the Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are for the purpose of illustrating the preferred embodiments only and are not to be construed as limiting the invention. Throughout the drawings, the same reference signs denote the same components. In the drawings:

Figure 1 is a flow chart of a target object singulation method provided by an embodiment of the present invention;

Figure 2 is a flow chart of the sub-steps of step S130 in Figure 1;

Figure 3 is a flow chart of the sub-steps of step S140 in Figure 2;

Figure 4 is a schematic structural diagram of a target object singulation apparatus provided by an embodiment of the present invention;

Figure 5 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
Embodiments of the Invention

Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited by the embodiments set forth herein.

Real-scene 3D is an important part of the country's new infrastructure construction. In a 3D geographical scene, through cutting, reconstruction, vector superposition and other processing, buildings or other target objects can be constructed as independent three-dimensional objects, so that each constructed independent object can be selected individually; by further assigning specific attributes to it, the attribute information of a target object can be quickly selected and queried, achieving refined and dynamic management.

Singulation of target objects is generally achieved by processing three-dimensional point cloud data. Since three-dimensional point cloud data retains the original geometric information of three-dimensional space, the amount of information is huge and unordered, so processing it is very complex and time-consuming, and labeling is also difficult, resulting in low efficiency.

On this basis, the present invention proposes a target object singulation method. First, the three-dimensional point cloud or three-dimensional mesh model is converted into an optical image and a depth image from a top view, reducing the amount of information to be processed. Next, the optical image and the depth image are processed to obtain target object masks and target object clusters respectively. Then, by calculating the ratio of the intersection area to the union area between the target object masks and the target object clusters, the target object clusters and target object masks whose corresponding regions in the two-dimensional image may not be target objects (i.e., those corresponding to ratios smaller than the first preset threshold) are excluded, ensuring the accuracy of the data. Finally, the regions corresponding to the remaining target object clusters in the three-dimensional point cloud or three-dimensional mesh model are determined as singulated target objects, achieving fast singulation of the target objects.
According to one aspect of the embodiments of the present invention, a target object singulation method is provided. Please refer to Figure 1 for details, which shows the flow of a target object singulation method provided by an embodiment of the present invention. The method is executed by a computing device that needs to perform target object singulation, such as a mobile phone, a computer or a server. As shown in Figure 1, the method includes:

S110: Acquire three-dimensional scene data.

In this step, the three-dimensional scene data includes three-dimensional point clouds, three-dimensional mesh models, and the like. The three-dimensional scene data can be collected by a three-dimensional imaging sensor, such as a binocular camera or an RGB-D camera, or by a three-dimensional imaging sensor combined with a three-dimensional laser scanner or lidar, and transmitted to the computing device so that the computing device obtains the three-dimensional scene data.

For urban scenes, the three-dimensional scene data can be captured by drones or satellites, or generated by an oblique photogrammetry system.

S120: Convert the three-dimensional scene data into an optical image and a depth image from a top view.

In this step, the optical image can be a grayscale image or a color image. The computing device can convert the three-dimensional scene data into a top-view optical image by obtaining the x-axis and y-axis coordinate values and the pixel value of the highest point directly above each position in the three-dimensional scene data; likewise, it can convert the three-dimensional scene data into a top-view depth image by obtaining the x-axis and y-axis coordinate values and the z-axis depth value of the highest point directly above each position in the three-dimensional scene data.
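The conversion described in this step can be sketched as keeping, for every (x, y) grid cell, only the highest point of the scene. The point format (x, y, z, pixel value), the grid cell size and the dict-based image representation are illustrative assumptions, not part of the claimed method:

```python
def project_top_view(points, cell=1.0):
    """Project 3D points (x, y, z, value) to a top-view depth and optical
    image. For every (x, y) grid cell only the highest point (max z) is kept,
    mirroring the 'highest point directly above each position' rule.
    Returns two dicts keyed by integer grid coordinates."""
    depth, optical = {}, {}
    for x, y, z, value in points:
        key = (int(x // cell), int(y // cell))
        if key not in depth or z > depth[key]:
            depth[key] = z        # depth image: z of the highest point
            optical[key] = value  # optical image: its pixel value
    return depth, optical
```

A real pipeline would rasterize the dicts into dense image arrays and interpolate the empty (hole) cells, as discussed for the optional interpolation and filtering step.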
S130: Perform semantic segmentation processing on the optical image to obtain a target object mask.

For a method that singulates buildings in a three-dimensional geographical scene, the target object mask can be the region of the optical image corresponding to a building in the three-dimensional scene data. If objects in other three-dimensional scenes are to be singulated, the target object can also be, for example, a person, an animal or a plant.

S140: Perform clustering segmentation processing on the depth image to obtain target object clusters.

Similarly, for a method that singulates buildings in a three-dimensional geographical scene, in this step the target object cluster can be the region of the depth image corresponding to a building in the three-dimensional scene data.

It should be noted that steps S130 and S140 are not performed in a fixed order; step S130 may be performed first, or step S140 may be performed first.

S150: Calculate the ratio of the intersection area to the union area between the target object mask and the target object cluster.

The intersection area between the target object mask and the target object cluster refers to the area of the region where the target object mask and the target object cluster intersect when the optical image and the depth image are edge-aligned and overlapped. The union area between the target object mask and the target object cluster refers to the area of the region of their union under the same alignment. It can be understood that the greater the ratio of the intersection area to the union area between the target object mask and the target object cluster, the more likely it is that the region corresponding to that target object cluster or target object mask in the three-dimensional scene data actually needs to be singulated.
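The ratio computed in this step is the intersection-over-union of the two regions. A minimal sketch, assuming each region is given as a set of (row, col) pixel coordinates once the two images are edge-aligned (the function name is hypothetical):

```python
def iou(mask_pixels, cluster_pixels):
    """Ratio of intersection area to union area between a target object
    mask and a target object cluster, each given as a set of (row, col)
    pixel coordinates in the aligned top-view images."""
    mask, cluster = set(mask_pixels), set(cluster_pixels)
    union = mask | cluster
    return len(mask & cluster) / len(union) if union else 0.0
```

A mask of three pixels and a cluster of three pixels that share two pixels would give a ratio of 2/4 = 0.5, which would be excluded under a first preset threshold of 0.6.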
S160:确定大于或等于第一预设阈值的比值所对应的目标物簇或目标物掩膜,将确定的目标物簇或目标物掩膜在三维场景数据中所对应的区域确定为单体化目标物。S160: Determine the target cluster or target mask corresponding to the ratio greater than or equal to the first preset threshold, and determine the area corresponding to the determined target cluster or target mask in the three-dimensional scene data as a single unit Target.
目标物簇或目标物掩膜为俯视视角的光学图像或深度图像中的区域,将目标物簇或目标物掩膜在三维场景数据中所对应的区域确定为单体化目标物,则是指将目标物簇在深度图像中的区域或目标物掩膜在光学图像中的区域对应至三维场景数据中,将三维场景数据中与该区域所对应的整个正射投影方向(即z轴方向)的数据确定为单体化目标物。The target cluster or target mask is an area in an optical image or a depth image from a bird's-eye view. Determining the area corresponding to the target cluster or target mask in the three-dimensional scene data as a single target object means Correspond to the area where the target object is clustered in the depth image or the area where the target object is masked in the optical image to the three-dimensional scene data, and the entire orthographic projection direction corresponding to the area in the three-dimensional scene data (i.e., the z-axis direction) The data was determined to be the monomerized target.
在本步骤中,第一预设阈值可以根据实际需要进行单体化的目标物情况进行设定,例如当实际需要进行的单体化的目标物为建筑物时,可以在计算设备上将第一预设阈值设定为0.6,从而将交集面积和并集面积的比值小于0.6的可能不是实际需要进行的单体化的建筑物所对应区域的目标物簇和目标物掩膜排除掉,保留交集面积和并集面积的比值大于或等于0.6的目标物簇和目标物掩膜,并将该目标物簇或目标物掩膜在三维场景数据中所对应的区域确定为单体化建筑物。In this step, the first preset threshold can be set according to the actual situation of the target object that needs to be singulated. For example, when the target object that actually needs to be singulated is a building, the first preset threshold can be set on the computing device. A preset threshold is set to 0.6, thereby eliminating target clusters and target masks in areas corresponding to buildings whose intersection area and union area are less than 0.6 and may not be actually required to be singled, and retain them. Target clusters and target masks whose intersection area and union area ratio are greater than or equal to 0.6, and the area corresponding to the target cluster or target mask in the three-dimensional scene data is determined as a single building.
可以理解的是,当光学图像或深度图像中实际需要进行的单体化的目标物所在区域与其他区域之间的像素值或深度值差异较低时,得到的目标物掩膜和目标物簇可能与实际需要单体化的目标物所在区域的对应关系较差,此时可以将第一预设阈值的数值设定的大一些,从而过滤掉可能与实际需要单体化的目标物所在区域不对应的目标物掩膜和目标物簇,使得最终确定的单体化目标物更加准确。It can be understood that when the difference in pixel values or depth values between the area of the target that actually needs to be singulated and other areas in the optical image or depth image is low, the obtained target mask and target cluster The corresponding relationship with the area where the target object that actually needs to be singulated may be poor. In this case, the value of the first preset threshold can be set larger to filter out the area that may be related to the target object that actually needs to be singulated. Uncorresponding target masks and target clusters make the final single target more accurate.
考虑到得到的目标物簇与实际需要进行的单体化的目标物所在区域的对应关系相较于目标物掩膜更好,并且目标物簇的边缘相较于目标物掩膜更加平滑,因此优选将目标物簇在三维场景数据中所对应的区域确定为单体化目标物。Considering that the corresponding relationship between the obtained target cluster and the area of the target that actually needs to be singulated is better than that of the target mask, and the edges of the target cluster are smoother than that of the target mask, therefore Preferably, the area corresponding to the target object cluster in the three-dimensional scene data is determined as a single target object.
In the target object singulation method provided by the present invention, converting the acquired three-dimensional scene data into a two-dimensional top-view optical image and depth image effectively reduces the amount of information to be processed. Segmenting the optical image and the depth image yields the target masks and the target clusters respectively. By computing the ratio of the intersection area to the union area between each target mask and target cluster, the target clusters and target masks whose corresponding regions in the two-dimensional images may not be target objects (that is, those whose ratio is smaller than the first preset threshold) are excluded, improving the accuracy of the data. Finally, the regions corresponding to the remaining target clusters or target masks in the three-dimensional scene data are determined as singulated target objects, achieving fast singulation of the target objects.
Compared with methods that singulate three-dimensional point cloud data directly, the method provided by the present invention processes a smaller amount of data and runs faster. The present invention exploits the depth information of the point cloud data: the clusters obtained by clustering segmentation of the depth image are fused with the masks obtained by semantic segmentation of the optical image, so that the segmented regions are complete and the edges of the finally determined singulated target objects are accurate, effectively solving the problem of segmented regions being incomplete or exceeding the extent of the real target object. For the singulation of buildings, the present invention can make full use of the prior information that there is a large distance between the top of a building and the ground and that roofs are approximately planar, accurately clustering and segmenting the depth image to obtain target clusters. Combined with the target masks obtained by semantic segmentation of the optical image, computing the ratio of the intersection area to the union area between target masks and target clusters enables accurate and efficient filtering of both, which can adapt to geographic areas of various sizes and different architectural styles; moreover, the data sets required by the singulation method provided by the present invention are easy to acquire and annotate.
Considering that in many cases there may be more than one target object in the three-dimensional scene data, the present invention further proposes an implementation for this situation. Specifically, the numbers of target masks and target clusters are both plural, and there is a one-to-one correspondence between at least some of the target masks and at least some of the target clusters. The correspondence refers to the positional correspondence between a target mask and a target cluster, for example the correspondence of (x, y) coordinate positions in the images.
The above step S150 then includes:
Separately computing, for each target cluster, the ratio of the intersection area to the union area between that cluster and its corresponding target mask.
In this step, if a target cluster has no corresponding target mask, the computation may be skipped for that cluster; alternatively, the computation may still be performed, in which case the resulting ratio is 0.
It can be understood that this step may equally be replaced by computing, for each target mask, the ratio of the intersection area to the union area between that mask and its corresponding target cluster. Likewise, if a target mask has no corresponding target cluster, the computation may be skipped for that mask, or performed anyway with a resulting ratio of 0.
The above step S160 then includes:
Determining the target cluster or target mask corresponding to each ratio greater than or equal to the first preset threshold, and determining the region corresponding to each such target cluster or target mask in the three-dimensional scene data as one singulated target object.
By separately computing the ratio of the intersection area to the union area between each target cluster and its corresponding target mask, and determining the target cluster or target mask corresponding to each ratio greater than or equal to the first preset threshold, the computing device can singulate each target object separately when the three-dimensional point cloud or three-dimensional mesh model contains multiple target objects that actually need to be singulated.
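The filtering criterion described above is an intersection-over-union test on two binary images. A minimal sketch in Python (the function name and the NumPy boolean-array representation are assumptions for illustration, not part of the patent):

```python
import numpy as np

def mask_cluster_ratio(mask: np.ndarray, cluster: np.ndarray) -> float:
    """Ratio of intersection area to union area between a binary target
    mask and a binary target cluster (boolean arrays of equal shape)."""
    intersection = np.logical_and(mask, cluster).sum()
    union = np.logical_or(mask, cluster).sum()
    # A cluster with no corresponding mask (or vice versa) gets ratio 0,
    # matching the convention described in the text above.
    return float(intersection) / union if union else 0.0
```

A cluster or mask would then be kept only when this ratio reaches the first preset threshold.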
To reduce noise, the present invention further proposes an implementation. Specifically, after the above step S120, the method further includes:
Performing interpolation and/or filtering on the optical image and the depth image.
The interpolation may use bilinear interpolation, whose specific process is as follows:
Given the values f(Q11), f(Q21), f(Q12), f(Q22) at the four grid points Q11 = (x1, y1), Q21 = (x2, y1), Q12 = (x1, y2) and Q22 = (x2, y2), to obtain the value at point P = (x, y), first interpolate linearly in the x direction:
f(x, y1) ≈ ((x2 − x)/(x2 − x1)) f(Q11) + ((x − x1)/(x2 − x1)) f(Q21)
f(x, y2) ≈ ((x2 − x)/(x2 − x1)) f(Q12) + ((x − x1)/(x2 − x1)) f(Q22)
then interpolate linearly in the y direction:
f(x, y) ≈ ((y2 − y)/(y2 − y1)) f(x, y1) + ((y − y1)/(y2 − y1)) f(x, y2)
which determines the value at P.
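The two-stage computation can be written directly as code. A short illustrative sketch (function name and argument order are assumptions):

```python
def bilinear_interpolate(x, y, x1, y1, x2, y2, q11, q21, q12, q22):
    """Bilinear interpolation of the value at P = (x, y) from the known
    values q11, q21, q12, q22 at the grid points Q11 = (x1, y1),
    Q21 = (x2, y1), Q12 = (x1, y2), Q22 = (x2, y2)."""
    # Linear interpolation in the x direction, at y = y1 and at y = y2.
    f_y1 = (x2 - x) / (x2 - x1) * q11 + (x - x1) / (x2 - x1) * q21
    f_y2 = (x2 - x) / (x2 - x1) * q12 + (x - x1) / (x2 - x1) * q22
    # Linear interpolation in the y direction between the two results.
    return (y2 - y) / (y2 - y1) * f_y1 + (y - y1) / (y2 - y1) * f_y2
```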
The filtering may use a median filtering operation, whose specific process is as follows:
First, for a given pixel in the optical image or depth image, take a square window of width L centered on that pixel; then sort the pixel values (for the optical image) or depth values (for the depth image) of all pixels within the window, compute the median of those values, and replace the value of the central pixel with that median. It should be noted that when the optical image is a color image, the pixel values are RGB values, and when the optical image is a grayscale image, the pixel values are grayscale values; the same applies to the pixel values mentioned below.
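As an illustration, a naive implementation of this windowed median operation on a single-channel image might look as follows (border pixels here use a truncated window, a detail the text does not specify):

```python
import numpy as np

def median_filter(image: np.ndarray, L: int = 3) -> np.ndarray:
    """Replace every pixel with the median of the L-by-L window
    centered on it (the window is truncated at the image borders)."""
    h, w = image.shape
    r = L // 2
    out = np.empty_like(image)
    for i in range(h):
        for j in range(w):
            window = image[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[i, j] = np.median(window)
    return out
```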
By interpolating and/or filtering the hole points in the optical image or depth image, noise can be effectively reduced, which benefits the subsequent segmentation of the optical image or depth image.
For the above step S130, the present invention further proposes a specific implementation. Please refer to Figure 2, which shows the sub-steps of step S130. As shown in the figure, step S130 includes:
S131: Inputting the optical image into a convolutional neural network.
In this step, the optical image is input into an already-trained convolutional neural network.
S132: Outputting, through the convolutional neural network, the classification confidence of each pixel in the optical image.
In this step, the convolutional neural network applies the softmax function to the output of its last layer and outputs the classification confidence of the classification result for each pixel in the optical image.
S133: Obtaining the pixels whose classification confidence is greater than or equal to a second preset threshold, to obtain the target mask.
In this step, the classification confidence of each pixel is compared with the second preset threshold, and the pixels whose confidence is greater than or equal to that threshold are retained, yielding the target mask of the corresponding region. For the building singulation method, the second preset threshold may be set to 0.3 to ensure the accuracy of the obtained target mask.
By determining the classification confidence of each pixel in the optical image through the convolutional neural network, and obtaining the pixels whose confidence is greater than or equal to the second preset threshold, the target mask is obtained, accurately segmenting the region of the optical image where the target object is located.
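Steps S132 and S133 (softmax confidence followed by thresholding) can be sketched as follows. The per-pixel class scores would come from the last layer of the trained network, which is assumed here and not shown; thresholding the target class's probability is one plausible reading of the text:

```python
import numpy as np

def confidence_mask(logits: np.ndarray, target_class: int,
                    threshold: float = 0.3) -> np.ndarray:
    """From per-pixel class scores of shape (H, W, C), compute softmax
    confidences and keep the pixels whose confidence for the target
    class is at least the (second preset) threshold."""
    # Numerically stable softmax over the class axis.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs[..., target_class] >= threshold
```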
For the above step S140, the present invention further proposes a specific implementation. Please refer to Figure 3, which shows the sub-steps of step S140. As shown in the figure, step S140 may use a region-growing-based clustering algorithm, which specifically includes:
Cluster marking S141: Assigning one pixel of the depth image to a seed region and marking it as a cluster.
In this step, a point may be manually selected via the computing device, assigned to the seed region and marked as a cluster, or the computing device may select a point at random and do the same.
Initial neighborhood pixel classification S142: Computing the absolute values of the differences between the depth value of that pixel and the depth values of its four adjacent pixels (above, below, left and right), and assigning the adjacent pixels whose absolute difference is less than or equal to a third preset threshold to the seed region, marking them as part of the cluster.
In the depth image there is a certain difference between the depth values of the region where the target object is located and those of the other regions; when the target object is a building, this difference is particularly pronounced. Therefore, if the pixel lies within the region of a target object that needs to be singulated, an adjacent pixel whose absolute depth difference from it is less than or equal to the third preset threshold is very likely also to lie within that region; if the pixel does not lie within the region of a target object that needs to be singulated, such an adjacent pixel is very likely also to lie outside that region. Hence, by marking the pixel and the adjacent pixels whose absolute depth difference from it is less than or equal to the third preset threshold as the same cluster, the regions of the depth image where the target objects are located can be grouped into one or a few clusters, and the other regions into one or a few other clusters.
For the singulation of buildings, the third preset threshold may be set to 10 to ensure the accuracy of pixel classification and marking.
Repeated neighborhood pixel classification S143: Performing the above step S142 on each pixel of the seed region other than the original pixel, stopping once the absolute differences between the depth values of all pixels on the inner edge of the seed region and the depth values of the adjacent pixels outside the seed region are all greater than the third preset threshold; all pixels in the seed region are then marked as the same cluster.
It should be noted that, in practice, this step may not be performed at all, may be performed only once, or may be performed in a loop. Specifically, when in step S142 the absolute differences between the depth value of the pixel and the depth values of all its adjacent pixels are all greater than the third preset threshold, no adjacent pixel is assigned to the seed region and the pixel is marked as a cluster on its own; in this case this step is skipped and the subsequent steps are carried out directly. When at least one adjacent pixel is assigned to the seed region in step S142, each such pixel is taken as a first adjacent pixel and step S142 is performed on it, computing the absolute differences in depth value between each first adjacent pixel and its own adjacent pixels (the second adjacent pixels). When the absolute depth differences between the first adjacent pixels and their second adjacent pixels are all greater than the third preset threshold, the original pixel and the first adjacent pixels are marked as the same cluster and this step stops; in that case this step has been performed only once. Otherwise, when at least one second adjacent pixel is assigned to the seed region, step S142 continues on those second adjacent pixels, computing the absolute depth differences between each of them and its own adjacent pixels, and so on in a loop, until the absolute differences between the depth values of all pixels on the inner edge of the seed region and the depth values of the adjacent pixels outside the seed region are all greater than the third preset threshold; at that point all pixels in the seed region are marked as the same cluster.
Traversing the remaining pixels S144: Repeatedly performing the above steps S141, S142 and S143 in turn on pixels of the depth image that have not been marked as clusters, stopping once all pixels in the depth image have been marked as clusters, to obtain initial target clusters.
It should be noted that in this step, each time step S141 is repeated on a pixel of the depth image that has not been marked as a cluster, that pixel is marked as a new cluster, so that when all pixels of the depth image have been marked as clusters, multiple initial target clusters are obtained. Among these different initial target clusters there are both target clusters that need to be singulated and background clusters that do not.
In an optional manner, the above step S144 is performed preferentially on pixels at the outer edge of the seed region. By preferentially performing step S144 on pixels at the outer edge of the seed region, the marking of pixels of the depth image as clusters proceeds outward step by step from the first marked pixel, which helps improve the efficiency of clustering the depth image.
By segmenting the depth image with a region-growing-based clustering algorithm, both the regions corresponding to target objects and the regions corresponding to the background are marked as clusters, yielding the initial target clusters and achieving an accurate division into target and non-target regions.
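Steps S141 through S144 together form a flood-fill style region-growing pass over the depth image. A compact sketch under that reading (queue-based rather than recursive, an implementation choice the text does not mandate):

```python
from collections import deque
import numpy as np

def region_grow_clusters(depth: np.ndarray, threshold: float = 10) -> np.ndarray:
    """Label every pixel of the depth image with a cluster id.  A new
    seed (S141) grows by absorbing 4-neighbors whose absolute depth
    difference is <= threshold (S142/S143); unlabeled pixels start
    new clusters until the image is covered (S144)."""
    h, w = depth.shape
    labels = np.full((h, w), -1, dtype=int)
    next_label = 0
    for si in range(h):
        for sj in range(w):
            if labels[si, sj] != -1:
                continue                     # already in some cluster
            labels[si, sj] = next_label      # S141: new seed region
            queue = deque([(si, sj)])
            while queue:                     # S142/S143: grow the region
                i, j = queue.popleft()
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= ni < h and 0 <= nj < w and labels[ni, nj] == -1
                            and abs(float(depth[i, j]) - float(depth[ni, nj])) <= threshold):
                        labels[ni, nj] = next_label
                        queue.append((ni, nj))
            next_label += 1                  # S144: next unlabeled pixel
    return labels
```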
Further, with continued reference to Figure 3, the above step S140 also includes:
Cluster filtering S145: Obtaining the initial target clusters whose area is greater than or equal to a fourth preset threshold, to obtain the target clusters.
When the target object is a building, since a building roof is roughly planar and has a certain area, the region of a building will be segmented into one cluster of fairly large area, while the background region outside the buildings, whose depth values are uneven, will be segmented into multiple small-area clusters. Based on this, obtaining the initial target clusters whose area is greater than or equal to the fourth preset threshold locates the clusters corresponding to building regions fairly accurately and filters out the clusters corresponding to the background. The fourth preset threshold may generally be set to 300 so that the clusters corresponding to the background region can be filtered reliably.
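Step S145 then reduces to a simple area filter. A sketch assuming the clusters are represented as an integer label map (one label per cluster, one pixel per array element — an assumed representation):

```python
import numpy as np

def filter_clusters(labels: np.ndarray, min_area: int = 300) -> set:
    """Keep only the cluster labels whose pixel count (area) is at
    least min_area, discarding the small background clusters."""
    ids, counts = np.unique(labels, return_counts=True)
    return {int(i) for i, c in zip(ids, counts) if c >= min_area}
```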
According to another aspect of embodiments of the present invention, a target object singulation apparatus is provided. Please refer to Figure 4, which shows the structure of the target object singulation apparatus provided by an embodiment. As shown in the figure, the target object singulation apparatus 200 includes an acquisition unit 210, an image conversion unit 220, a first image segmentation unit 230, a second image segmentation unit 240, a computation unit 250 and a determination unit 260. The acquisition unit 210 is configured to acquire three-dimensional scene data. The image conversion unit 220 is configured to convert the three-dimensional scene data into an optical image and a depth image from a top-down viewing angle. The first image segmentation unit 230 is configured to perform semantic segmentation on the optical image to obtain a target mask. The second image segmentation unit 240 is configured to perform clustering segmentation on the depth image to obtain a target cluster. The computation unit 250 is configured to compute the ratio of the intersection area to the union area between the target mask and the target cluster. The determination unit 260 is configured to determine the target cluster or target mask corresponding to a ratio greater than or equal to the first preset threshold, and to determine the region corresponding to the determined target cluster or target mask in the three-dimensional scene data as a singulated target object.
In an optional manner, the numbers of target masks and target clusters are both plural, and there is a one-to-one correspondence between at least some of the target masks and at least some of the target clusters. The computation unit 250 is configured to separately compute the ratio of the intersection area to the union area between each target cluster and its corresponding target mask. The determination unit 260 is configured to determine the target cluster or target mask corresponding to each ratio greater than or equal to the first preset threshold, and to determine the region corresponding to each such target cluster or target mask in the three-dimensional scene data as one singulated target object.
Referring again to Figure 4, in an optional manner, the target object singulation apparatus 200 further includes a noise reduction unit 270, which is configured to perform interpolation and/or filtering on the optical image and the depth image.
In an optional manner, the first image segmentation unit 230 is configured to input the optical image into a convolutional neural network, to output through the convolutional neural network the classification confidence of each pixel in the optical image, and to obtain the pixels whose classification confidence is greater than or equal to the second preset threshold, thereby obtaining the target mask.
In an optional manner, the second image segmentation unit 240 is configured for cluster marking: assigning one pixel of the depth image to a seed region and marking it as a cluster; for initial neighborhood pixel classification: computing the absolute differences between the depth value of that pixel and the depth values of its four adjacent pixels (above, below, left and right), and assigning the adjacent pixels whose absolute difference is less than or equal to the third preset threshold to the seed region and marking them as part of the cluster; for repeated neighborhood pixel classification: performing the above initial neighborhood pixel classification on each pixel of the seed region other than the original pixel, stopping once the absolute differences between the depth values of all pixels on the inner edge of the seed region and the depth values of the adjacent pixels outside the seed region are all greater than the third preset threshold, whereupon all pixels in the seed region are marked as the same cluster; and for traversing the remaining pixels: repeatedly performing the above cluster marking, initial neighborhood pixel classification and repeated neighborhood pixel classification on pixels of the depth image that have not been marked as clusters, stopping once all pixels of the depth image have been marked as clusters, to obtain the initial target clusters.
In an optional manner, the second image segmentation unit 240 performs the repeated neighborhood pixel classification preferentially on pixels at the outer edge of the seed region.
In an optional manner, the second image segmentation unit 240 is further configured for cluster filtering: obtaining the initial target clusters whose area is greater than or equal to the fourth preset threshold, to obtain the target clusters.
According to another aspect of embodiments of the present invention, a computing device is also provided. Please refer to Figure 5, which shows the structure of the computing device provided by an embodiment; the specific embodiments of the present invention do not limit the specific implementation of the computing device.
As shown in Figure 5, the computing device may include: a processor 402, a communications interface 404, a memory 406, and a communication bus 408.
The processor 402, the communications interface 404 and the memory 406 communicate with one another through the communication bus 408. The communications interface 404 is configured to communicate with network elements of other devices, such as clients or other servers. The processor 402 is configured to execute a program 410, and may specifically perform the relevant steps of the above embodiments of the target object singulation method.
Specifically, the program 410 may include program code, and the program code includes computer-executable instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computing device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs together with one or more ASICs.
The memory 406 is configured to store the program 410. The memory 406 may include high-speed RAM, and may also include non-volatile memory, for example at least one disk memory.
The program 410 may specifically be invoked by the processor 402 to cause the computing device to perform the following operations:
acquiring three-dimensional scene data;
converting the three-dimensional scene data into an optical image and a depth image from a top-down viewing angle;
performing semantic segmentation on the optical image to obtain a target mask;
performing clustering segmentation on the depth image to obtain a target cluster;
computing the ratio of the intersection area to the union area between the target mask and the target cluster;
determining the target cluster or target mask corresponding to a ratio greater than or equal to a first preset threshold, and determining the region corresponding to the determined target cluster or target mask in the three-dimensional scene data as a singulated target object.
According to another aspect of embodiments of the present invention, a computer-readable storage medium is also provided. The storage medium stores at least one executable instruction which, when run on a computing device, causes the computing device to perform the target object singulation method of any of the above method embodiments.
The executable instruction may specifically be used to cause the computing device to perform the following operations:
acquiring three-dimensional scene data;
converting the three-dimensional scene data into an optical image and a depth image from a top-down viewing angle;
performing semantic segmentation on the optical image to obtain a target mask;
performing clustering segmentation on the depth image to obtain a target cluster;
computing the ratio of the intersection area to the union area between the target mask and the target cluster;
determining the target cluster or target mask corresponding to a ratio greater than or equal to a first preset threshold, and determining the region corresponding to the determined target cluster or target mask in the three-dimensional scene data as a singulated target object.
The algorithms or displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings herein. From the above description, the structure required to construct such systems is apparent. Moreover, embodiments of the present invention are not directed at any particular programming language. It should be understood that the content of the present invention described herein may be implemented using a variety of programming languages, and that the above descriptions of specific languages are intended to disclose the best mode of carrying out the invention.
Numerous specific details are set forth in the description provided here. It will be understood, however, that embodiments of the present invention may be practised without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that in the above description of exemplary embodiments of the present invention, the various features of the embodiments are sometimes grouped together in a single embodiment, figure or description thereof in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim.
Those skilled in the art will understand that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may likewise be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several apparatuses, several of these apparatuses may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any ordering; these words may be interpreted as names. Unless otherwise specified, the steps of the above embodiments should not be understood as limiting the order of execution.

Claims (10)

  1. A target object singulation method, characterized by comprising:
    obtaining three-dimensional scene data;
    converting the three-dimensional scene data into an optical image and a depth image from a top-down perspective;
    performing semantic segmentation processing on the optical image to obtain a target object mask;
    performing cluster segmentation processing on the depth image to obtain target object clusters;
    calculating the ratio of the intersection area to the union area between the target object mask and the target object cluster;
    determining the target object cluster or the target object mask corresponding to a ratio greater than or equal to a first preset threshold, and determining the region corresponding to the determined target object cluster or target object mask in the three-dimensional scene data as a singulated target object.
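The matching step of claim 1 — the ratio of intersection area to union area between a mask and a cluster — is the standard intersection-over-union (IoU) measure on binary images. A minimal NumPy sketch is given below; the value of `FIRST_THRESHOLD` is a hypothetical placeholder, since the patent does not fix the "first preset threshold".

```python
import numpy as np

def iou(mask: np.ndarray, cluster: np.ndarray) -> float:
    """Intersection-over-union between two boolean images of equal shape."""
    mask = mask.astype(bool)
    cluster = cluster.astype(bool)
    union = np.logical_or(mask, cluster).sum()
    if union == 0:
        return 0.0
    inter = np.logical_and(mask, cluster).sum()
    return float(inter) / float(union)

# Toy example: a 3x3 mask and a 3x3 cluster overlapping in a 2x2 block.
mask = np.zeros((4, 4), dtype=bool)
mask[0:3, 0:3] = True            # 9 pixels
cluster = np.zeros((4, 4), dtype=bool)
cluster[1:4, 1:4] = True         # 9 pixels, 4 of them shared with the mask
FIRST_THRESHOLD = 0.25           # hypothetical value for the "first preset threshold"
ratio = iou(mask, cluster)       # 4 / 14
is_singulated = ratio >= FIRST_THRESHOLD
```

A mask/cluster pair passing this test is then mapped back to its region in the three-dimensional scene data, which is the part the sketch does not cover.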
  2. The target object singulation method according to claim 1, characterized in that there are multiple target object masks and multiple target object clusters, and at least some of the target object masks have a one-to-one correspondence with at least some of the target object clusters;
    the calculating the ratio of the intersection area to the union area between the target object mask and the target object cluster comprises:
    calculating, for each target object cluster, the ratio of the intersection area to the union area between that cluster and its corresponding target object mask;
    the determining the target object cluster or the target object mask corresponding to a ratio greater than or equal to the first preset threshold, and determining the region corresponding to the determined target object cluster or target object mask in the three-dimensional scene data as a singulated target object, comprises:
    determining the target object cluster or the target object mask corresponding to each ratio greater than or equal to the first preset threshold, and determining the region corresponding to each determined target object cluster or target object mask in the three-dimensional scene data as one singulated target object, respectively.
  3. The target object singulation method according to claim 1, characterized in that after converting the three-dimensional scene data into the optical image and the depth image from a top-down perspective, the method further comprises:
    performing interpolation processing and/or filtering processing on the optical image and the depth image.
  4. The target object singulation method according to claim 1, characterized in that the performing semantic segmentation processing on the optical image to obtain the target object mask comprises:
    inputting the optical image into a convolutional neural network;
    outputting, through the convolutional neural network, the classification confidence of each pixel in the optical image;
    obtaining the pixels whose classification confidence is greater than or equal to a second preset threshold, to obtain the target object mask.
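The final step of claim 4 reduces to thresholding a per-pixel confidence map. The sketch below assumes the CNN has already produced a 2-D array of confidences for the target class; `SECOND_THRESHOLD` is a hypothetical placeholder for the "second preset threshold".

```python
import numpy as np

SECOND_THRESHOLD = 0.5  # hypothetical value for the "second preset threshold"

def mask_from_confidence(confidence: np.ndarray,
                         threshold: float = SECOND_THRESHOLD) -> np.ndarray:
    """Keep the pixels whose classification confidence reaches the threshold."""
    return confidence >= threshold

# A 2x3 confidence map as a segmentation network might emit for the target class.
conf = np.array([[0.9, 0.4, 0.7],
                 [0.2, 0.6, 0.1]])
mask = mask_from_confidence(conf)
# mask == [[True, False, True], [False, True, False]]
```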
  5. The target object singulation method according to claim 1, characterized in that the performing cluster segmentation processing on the depth image to obtain target object clusters comprises:
    cluster marking: assigning one of the pixels in the depth image to a seed region and marking it as a cluster;
    initial neighborhood pixel classification: calculating the absolute value of the difference between the depth value of the pixel and the depth values of its four adjacent pixels (above, below, left, and right), and assigning the adjacent pixels whose absolute difference is less than or equal to a third preset threshold to the seed region, marking them as the cluster;
    repeated neighborhood pixel classification: performing the neighborhood pixel classification on each of the other pixels in the seed region, stopping when the absolute differences between the depth values of all pixels on the inner edge of the seed region and the depth values of their adjacent pixels outside the seed region all exceed the third preset threshold, whereupon all pixels in the seed region are marked as the same cluster;
    traversing remaining pixels: repeatedly performing the cluster marking, the initial neighborhood pixel classification, and the repeated neighborhood pixel classification in sequence on pixels of the depth image not yet marked as a cluster, stopping when all pixels in the depth image have been marked, to obtain initial target object clusters.
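The four steps of claim 5 describe a region-growing segmentation: seed a cluster, flood it through 4-neighbors whose depth difference stays within a tolerance, then restart from any unmarked pixel. A minimal sketch under that reading is shown below; `THIRD_THRESHOLD` is a hypothetical placeholder for the "third preset threshold", and the breadth-first queue is one of several valid traversal orders.

```python
from collections import deque
import numpy as np

THIRD_THRESHOLD = 0.1  # hypothetical depth-difference tolerance

def grow_clusters(depth: np.ndarray, tol: float = THIRD_THRESHOLD) -> np.ndarray:
    """Label map where 4-neighbors are joined when |depth difference| <= tol."""
    h, w = depth.shape
    labels = np.zeros((h, w), dtype=int)       # 0 = not yet marked
    current = 0
    for sy in range(h):                        # traverse remaining pixels
        for sx in range(w):
            if labels[sy, sx]:
                continue
            current += 1                       # cluster marking: new seed region
            labels[sy, sx] = current
            frontier = deque([(sy, sx)])
            while frontier:                    # repeated neighborhood classification
                y, x = frontier.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny, nx]
                            and abs(depth[ny, nx] - depth[y, x]) <= tol):
                        labels[ny, nx] = current
                        frontier.append((ny, nx))
    return labels

depth = np.array([[1.00, 1.05, 5.00],
                  [1.02, 1.08, 5.05]])
labels = grow_clusters(depth)
# Two clusters: the ~1.0 block on the left and the ~5.0 column on the right.
```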
  6. The target object singulation method according to claim 5, characterized in that the repeated neighborhood pixel classification is performed preferentially on the pixels at the outer edge of the seed region.
  7. The target object singulation method according to claim 5, characterized in that after the repeated neighborhood pixel classification, the method further comprises:
    cluster filtering: obtaining the initial target object clusters whose area is greater than or equal to a fourth preset threshold, to obtain the target object clusters.
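The cluster filtering of claim 7 is a pixel-count cutoff over the labeled clusters. A sketch continuing from an integer label map is given below; `FOURTH_THRESHOLD` is a hypothetical placeholder for the "fourth preset threshold".

```python
import numpy as np

FOURTH_THRESHOLD = 3  # hypothetical minimum cluster area, in pixels

def filter_clusters(labels: np.ndarray,
                    min_area: int = FOURTH_THRESHOLD) -> list:
    """Keep only the cluster ids whose pixel count reaches the area threshold."""
    ids, counts = np.unique(labels[labels > 0], return_counts=True)
    return [int(i) for i, c in zip(ids, counts) if c >= min_area]

labels = np.array([[1, 1, 2],
                   [1, 1, 2]])
kept = filter_clusters(labels)   # cluster 1 has area 4, cluster 2 only 2
# kept == [1]
```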
  8. A target object singulation apparatus, characterized by comprising:
    an obtaining unit, configured to obtain three-dimensional scene data;
    an image conversion unit, configured to convert the three-dimensional scene data into an optical image and a depth image from a top-down perspective;
    a first image segmentation unit, configured to perform semantic segmentation processing on the optical image to obtain a target object mask;
    a second image segmentation unit, configured to perform cluster segmentation processing on the depth image to obtain target object clusters;
    a calculation unit, configured to calculate the ratio of the intersection area to the union area between the target object mask and the target object cluster;
    a determination unit, configured to determine the target object cluster or the target object mask corresponding to a ratio greater than or equal to a first preset threshold, and determine the region corresponding to the determined target object cluster or target object mask in the three-dimensional scene data as a singulated target object.
  9. A computing device, characterized by comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other through the communication bus;
    the memory is configured to store at least one executable instruction, the executable instruction causing the processor to execute the target object singulation method according to any one of claims 1-7.
  10. A computer storage medium, characterized in that at least one executable instruction is stored in the storage medium, the executable instruction causing a processor to execute the target object singulation method according to any one of claims 1-7.
PCT/CN2023/089948 2022-05-23 2023-04-21 Target object separation method and apparatus, device, and storage medium WO2023226654A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210560168.2 2022-05-23
CN202210560168.2A CN114648640B (en) 2022-05-23 2022-05-23 Target object monomer method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023226654A1 true WO2023226654A1 (en) 2023-11-30

Family

ID=81997653

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/089948 WO2023226654A1 (en) 2022-05-23 2023-04-21 Target object separation method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114648640B (en)
WO (1) WO2023226654A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118015197A (en) * 2024-04-08 2024-05-10 北京师范大学珠海校区 Live-action three-dimensional logic singulation method and device and electronic equipment

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114648640B (en) * 2022-05-23 2022-09-06 深圳市其域创新科技有限公司 Target object monomer method, device, equipment and storage medium
CN115187619A (en) * 2022-09-13 2022-10-14 深圳市其域创新科技有限公司 Mesh data segmentation method, device, equipment and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108470339A (en) * 2018-03-21 2018-08-31 华南理工大学 A kind of visual identity of overlapping apple and localization method based on information fusion
CN112132845A (en) * 2020-08-13 2020-12-25 当家移动绿色互联网技术集团有限公司 Three-dimensional model unitization method and device, electronic equipment and readable medium
CN112967301A (en) * 2021-04-08 2021-06-15 北京华捷艾米科技有限公司 Self-timer image matting method and device
CN114648640A (en) * 2022-05-23 2022-06-21 深圳市其域创新科技有限公司 Target object monomer method, device, equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8514225B2 (en) * 2011-01-07 2013-08-20 Sony Computer Entertainment America Llc Scaling pixel depth values of user-controlled virtual object in three-dimensional scene
CN113379826A (en) * 2020-03-10 2021-09-10 顺丰科技有限公司 Method and device for measuring volume of logistics piece



Also Published As

Publication number Publication date
CN114648640A (en) 2022-06-21
CN114648640B (en) 2022-09-06

Similar Documents

Publication Publication Date Title
WO2023226654A1 (en) Target object separation method and apparatus, device, and storage medium
CN109493407B (en) Method and device for realizing laser point cloud densification and computer equipment
CN111832655B (en) Multi-scale three-dimensional target detection method based on characteristic pyramid network
CN111640125B (en) Aerial photography graph building detection and segmentation method and device based on Mask R-CNN
CN111242041B (en) Laser radar three-dimensional target rapid detection method based on pseudo-image technology
CN109325484B (en) Flower image classification method based on background prior significance
JP2021532442A (en) Target detection method and device, smart operation method, device and storage medium
CN111145174A (en) 3D target detection method for point cloud screening based on image semantic features
CN104134234A (en) Full-automatic three-dimensional scene construction method based on single image
WO2023193401A1 (en) Point cloud detection model training method and apparatus, electronic device, and storage medium
WO2022041437A1 (en) Plant model generating method and apparatus, computer equipment and storage medium
WO2022017131A1 (en) Point cloud data processing method and device, and intelligent driving control method and device
CN110176064B (en) Automatic identification method for main body object of photogrammetric generation three-dimensional model
CN113761999A (en) Target detection method and device, electronic equipment and storage medium
CN113192200B (en) Method for constructing urban real scene three-dimensional model based on space-three parallel computing algorithm
CN114612835A (en) Unmanned aerial vehicle target detection model based on YOLOv5 network
CN106326810A (en) Road scene identification method and equipment
US20220004740A1 (en) Apparatus and Method For Three-Dimensional Object Recognition
WO2022206414A1 (en) Three-dimensional target detection method and apparatus
CN110633640A (en) Method for identifying complex scene by optimizing PointNet
CN112767405A (en) Three-dimensional mesh model segmentation method and system based on graph attention network
CN115032648A (en) Three-dimensional target identification and positioning method based on laser radar dense point cloud
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN110969641A (en) Image processing method and device
WO2024037562A1 (en) Three-dimensional reconstruction method and apparatus, and computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23810727

Country of ref document: EP

Kind code of ref document: A1