WO2023226654A1 - Target object separation method and apparatus, device, and storage medium - Google Patents


Info

Publication number
WO2023226654A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
target object
cluster
area
mask
Application number
PCT/CN2023/089948
Other languages
French (fr)
Chinese (zh)
Inventor
王孙平
Original Assignee
深圳市其域创新科技有限公司
Application filed by 深圳市其域创新科技有限公司
Publication of WO2023226654A1 publication Critical patent/WO2023226654A1/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/047: Probabilistic or stochastic networks

Definitions

  • the present invention relates to the field of information processing technology, and specifically to a target object singulation method, device, equipment and storage medium.
  • Real-life 3D is an important part of new infrastructure construction. Buildings or other targets are constructed as independent objects through processing on the 3D geographical scene, so that they can be selected individually. Since the amount of 3D scene information is huge and disordered, the processing of 3D scene information is complex and time-consuming.
  • the present invention provides a target object singulation method, device, equipment and storage medium, which can improve the accuracy and speed of target object singulation.
  • a target object singulation method, which includes: acquiring three-dimensional scene data; converting the three-dimensional scene data into an optical image and a depth image from a top view; performing semantic segmentation processing on the optical image to obtain target object masks; performing clustering segmentation processing on the depth image to obtain target object clusters; calculating the ratio of the intersection area to the union area between the target object masks and the target object clusters; determining the target clusters or target masks corresponding to ratios greater than or equal to a first preset threshold; and determining the areas corresponding to the determined target clusters or target masks in the three-dimensional scene data as single target objects.
  • the method provided by the present invention processes a smaller amount of data and has a faster processing speed.
  • this invention uses the depth information of the point cloud data: by fusing the masks obtained from semantic segmentation of the optical image with the results of depth image clustering segmentation, the segmented areas are complete and the edges of the final singulated target objects are accurate, effectively solving the problem of segmented areas being incomplete or extending beyond the real target object.
  • the present invention makes full use of the prior information that the distance between the top of a building and the ground is large and that the roof is close to a plane, accurately clusters and segments the depth image to obtain target object clusters, and combines them with the target masks obtained by semantic segmentation of the optical image. By calculating the ratio of the intersection area to the union area between the target masks and the target clusters, precise and efficient filtering of target clusters is achieved. The method can be adapted to geographical areas of various sizes and architectural styles, and the data set required by the singulation method provided by the present invention is easy to obtain and label.
  • the number of target masks and target clusters is multiple, and there is a one-to-one correspondence between at least part of the target masks and at least part of the target clusters; calculating the ratio of the intersection area to the union area between the target masks and the target clusters includes: separately calculating the ratio of the intersection area to the union area between each target cluster and its corresponding target mask, and determining which ratios are greater than or equal to the first preset threshold.
  • the method further includes: performing interpolation processing and/or filtering processing on the optical image and the depth image.
  • by interpolating and/or filtering the hole points in the optical image or depth image, noise can be effectively reduced, which is beneficial to the subsequent segmentation of the optical image or depth image.
  • semantic segmentation processing is performed on the optical image to obtain the target mask, including: inputting the optical image into a convolutional neural network; outputting the classification confidence of each pixel in the optical image through the convolutional neural network; and obtaining the pixels whose classification confidence is greater than or equal to a second preset threshold to form the target mask.
  • the convolutional neural network is used to determine the classification confidence of each pixel in the optical image, and by retaining the pixels whose confidence is greater than or equal to the second preset threshold, the target mask is obtained, accurately segmenting the target area in the optical image.
  • clustering segmentation processing is performed on the depth image to obtain target object clusters, including: cluster labeling: dividing one pixel in the depth image into a seed area and marking it as a cluster; initial neighborhood pixel classification: calculating the absolute value of the difference between the depth value of that pixel and the depth values of its four adjacent pixels (above, below, left, and right), and dividing the adjacent pixels whose absolute difference is less than or equal to a third preset threshold into the seed area, marking them as the same cluster; repeated neighborhood pixel classification: performing initial neighborhood pixel classification on the other pixels in the seed area, stopping when, for every pixel on the inner edge of the seed area, the absolute difference between its depth value and the depth values of its adjacent pixels outside the seed area is greater than the third preset threshold, at which point all pixels in the seed area are marked as the same cluster; traversing the remaining pixels: repeating cluster labeling, initial neighborhood pixel classification, and repeated neighborhood pixel classification for the pixels in the depth image that have not yet been marked into a cluster.
  • the depth image is segmented through a clustering algorithm based on region growing, and the areas corresponding to the target and the areas corresponding to the background are marked as clusters to obtain the initial target cluster and achieve accurate division of target areas and non-target areas.
  • pixels at the outer edge of the seed area are prioritized for repeated neighborhood pixel classification.
  • the process of marking the pixels in the depth image into clusters is centered on the first marked pixel and gradually spreads outward, which is conducive to improving the efficiency of depth image clustering.
  • the method further includes: cluster filtering: obtaining initial target object clusters with an area greater than or equal to the fourth preset threshold to obtain the target object cluster.
  • when the target object is a building, the roof of the building is roughly flat and has a certain area, so the area where the building is located will be divided into a larger cluster, while the background area outside the building has uneven depth values and will be divided into multiple small-area clusters.
  • in this way, the cluster corresponding to the area where the building is located can be found more accurately, and the clusters corresponding to the background area can be filtered out.
  • a target object singulation device, including: an acquisition unit for acquiring three-dimensional scene data; an image conversion unit for converting the three-dimensional scene data into an optical image and a depth image from a bird's-eye view; a first image segmentation unit for performing semantic segmentation processing on the optical image to obtain the target object mask; a second image segmentation unit for performing clustering segmentation processing on the depth image to obtain the target object cluster; a computing unit for calculating the ratio of the intersection area to the union area between the target mask and the target cluster; and a determination unit for determining the target cluster or target mask corresponding to a ratio greater than or equal to the first preset threshold, and determining the area corresponding to the determined target cluster or target mask in the three-dimensional scene data as a single target object.
  • a computing device including: a processor, a memory, a communication interface, and a communication bus.
  • the processor, the memory, and the communication interface communicate with each other through the communication bus; the memory is used to store at least one executable instruction, and the executable instruction causes the processor to execute the target object singulation method in any of the above ways.
  • a computer storage medium is also provided. At least one executable instruction is stored in the storage medium. The executable instruction causes the processor to execute the target object singulation method in any of the above ways.
  • Figure 1 is a flow chart of a target object singulation method provided by an embodiment of the present invention.
  • Figure 2 is a flow chart of the sub-steps of step S130 in Figure 1;
  • Figure 3 is a flow chart of the sub-steps of step S140 in Figure 1;
  • Figure 4 is a schematic structural diagram of a target object singulation device provided by an embodiment of the present invention.
  • Figure 5 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
  • Real-life 3D is an important part of the country's new infrastructure construction.
  • buildings or other targets can be constructed into independent objects in three-dimensional form, so that the constructed independent objects can be selected individually; by assigning specific attributes to them, the attribute information of the target objects can be quickly selected and queried to achieve refined and dynamic management.
  • the target object is conventionally singulated by processing the three-dimensional point cloud data directly. Since three-dimensional point cloud data retains the original geometric information in three-dimensional space, the amount of information is huge and disordered, so processing it is very complex, time-consuming, and difficult to label, resulting in low efficiency.
  • the present invention proposes a target object singulation method.
  • the three-dimensional point cloud or three-dimensional grid model is converted into an optical image and a depth image from a top view to reduce the amount of information that needs to be processed.
  • the optical image and the depth image are processed to obtain the target masks and the target clusters respectively; then, by calculating the ratio of the intersection area to the union area between the target masks and the target clusters, the target clusters and target masks whose corresponding areas in the two-dimensional image may not be target objects (that is, those corresponding to ratios smaller than the first preset threshold) are filtered out, and the areas corresponding to the remaining target clusters in the 3D point cloud or 3D mesh model are determined as single target objects, achieving rapid singulation of the target objects.
  • a method for singulating a target object is provided. Please refer to FIG. 1 for details.
  • the figure shows the flow of a method for singulating a target object according to an embodiment of the present invention.
  • the method is executed by a computing device capable of performing target object singulation, such as a mobile phone, computer, or server.
  • the method includes:
  • the 3D scene data includes 3D point clouds, 3D mesh models, etc.
  • the 3D scene data can be collected through 3D imaging sensors, such as binocular cameras or RGB-D cameras, or collected through a combination of a 3D imaging sensor and a 3D laser scanning sensor or lidar, and the data is transmitted to the computing device so that the computing device can obtain the three-dimensional scene data.
  • three-dimensional scene data can be formed through drones, satellite photography or generated through oblique photogrammetry systems.
  • S120 Convert three-dimensional scene data into optical images and depth images from a top-down perspective.
  • the optical image can be a grayscale image or a color image.
  • the computing device can convert the three-dimensional scene data into an optical image from a top-down perspective by obtaining, for each position in the three-dimensional scene data, the x-axis and y-axis coordinate values and the pixel value of the highest point directly above that position; likewise, it can convert the three-dimensional scene data into a depth image from a top-down perspective by obtaining, for each position, the x-axis and y-axis coordinate values and the z-axis depth value of the highest point directly above that position.
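As an illustration of this conversion step, the following sketch projects a point cloud onto top-view optical and depth images by keeping, for each (x, y) grid cell, the highest point directly above it. The function name, the per-point gray values, and the grid size are our own assumptions, not part of the disclosure:

```python
import numpy as np

def point_cloud_to_top_views(points, colors, grid=1.0):
    """Project a 3D point cloud onto a top-view optical image and depth image.

    For every (x, y) grid cell, keep only the highest point (largest z):
    its color becomes the optical pixel and its z value the depth pixel.
    `points` is an (N, 3) array, `colors` an (N,) gray value per point.
    """
    xy = np.floor(points[:, :2] / grid).astype(int)
    xy -= xy.min(axis=0)                      # shift cell indices to start at 0
    h, w = xy[:, 1].max() + 1, xy[:, 0].max() + 1
    optical = np.zeros((h, w), dtype=colors.dtype)
    depth = np.full((h, w), -np.inf)
    for (cx, cy), z, c in zip(xy, points[:, 2], colors):
        if z > depth[cy, cx]:                 # keep the highest point per cell
            depth[cy, cx] = z
            optical[cy, cx] = c
    return optical, depth
```

With a 1-unit grid, two points in the same cell keep only the higher one, matching the "highest point directly above each position" rule described above.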
  • S130 Perform semantic segmentation processing on the optical image to obtain the target object mask.
  • the target object mask can be an area in the optical image corresponding to a building in the three-dimensional scene data. When singulating objects in other three-dimensional scenes, the target mask can also correspond to, for example, people, animals, or plants.
  • S140 Perform clustering and segmentation processing on the depth image to obtain target object clusters.
  • the target object cluster may be an area in the depth image corresponding to the building in the three-dimensional scene data.
  • Step S130 may be performed first, or step S140 may be performed first.
  • S150 Calculate the ratio of the intersection area and the union area between the target mask and the target cluster.
  • the intersection area between the target mask and the target cluster refers to the area of the intersection area between the target mask and the target cluster when the edges of the optical image and the depth image are aligned and overlapped.
  • the union area between the target mask and the target cluster refers to the size of the union area between the target mask and the target cluster when the edges of the optical image and the depth image are aligned and overlapped. It can be understood that the greater the ratio of the intersection area to the union area between the target mask and the target cluster, the more likely it is that the area corresponding to the target cluster or target mask in the three-dimensional scene data is an area that actually needs to be singulated.
  • S160 Determine the target cluster or target mask corresponding to the ratio greater than or equal to the first preset threshold, and determine the area corresponding to the determined target cluster or target mask in the three-dimensional scene data as a single unit Target.
  • the target cluster or target mask is an area in the bird's-eye-view optical image or depth image. Determining the area corresponding to the target cluster or target mask in the three-dimensional scene data as a single target object means mapping the area of the target cluster in the depth image, or of the target mask in the optical image, back to the three-dimensional scene data, and determining all data along the orthographic projection direction (i.e., the z-axis direction) corresponding to that area in the three-dimensional scene data as the singulated target object.
  • the first preset threshold can be set according to the actual situation of the target object that needs to be singulated. For example, when the target object is a building, the first preset threshold can be set to 0.6 on the computing device, thereby eliminating the target clusters and target masks whose intersection-to-union ratio is less than 0.6 and which may not correspond to buildings that actually need to be singulated, and retaining the rest.
  • when the quality of the obtained target masks and target clusters may be poor, the value of the first preset threshold can be set larger, to filter out the target masks and target clusters that may not correspond to the target objects that actually need to be singulated, making the final singulated targets more accurate.
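The ratio of step S150 and the thresholding of step S160 can be sketched as follows; the function names `iou_ratio` and `keep_singulated` and the boolean-image representation are illustrative assumptions:

```python
import numpy as np

def iou_ratio(mask, cluster):
    """Ratio of intersection area to union area between a target mask and a
    target cluster, both given as boolean images of equal shape (edges of the
    optical image and depth image aligned and overlapped)."""
    inter = np.logical_and(mask, cluster).sum()
    union = np.logical_or(mask, cluster).sum()
    return inter / union if union else 0.0

def keep_singulated(masks, clusters, threshold=0.6):
    """Keep only the mask/cluster pairs whose ratio reaches the first preset
    threshold (0.6 in the building example above)."""
    return [(m, c) for m, c in zip(masks, clusters)
            if iou_ratio(m, c) >= threshold]
```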
  • the area corresponding to the target object cluster in the three-dimensional scene data is determined as a single target object.
  • in the target object singulation method, converting the acquired three-dimensional scene data into a two-dimensional top-view optical image and depth image effectively reduces the amount of information that needs to be processed.
  • the segmentation process obtains the target masks and the target clusters respectively.
  • by calculating the ratio of the intersection area to the union area between the target masks and the target clusters, the target clusters and target masks whose corresponding areas in the two-dimensional image may not be target objects (that is, those corresponding to ratios smaller than the first preset threshold) are filtered out.
  • the method provided by the present invention processes a smaller amount of data and has a faster processing speed.
  • this invention uses the depth information of the point cloud data: by fusing the masks obtained from semantic segmentation of the optical image with the results of depth image clustering segmentation, the segmented areas are complete and the edges of the final singulated target objects are accurate, effectively solving the problem of segmented areas being incomplete or extending beyond the real target object.
  • the present invention makes full use of the prior information that the distance between the top of a building and the ground is large and that the roof is close to a plane, accurately clusters and segments the depth image to obtain target object clusters, and, combined with the target masks obtained by semantic segmentation of the optical image, accurately and efficiently filters the target masks and target clusters by calculating the ratio of the intersection area to the union area between them. The method can be adapted to geographical areas of multiple sizes and different architectural styles, and the data sets required by the singulation method provided by the present invention are easy to obtain and label.
  • the present invention further proposes an implementation.
  • there are multiple target object masks and multiple target object clusters, and there is a one-to-one correspondence between at least part of the target masks and at least part of the target clusters.
  • the correspondence relationship refers to the positional correspondence between the target mask and the target cluster, such as the correspondence between the (x, y) coordinate positions in the image.
  • the above step S150 includes:
  • the ratio of the intersection area and the union area between each target cluster and its corresponding target mask is calculated separately.
  • this step can also be replaced by calculating the ratio of the intersection area to the union area between each target mask and its corresponding target cluster. Similarly, if a certain target mask has no corresponding target cluster, that target mask may be skipped; it can of course also still be calculated, in which case the resulting ratio is 0.
  • the above step S160 includes:
  • the present invention further proposes an implementation. Specifically, after the above step S120, the method further includes: performing interpolation processing and/or filtering processing on the optical image and the depth image.
  • the interpolation process can use bilinear interpolation method.
  • the specific process is as follows:
  • the filtering process can use median filtering operation.
  • the specific process is as follows:
  • sort the pixel values or depth values of all pixels in the window, take the median, and replace the pixel value or depth value of the center pixel with that median.
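The median filtering operation described above can be sketched as a plain, unoptimized implementation; the function name and the edge-replication border handling are our assumptions:

```python
import numpy as np

def median_filter(img, window=3):
    """Median filtering: for each pixel, take the median of all values inside
    a window centered on it and replace the pixel with that median. Works on
    a 2D array of pixel values or depth values; borders use edge replication."""
    pad = window // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + window, x:x + window])
    return out
```

A single noisy hole point surrounded by consistent values is replaced by the neighborhood median, which is why this step reduces noise before segmentation.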
  • the pixel value is an RGB value.
  • the pixel value is a grayscale value. The same applies to the pixel values mentioned below.
  • for step S130, the present invention further proposes a specific implementation.
  • Figure 2 shows the sub-step process of step S130.
  • step S130 includes:
  • the optical image is input into the trained convolutional neural network.
  • S132 Output the classification confidence of each pixel in the optical image through the convolutional neural network.
  • the convolutional neural network calculates the softmax function for the output of the last layer and outputs the classification confidence of the classification result of each pixel in the optical image.
  • S133 Obtain the pixels corresponding to the classification confidence greater than or equal to the second preset threshold, and obtain the target object mask.
  • the classification confidence of each pixel is compared with the second preset threshold, and the pixels greater than or equal to the second preset threshold are retained to obtain the target mask of the corresponding area.
  • the second preset threshold can be set to 0.3 to ensure the accuracy of the obtained target object mask.
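The confidence computation and thresholding of steps S132 and S133 can be sketched as follows; the network itself is not reproduced, and the function names and the (H, W, C) logit-map layout are assumptions:

```python
import numpy as np

def softmax(logits):
    """Softmax over the channel axis of an (H, W, C) logit map, as applied
    to the output of the network's last layer (step S132)."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mask_from_confidence(class_confidence, threshold=0.3):
    """Step S133: keep the pixels whose classification confidence for the
    target class is at least the second preset threshold (0.3 above)."""
    return class_confidence >= threshold
```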
  • the convolutional neural network is used to determine the classification confidence of each pixel value in the optical image, and by obtaining the points corresponding to the confidence greater than or equal to the second preset threshold, the target mask is obtained, and the target mask is accurately realized in the optical image. Segmentation of the target area.
  • for step S140, the present invention further proposes a specific implementation.
  • Figure 3 shows the sub-step process of step S140.
  • step S140 can adopt a clustering algorithm based on region growing, specifically including:
  • Cluster labeling S141: Divide one pixel in the depth image into a seed area and mark it as a cluster.
  • a point can be manually selected on the computing device, divided into a seed area and marked as a cluster, or a point can be randomly selected by the computing device, divided into a seed area and marked as a cluster.
  • Initial neighborhood pixel classification S142: Calculate the absolute value of the difference between the depth value of the pixel and the depth values of its four adjacent pixels (above, below, left, and right), and divide the adjacent pixels whose absolute difference is less than or equal to the third preset threshold into the seed region, marking them as the cluster.
  • in the depth image, there is a certain difference between the depth value of the area where the target object is located and the depth values of other areas, and this difference is particularly obvious when the target object is a building. Therefore, if a pixel is located in the area of a target object that needs to be singulated, an adjacent pixel whose depth value differs from it by no more than the third preset threshold is, with maximum probability, also located in that area; if the pixel is not in the area of a target object that needs to be singulated, such an adjacent pixel is, with maximum probability, also not in that area. Therefore, by marking the pixel and the adjacent pixels whose absolute depth difference is less than or equal to the third preset threshold as the same cluster, the area where the target object is located in the depth image can be classified into one or several clusters, and the other areas into one or several other clusters.
  • the third preset threshold can be set to 10 to ensure the accuracy of pixel classification and labeling.
  • Repeated neighborhood pixel classification S143: perform the above step S142 for the other pixels in the seed area, stopping when, for all pixels on the inner edge of the seed area, the absolute value of the difference between their depth values and the depth values of the adjacent pixels located outside the seed area is greater than the third preset threshold; all pixels in the seed area are then marked as the same cluster.
  • this step may not be performed, may be performed only once, or may be performed in a loop. Specifically, when in step S142 the absolute value of the difference between the depth value of the pixel and the depth values of all its adjacent pixels is greater than the third preset threshold, no adjacent pixel is divided into the seed area and the pixel is individually marked as a cluster; in this case this step is skipped and the subsequent steps are performed directly.
  • when at least one adjacent pixel is divided into the seed area in step S142, each such pixel is regarded as a first adjacent pixel and step S142 is performed on it; any adjacent pixels newly divided into the seed area in this way are regarded as second adjacent pixels, step S142 is continued for them, and so on, until no new pixels are absorbed.
  • Traverse the remaining pixels S144: every time the above step S141 is repeated for a pixel in the depth image that is not yet marked as a cluster, that pixel is marked as a new cluster, until all pixels in the depth image have been marked into clusters.
  • the number of initial target clusters obtained is multiple. Among these different initial clusters, there are target clusters that need to be singulated and background clusters that do not.
  • the above step S143 is performed preferentially on the pixels at the outer edge of the seed area.
  • the process of marking the pixels in the depth image as clusters is centered on the first marked pixel and gradually spreads outward, which is beneficial to improving the efficiency of depth image clustering.
  • the depth image is segmented through a clustering algorithm based on region growing, and the areas corresponding to the target and the areas corresponding to the background are marked as clusters to obtain the initial target cluster and achieve accurate division of target areas and non-target areas.
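Steps S141 through S144 amount to a region-growing pass over the depth image, which can be sketched as below. The FIFO queue realizes the outward spread from the seed described above; the function name and label conventions are our assumptions, not the patented implementation:

```python
import numpy as np
from collections import deque

def region_grow_clusters(depth, threshold=10):
    """Region-growing clustering over a depth image (steps S141-S144):
    start a seed at an unlabeled pixel (S141), then repeatedly absorb
    4-neighbors whose depth differs from the current pixel by at most the
    third preset threshold (S142/S143), and move on to the next unlabeled
    pixel when the cluster stops growing (S144). Returns a label image."""
    h, w = depth.shape
    labels = np.full((h, w), -1, dtype=int)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue
            labels[sy, sx] = next_label           # S141: new seed / cluster
            queue = deque([(sy, sx)])
            while queue:                          # S142/S143: grow the seed area
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == -1 \
                            and abs(depth[ny, nx] - depth[y, x]) <= threshold:
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            next_label += 1                       # S144: next unlabeled pixel
    return labels
```

On a depth image of a flat roof over low ground, the roof pixels come out as one large cluster and the ground as separate clusters, matching the building prior discussed above.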
  • step S140 also includes:
  • Cluster filtering S145 Obtain initial target clusters with an area greater than or equal to the fourth preset threshold, and obtain target clusters.
  • when the target object is a building, the area where the building is located will be divided into a larger cluster, while the background area outside the building has uneven depth values and will be divided into multiple small-area clusters.
  • the fourth preset threshold can generally be set to 300 to more reliably filter clusters corresponding to the background area.
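The cluster filtering of step S145 can be sketched as follows; the function name and the label-image input format (as produced by a region-growing step) are assumptions:

```python
import numpy as np

def filter_clusters_by_area(labels, min_area=300):
    """Cluster filtering (S145): keep only initial clusters whose pixel area
    is at least the fourth preset threshold (300 above), discarding the many
    small clusters produced by the uneven background. `labels` is a per-pixel
    integer label image; returns the kept cluster label ids."""
    ids, counts = np.unique(labels, return_counts=True)
    return [int(i) for i, n in zip(ids, counts) if n >= min_area]
```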
  • a target singulation device is provided.
  • the object singulation device 200 includes an acquisition unit 210, an image conversion unit 220, a first image segmentation unit 230, a second image segmentation unit 240, a calculation unit 250 and a determination unit 260.
  • the acquisition unit 210 is used to acquire three-dimensional scene data.
  • the image conversion unit 220 is used to convert the three-dimensional scene data into an optical image and a depth image from a bird's-eye view.
  • the first image segmentation unit 230 is used to perform semantic segmentation processing on the optical image to obtain a target mask.
  • the second image segmentation unit 240 is used to perform cluster segmentation processing on the depth image to obtain target object clusters.
  • the calculation unit 250 is used to calculate the ratio of the intersection area and the union area between the target mask and the target cluster.
  • the determination unit 260 is used to determine the target cluster or target mask corresponding to a ratio greater than or equal to the first preset threshold, and determine the area corresponding to the determined target cluster or target mask in the three-dimensional scene data as Single target.
  • the calculation unit 250 is used to respectively calculate the ratio of the intersection area and the union area between each target cluster and its corresponding target mask.
  • the determination unit 260 is used to determine the target object clusters or target object masks corresponding to each ratio greater than or equal to the first preset threshold, and to determine the areas corresponding to each determined target object cluster or target object mask in the three-dimensional scene data as single target objects respectively.
  • the object singulation device 200 also includes a noise reduction unit 270.
  • the noise reduction unit 270 is used to perform interpolation processing and/or filtering processing on the optical image and the depth image.
  • the first image segmentation unit 230 is used to input the optical image into the convolutional neural network, to output the classification confidence of each pixel in the optical image through the convolutional neural network, and to obtain a value greater than or equal to The target object mask is obtained from the pixels corresponding to the classification confidence level equal to the second preset threshold.
  • the second image segmentation unit 240 is used for cluster labeling: dividing one of the pixels in the depth image into a seed area and marking it as a cluster; and for initial neighborhood pixel classification: calculating the absolute value of the difference between the depth value of that pixel and the depth values of its four adjacent pixels (above, below, left, and right), and dividing the adjacent pixels whose absolute difference is less than or equal to the third preset threshold into the seed area.
  • the second image segmentation unit 240 prioritizes the pixels at the outer edge of the seed area for repeated neighborhood pixel classification.
  • the second image segmentation unit 240 is also used for cluster filtering: obtaining initial target object clusters with an area greater than or equal to the fourth preset threshold to obtain target object clusters.
  • a computing device is also provided. Please refer to FIG. 5 for details.
  • the figure shows the structure of the computing device provided by an embodiment.
  • the specific embodiments of the present invention do not limit the specific implementation of the computing device.
  • the computing device may include: a processor (processor) 402, a communications interface (Communications Interface) 404, a memory (memory) 406, and a communication bus 408.
  • the processor 402, the communication interface 404, and the memory 406 communicate with each other through the communication bus 408.
  • the communication interface 404 is used to communicate with network elements of other devices such as clients or other servers.
  • the processor 402 is configured to execute the program 410. Specifically, the processor 402 can execute the above-mentioned related steps in the target object singulation method embodiment.
  • program 410 may include program code including computer-executable instructions.
  • the processor 402 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
  • the one or more processors included in the computing device may be the same type of processor, such as one or more CPUs; or they may be different types of processors, such as one or more CPUs and one or more ASICs.
  • Memory 406 is used to store programs 410.
  • Memory 406 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk storage.
  • Program 410 can specifically be called by processor 402 to cause the computing device to perform the following operations:
  • a computer-readable storage medium stores at least one executable instruction.
  • the executable instruction, when run on a computing device, causes the computing device to execute the target object singulation method in any of the above method embodiments.
  • the executable instructions may be used to cause the computing device to perform the following operations:
  • modules in the devices in the embodiment can be adaptively changed and arranged in one or more devices different from the embodiment.
  • the modules or units or components in the embodiments may be combined into one module or unit or component, and they may be divided into multiple sub-modules or sub-units or sub-components.
  • All features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination, except where at least some of such features and/or processes or units are mutually exclusive.
  • Each feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.


Abstract

The present invention relates to the technical field of information processing, in particular to a target object separation method and apparatus, a device, and a storage medium. The target object separation method comprises: acquiring three-dimensional scene data; converting the three-dimensional scene data into an optical image and a depth image of a top view; performing semantic segmentation processing on the optical image to obtain target object masks; performing clustering segmentation processing on the depth image to obtain target object clusters; calculating ratios of intersection areas to union areas between the target object masks and the target object clusters; and determining a target object cluster or a target object mask corresponding to a ratio greater than or equal to a first preset threshold value, and determining a region corresponding to the determined target object cluster or target object mask in the three-dimensional scene data as a separation target object. In this way, the present invention can improve the accuracy and speed of target object separation.

Description

A target object singulation method, apparatus, device and storage medium

Technical Field

The present invention relates to the field of information processing technology, and in particular to a target object singulation method, apparatus, device and storage medium.

Background Art

Real-scene 3D is an important part of new infrastructure construction. In a 3D geographical scene, buildings or other target objects are constructed as independent objects through processing, so that they can be selected individually. Since 3D scene information is huge in volume and unordered, processing it is complex and time-consuming.
Technical Solution

In view of the above problems, the present invention provides a target object singulation method, apparatus, device and storage medium, which can improve the accuracy and speed of target object singulation.

According to one aspect of the present invention, a target object singulation method is provided, including: acquiring three-dimensional scene data; converting the three-dimensional scene data into an optical image and a depth image from a top view; performing semantic segmentation processing on the optical image to obtain a target object mask; performing clustering segmentation processing on the depth image to obtain target object clusters; calculating the ratio of the intersection area to the union area between the target object mask and the target object cluster; and determining the target object cluster or target object mask corresponding to a ratio greater than or equal to a first preset threshold, and determining the area corresponding to the determined target object cluster or target object mask in the three-dimensional scene data as a singulated target object.

Compared with methods that directly singulate three-dimensional point cloud data, the method provided by the present invention processes a smaller amount of data and is faster. The present invention uses the depth information of the point cloud data, and fuses the clusters obtained by clustering segmentation of the depth image with the masks obtained by semantic segmentation of the optical image, so that the segmented regions are complete and the edges of the finally determined singulated target objects are accurate, effectively solving the problem of segmented regions being incomplete or extending beyond the true extent of the target object. For the singulation of buildings, the present invention can make full use of the prior information that there is a large distance between the top of a building and the ground and that roofs are close to planar, accurately clustering and segmenting the depth image to obtain target object clusters; combined with the target object masks obtained by semantic segmentation of the optical image, it achieves precise and efficient filtering of the target object clusters by calculating the ratio of the intersection area to the union area between the target object masks and the target object clusters, and can adapt to geographical areas of various sizes and different architectural styles. Moreover, the data sets required by the singulation method provided by the present invention are easy to obtain and label.
In an optional manner, there are multiple target object masks and multiple target object clusters, and there is a one-to-one correspondence between at least some of the target object masks and at least some of the target object clusters. Calculating the ratio of the intersection area to the union area between the target object masks and the target object clusters includes: respectively calculating the ratio of the intersection area to the union area between each target object cluster and its corresponding target object mask. Determining the target object cluster or target object mask corresponding to a ratio greater than or equal to the first preset threshold, and determining the area corresponding to the determined target object cluster or target object mask in the three-dimensional scene data as a singulated target object, includes: determining the target object cluster or target object mask corresponding to each ratio greater than or equal to the first preset threshold, and respectively determining the area corresponding to each determined target object cluster or target object mask in the three-dimensional scene data as one singulated target object. By respectively calculating the ratio of the intersection area to the union area between each target object cluster and its corresponding target object mask, and determining the target object cluster or target object mask corresponding to each ratio greater than or equal to the first preset threshold, when there are multiple target objects that actually need to be singulated in the three-dimensional point cloud or three-dimensional mesh model, the computing device can singulate each target object separately.
In an optional manner, after converting the three-dimensional scene data into an optical image and a depth image from a top view, the method further includes: performing interpolation processing and/or filtering processing on the optical image and the depth image. By interpolating and/or filtering the hole points in the optical image or depth image, noise can be effectively reduced, which benefits the subsequent segmentation of the optical image or depth image.

In an optional manner, performing semantic segmentation processing on the optical image to obtain the target object mask includes: inputting the optical image into a convolutional neural network; outputting the classification confidence of each pixel in the optical image through the convolutional neural network; and obtaining the pixels whose classification confidence is greater than or equal to a second preset threshold to obtain the target object mask. The convolutional neural network determines the classification confidence of each pixel in the optical image, and by collecting the pixels whose confidence is greater than or equal to the second preset threshold, the target object mask is obtained, accurately segmenting the region of the optical image where the target object is located.
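The thresholding of per-pixel confidences described above can be sketched in a few lines. This is a minimal illustration, assuming the confidence map is a plain 2D list already produced by the convolutional neural network; the function name is hypothetical, and the threshold argument stands in for the second preset threshold:

```python
def mask_from_confidence(conf_map, threshold):
    """Build a binary target-object mask from per-pixel classification
    confidences: pixels whose confidence is greater than or equal to the
    threshold (the 'second preset threshold') belong to the mask."""
    return [[1 if c >= threshold else 0 for c in row] for row in conf_map]
```

For example, with a 2x2 confidence map `[[0.9, 0.2], [0.6, 0.4]]` and a threshold of 0.5, the two left-hand pixels would be retained in the mask.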
In an optional manner, performing clustering segmentation processing on the depth image to obtain target object clusters includes: cluster marking: assigning one of the pixels in the depth image to a seed region and marking it as a cluster; initial neighborhood pixel classification: calculating the absolute value of the difference between the depth value of that pixel and the depth values of its four adjacent pixels (above, below, left and right), and assigning to the seed region the adjacent pixels whose absolute difference is less than or equal to a third preset threshold, marking them as the same cluster; repeated neighborhood pixel classification: performing the initial neighborhood pixel classification on the other pixels in the seed region, stopping when the absolute differences between the depth values of all pixels on the inner edge of the seed region and the depth values of the adjacent pixels outside the seed region are all greater than the third preset threshold, at which point all pixels in the seed region are marked as the same cluster; traversing the remaining pixels: repeating the cluster marking, initial neighborhood pixel classification and repeated neighborhood pixel classification on the pixels of the depth image that have not yet been marked as a cluster, until all pixels in the depth image are marked, obtaining initial target object clusters. By segmenting the depth image with a region-growing clustering algorithm, both the regions corresponding to the target objects and the regions corresponding to the background are marked as clusters, yielding the initial target object clusters and achieving an accurate division between target and non-target regions.
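The four steps above (cluster marking, initial neighborhood pixel classification, repeated neighborhood pixel classification, and traversal of the remaining pixels) can be sketched as a breadth-first region-growing pass. The function name, the list-of-lists image layout and the integer labels are illustrative assumptions; the source does not prescribe a particular implementation:

```python
from collections import deque

def region_grow_clusters(depth, diff_threshold):
    """Label a depth image into clusters by 4-neighbour region growing.

    A pixel joins the current seed region when the absolute depth difference
    to an already-labelled neighbour is <= diff_threshold (the 'third preset
    threshold'). Unlabelled pixels seed new clusters until every pixel of the
    image carries a label."""
    h, w = len(depth), len(depth[0])
    labels = [[None] * w for _ in range(h)]
    next_label = 0
    for si in range(h):
        for sj in range(w):
            if labels[si][sj] is not None:
                continue
            labels[si][sj] = next_label          # cluster marking
            frontier = deque([(si, sj)])         # BFS expands the outer edge first
            while frontier:
                i, j = frontier.popleft()
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if 0 <= ni < h and 0 <= nj < w and labels[ni][nj] is None \
                            and abs(depth[i][j] - depth[ni][nj]) <= diff_threshold:
                        labels[ni][nj] = next_label   # neighborhood pixel classification
                        frontier.append((ni, nj))
            next_label += 1                      # traverse remaining pixels
    return labels
```

Using a FIFO frontier means the region grows ring by ring outward from the first marked pixel, matching the preference for classifying the outer-edge pixels of the seed region first.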
In an optional manner, pixels at the outer edge of the seed region are given priority in the repeated neighborhood pixel classification. By preferentially classifying the neighborhood pixels at the outer edge of the seed region, the process of marking the pixels of the depth image into clusters spreads outward step by step from the first marked pixel, which helps improve the efficiency of clustering the depth image.

In an optional manner, after the repeated neighborhood pixel classification, the method further includes: cluster filtering: obtaining the initial target object clusters whose area is greater than or equal to a fourth preset threshold to obtain the target object clusters. When the target object is a building, since the roof of a building is roughly planar and has a certain area, the region where a building is located will be segmented into one large cluster, while the background region outside the building, whose depth values are uneven, will be segmented into many small clusters. On this basis, by obtaining the initial target object clusters whose area is greater than or equal to the fourth preset threshold, the clusters corresponding to building regions can be found fairly accurately and the clusters corresponding to the background regions can be filtered out.
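The cluster filtering step reduces to counting pixels per label and keeping only the large clusters. A minimal sketch, with a hypothetical helper name and a set of surviving labels as output; the area threshold stands in for the fourth preset threshold:

```python
def filter_clusters(labels, min_area):
    """Keep only the cluster labels whose pixel count is >= min_area
    (the 'fourth preset threshold'); small background fragments drop out."""
    counts = {}
    for row in labels:
        for lab in row:
            counts[lab] = counts.get(lab, 0) + 1
    return {lab for lab, n in counts.items() if n >= min_area}
```

For a label image where one cluster covers most of the grid and another covers a single pixel, only the large cluster's label survives the filter.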
According to another aspect of the present invention, a target object singulation apparatus is also provided, including: an acquisition unit, used to acquire three-dimensional scene data; an image conversion unit, used to convert the three-dimensional scene data into an optical image and a depth image from a top view; a first image segmentation unit, used to perform semantic segmentation processing on the optical image to obtain a target object mask; a second image segmentation unit, used to perform clustering segmentation processing on the depth image to obtain target object clusters; a calculation unit, used to calculate the ratio of the intersection area to the union area between the target object mask and the target object cluster; and a determination unit, used to determine the target object cluster or target object mask corresponding to a ratio greater than or equal to a first preset threshold, and to determine the area corresponding to the determined target object cluster or target object mask in the three-dimensional scene data as a singulated target object.

According to another aspect of the present invention, a computing device is also provided, including: a processor, a memory, a communication interface and a communication bus, where the processor, the memory and the communication interface communicate with each other through the communication bus; the memory is used to store at least one executable instruction, and the executable instruction causes the processor to execute the target object singulation method in any of the above manners.

According to another aspect of the present invention, a computer storage medium is also provided, in which at least one executable instruction is stored, and the executable instruction causes the processor to execute the target object singulation method in any of the above manners.

The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly, it can be implemented according to the content of the description; and in order to make the above and other objects, features and advantages of the present invention more obvious and understandable, specific embodiments of the present invention are set forth below.
Brief Description of the Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are for the purpose of illustrating the preferred embodiments only and are not to be construed as limiting the invention. Throughout the drawings, the same reference signs denote the same components. In the drawings:

Figure 1 is a flow chart of a target object singulation method provided by an embodiment of the present invention;

Figure 2 is a flow chart of the sub-steps of step S130 in Figure 1;

Figure 3 is a flow chart of the sub-steps of step S140 in Figure 2;

Figure 4 is a schematic structural diagram of a target object singulation apparatus provided by an embodiment of the present invention;

Figure 5 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
Embodiments of the Invention

Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited by the embodiments set forth herein.

Real-scene 3D is an important part of the country's new infrastructure construction. In a 3D geographical scene, through cutting, reconstruction, vector superposition and other processing, buildings or other target objects can be constructed as independent three-dimensional objects, so that each constructed independent object can be selected individually; by further assigning specific attributes to it, the attribute information of a target object can be quickly selected and queried, achieving refined and dynamic management.

Singulation of target objects is generally achieved by processing three-dimensional point cloud data. Since three-dimensional point cloud data retains the original geometric information of three-dimensional space, the amount of information is huge and unordered, so processing it is very complex and time-consuming, and labeling is also difficult, resulting in low efficiency.

On this basis, the present invention proposes a target object singulation method. First, the three-dimensional point cloud or three-dimensional mesh model is converted into an optical image and a depth image from a top view, reducing the amount of information to be processed. Next, the optical image and the depth image are processed to obtain target object masks and target object clusters respectively. Then, by calculating the ratio of the intersection area to the union area between the target object masks and the target object clusters, the target object clusters and target object masks whose corresponding regions in the two-dimensional image may not be target objects (i.e., those corresponding to ratios smaller than the first preset threshold) are excluded, ensuring the accuracy of the data. Finally, the regions corresponding to the remaining target object clusters in the three-dimensional point cloud or three-dimensional mesh model are determined as singulated target objects, achieving fast singulation of the target objects.
According to one aspect of the embodiments of the present invention, a target object singulation method is provided. Please refer to Figure 1 for details, which shows the flow of a target object singulation method provided by an embodiment of the present invention. The method is executed by a computing device that needs to perform target object singulation, such as a mobile phone, a computer or a server. As shown in Figure 1, the method includes:

S110: Acquire three-dimensional scene data.

In this step, the three-dimensional scene data includes three-dimensional point clouds, three-dimensional mesh models, and the like. The three-dimensional scene data can be collected by a three-dimensional imaging sensor, such as a binocular camera or an RGB-D camera, or by a three-dimensional imaging sensor combined with a three-dimensional laser scanner or lidar, and transmitted to the computing device so that the computing device obtains the three-dimensional scene data.

For urban scenes, the three-dimensional scene data can be captured by drones or satellites, or generated by an oblique photogrammetry system.

S120: Convert the three-dimensional scene data into an optical image and a depth image from a top view.

In this step, the optical image can be a grayscale image or a color image. The computing device can convert the three-dimensional scene data into a top-view optical image by obtaining the x-axis and y-axis coordinate values and the pixel value of the highest point directly above each position in the three-dimensional scene data; likewise, it can convert the three-dimensional scene data into a top-view depth image by obtaining the x-axis and y-axis coordinate values and the z-axis depth value of the highest point directly above each position in the three-dimensional scene data.
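The conversion described in this step can be sketched as keeping, for every (x, y) grid cell, only the highest point of the scene. The point format (x, y, z, pixel value), the grid cell size and the dict-based image representation are illustrative assumptions, not part of the claimed method:

```python
def project_top_view(points, cell=1.0):
    """Project 3D points (x, y, z, value) to a top-view depth and optical
    image. For every (x, y) grid cell only the highest point (max z) is kept,
    mirroring the 'highest point directly above each position' rule.
    Returns two dicts keyed by integer grid coordinates."""
    depth, optical = {}, {}
    for x, y, z, value in points:
        key = (int(x // cell), int(y // cell))
        if key not in depth or z > depth[key]:
            depth[key] = z        # depth image: z of the highest point
            optical[key] = value  # optical image: its pixel value
    return depth, optical
```

A real pipeline would rasterize the dicts into dense image arrays and interpolate the empty (hole) cells, as discussed for the optional interpolation and filtering step.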
S130: Perform semantic segmentation processing on the optical image to obtain a target object mask.

For a method that singulates buildings in a three-dimensional geographical scene, the target object mask can be the region of the optical image corresponding to a building in the three-dimensional scene data. If objects in other three-dimensional scenes are to be singulated, the target object can also be, for example, a person, an animal or a plant.

S140: Perform clustering segmentation processing on the depth image to obtain target object clusters.

Similarly, for a method that singulates buildings in a three-dimensional geographical scene, in this step the target object cluster can be the region of the depth image corresponding to a building in the three-dimensional scene data.

It should be noted that steps S130 and S140 are not performed in a fixed order; step S130 may be performed first, or step S140 may be performed first.

S150: Calculate the ratio of the intersection area to the union area between the target object mask and the target object cluster.

The intersection area between the target object mask and the target object cluster refers to the area of the region where the target object mask and the target object cluster intersect when the optical image and the depth image are edge-aligned and overlapped. The union area between the target object mask and the target object cluster refers to the area of the region of their union under the same alignment. It can be understood that the greater the ratio of the intersection area to the union area between the target object mask and the target object cluster, the more likely it is that the region corresponding to that target object cluster or target object mask in the three-dimensional scene data actually needs to be singulated.
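The ratio computed in this step is the intersection-over-union of the two regions. A minimal sketch, assuming each region is given as a set of (row, col) pixel coordinates once the two images are edge-aligned (the function name is hypothetical):

```python
def iou(mask_pixels, cluster_pixels):
    """Ratio of intersection area to union area between a target object
    mask and a target object cluster, each given as a set of (row, col)
    pixel coordinates in the aligned top-view images."""
    mask, cluster = set(mask_pixels), set(cluster_pixels)
    union = mask | cluster
    return len(mask & cluster) / len(union) if union else 0.0
```

A mask of three pixels and a cluster of three pixels that share two pixels would give a ratio of 2/4 = 0.5, which would be excluded under a first preset threshold of 0.6.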
S160:确定大于或等于第一预设阈值的比值所对应的目标物簇或目标物掩膜,将确定的目标物簇或目标物掩膜在三维场景数据中所对应的区域确定为单体化目标物。S160: Determine the target cluster or target mask corresponding to the ratio greater than or equal to the first preset threshold, and determine the area corresponding to the determined target cluster or target mask in the three-dimensional scene data as a single unit Target.
目标物簇或目标物掩膜为俯视视角的光学图像或深度图像中的区域,将目标物簇或目标物掩膜在三维场景数据中所对应的区域确定为单体化目标物,则是指将目标物簇在深度图像中的区域或目标物掩膜在光学图像中的区域对应至三维场景数据中,将三维场景数据中与该区域所对应的整个正射投影方向(即z轴方向)的数据确定为单体化目标物。The target cluster or target mask is an area in an optical image or a depth image from a bird's-eye view. Determining the area corresponding to the target cluster or target mask in the three-dimensional scene data as a single target object means Correspond to the area where the target object is clustered in the depth image or the area where the target object is masked in the optical image to the three-dimensional scene data, and the entire orthographic projection direction corresponding to the area in the three-dimensional scene data (i.e., the z-axis direction) The data was determined to be the monomerized target.
在本步骤中,第一预设阈值可以根据实际需要进行单体化的目标物情况进行设定,例如当实际需要进行的单体化的目标物为建筑物时,可以在计算设备上将第一预设阈值设定为0.6,从而将交集面积和并集面积的比值小于0.6的可能不是实际需要进行的单体化的建筑物所对应区域的目标物簇和目标物掩膜排除掉,保留交集面积和并集面积的比值大于或等于0.6的目标物簇和目标物掩膜,并将该目标物簇或目标物掩膜在三维场景数据中所对应的区域确定为单体化建筑物。In this step, the first preset threshold can be set according to the actual situation of the target object that needs to be singulated. For example, when the target object that actually needs to be singulated is a building, the first preset threshold can be set on the computing device. A preset threshold is set to 0.6, thereby eliminating target clusters and target masks in areas corresponding to buildings whose intersection area and union area are less than 0.6 and may not be actually required to be singled, and retain them. Target clusters and target masks whose intersection area and union area ratio are greater than or equal to 0.6, and the area corresponding to the target cluster or target mask in the three-dimensional scene data is determined as a single building.
可以理解的是,当光学图像或深度图像中实际需要进行的单体化的目标物所在区域与其他区域之间的像素值或深度值差异较低时,得到的目标物掩膜和目标物簇可能与实际需要单体化的目标物所在区域的对应关系较差,此时可以将第一预设阈值的数值设定的大一些,从而过滤掉可能与实际需要单体化的目标物所在区域不对应的目标物掩膜和目标物簇,使得最终确定的单体化目标物更加准确。It can be understood that when the difference in pixel values or depth values between the area of the target that actually needs to be singulated and other areas in the optical image or depth image is low, the obtained target mask and target cluster The corresponding relationship with the area where the target object that actually needs to be singulated may be poor. In this case, the value of the first preset threshold can be set larger to filter out the area that may be related to the target object that actually needs to be singulated. Uncorresponding target masks and target clusters make the final single target more accurate.
考虑到得到的目标物簇与实际需要进行的单体化的目标物所在区域的对应关系相较于目标物掩膜更好,并且目标物簇的边缘相较于目标物掩膜更加平滑,因此优选将目标物簇在三维场景数据中所对应的区域确定为单体化目标物。Considering that the corresponding relationship between the obtained target cluster and the area of the target that actually needs to be singulated is better than that of the target mask, and the edges of the target cluster are smoother than that of the target mask, therefore Preferably, the area corresponding to the target object cluster in the three-dimensional scene data is determined as a single target object.
In the target object singulation method provided by the present invention, converting the acquired three-dimensional scene data into a two-dimensional top-view optical image and depth image effectively reduces the amount of information to be processed. Segmenting the optical image and the depth image yields the target masks and the target clusters respectively. By computing the ratio of the intersection area to the union area between each target mask and target cluster, the target clusters and target masks whose corresponding regions in the two-dimensional images may not be target objects (that is, those whose ratio is smaller than the first preset threshold) are excluded, improving the accuracy of the data. Finally, the regions corresponding to the remaining target clusters or target masks in the three-dimensional scene data are determined as singulated target objects, achieving fast singulation of the target objects.
Compared with methods that singulate three-dimensional point cloud data directly, the method provided by the present invention processes a smaller amount of data and runs faster. The present invention exploits the depth information of the point cloud data: the clusters obtained by clustering segmentation of the depth image are fused with the masks obtained by semantic segmentation of the optical image, so that the segmented regions are complete and the edges of the finally determined singulated target objects are accurate, effectively solving the problem of segmented regions being incomplete or exceeding the extent of the real target object. For the singulation of buildings, the present invention can make full use of the prior information that there is a large distance between the top of a building and the ground and that roofs are approximately planar, accurately clustering and segmenting the depth image to obtain target clusters. Combined with the target masks obtained by semantic segmentation of the optical image, computing the ratio of the intersection area to the union area between target masks and target clusters enables accurate and efficient filtering of both, which can adapt to geographic areas of various sizes and different architectural styles; moreover, the data sets required by the singulation method provided by the present invention are easy to acquire and annotate.
Considering that in many cases there may be more than one target object in the three-dimensional scene data, the present invention further proposes an implementation for this situation. Specifically, the numbers of target masks and target clusters are both plural, and there is a one-to-one correspondence between at least some of the target masks and at least some of the target clusters. The correspondence refers to the positional correspondence between a target mask and a target cluster, for example the correspondence of (x, y) coordinate positions in the images.
The above step S150 then includes:
Separately computing, for each target cluster, the ratio of the intersection area to the union area between that cluster and its corresponding target mask.
In this step, if a target cluster has no corresponding target mask, the computation may be skipped for that cluster; alternatively, the computation may still be performed, in which case the resulting ratio is 0.
It can be understood that this step may equally be replaced by computing, for each target mask, the ratio of the intersection area to the union area between that mask and its corresponding target cluster. Likewise, if a target mask has no corresponding target cluster, the computation may be skipped for that mask, or performed anyway with a resulting ratio of 0.
The above step S160 then includes:
Determining the target cluster or target mask corresponding to each ratio greater than or equal to the first preset threshold, and determining the region corresponding to each such target cluster or target mask in the three-dimensional scene data as one singulated target object.
By separately computing the ratio of the intersection area to the union area between each target cluster and its corresponding target mask, and determining the target cluster or target mask corresponding to each ratio greater than or equal to the first preset threshold, the computing device can singulate each target object separately when the three-dimensional point cloud or three-dimensional mesh model contains multiple target objects that actually need to be singulated.
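The filtering criterion described above is an intersection-over-union test on two binary images. A minimal sketch in Python (the function name and the NumPy boolean-array representation are assumptions for illustration, not part of the patent):

```python
import numpy as np

def mask_cluster_ratio(mask: np.ndarray, cluster: np.ndarray) -> float:
    """Ratio of intersection area to union area between a binary target
    mask and a binary target cluster (boolean arrays of equal shape)."""
    intersection = np.logical_and(mask, cluster).sum()
    union = np.logical_or(mask, cluster).sum()
    # A cluster with no corresponding mask (or vice versa) gets ratio 0,
    # matching the convention described in the text above.
    return float(intersection) / union if union else 0.0
```

A cluster or mask would then be kept only when this ratio reaches the first preset threshold.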
To reduce noise, the present invention further proposes an implementation. Specifically, after the above step S120, the method further includes:
Performing interpolation and/or filtering on the optical image and the depth image.
The interpolation may use bilinear interpolation, whose specific process is as follows:
Given the values f(Q11), f(Q21), f(Q12), f(Q22) at the four grid points Q11 = (x1, y1), Q21 = (x2, y1), Q12 = (x1, y2) and Q22 = (x2, y2), to obtain the value at point P = (x, y), first interpolate linearly in the x direction:
f(x, y1) ≈ ((x2 − x)/(x2 − x1)) f(Q11) + ((x − x1)/(x2 − x1)) f(Q21)
f(x, y2) ≈ ((x2 − x)/(x2 − x1)) f(Q12) + ((x − x1)/(x2 − x1)) f(Q22)
then interpolate linearly in the y direction:
f(x, y) ≈ ((y2 − y)/(y2 − y1)) f(x, y1) + ((y − y1)/(y2 − y1)) f(x, y2)
which determines the value at P.
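The two-stage computation can be written directly as code. A short illustrative sketch (function name and argument order are assumptions):

```python
def bilinear_interpolate(x, y, x1, y1, x2, y2, q11, q21, q12, q22):
    """Bilinear interpolation of the value at P = (x, y) from the known
    values q11, q21, q12, q22 at the grid points Q11 = (x1, y1),
    Q21 = (x2, y1), Q12 = (x1, y2), Q22 = (x2, y2)."""
    # Linear interpolation in the x direction, at y = y1 and at y = y2.
    f_y1 = (x2 - x) / (x2 - x1) * q11 + (x - x1) / (x2 - x1) * q21
    f_y2 = (x2 - x) / (x2 - x1) * q12 + (x - x1) / (x2 - x1) * q22
    # Linear interpolation in the y direction between the two results.
    return (y2 - y) / (y2 - y1) * f_y1 + (y - y1) / (y2 - y1) * f_y2
```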
The filtering may use a median filtering operation, whose specific process is as follows:
First, for a given pixel in the optical image or depth image, take a square window of width L centered on that pixel; then sort the pixel values (for the optical image) or depth values (for the depth image) of all pixels within the window, compute the median of those values, and replace the value of the central pixel with that median. It should be noted that when the optical image is a color image, the pixel values are RGB values, and when the optical image is a grayscale image, the pixel values are grayscale values; the same applies to the pixel values mentioned below.
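As an illustration, a naive implementation of this windowed median operation on a single-channel image might look as follows (border pixels here use a truncated window, a detail the text does not specify):

```python
import numpy as np

def median_filter(image: np.ndarray, L: int = 3) -> np.ndarray:
    """Replace every pixel with the median of the L-by-L window
    centered on it (the window is truncated at the image borders)."""
    h, w = image.shape
    r = L // 2
    out = np.empty_like(image)
    for i in range(h):
        for j in range(w):
            window = image[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[i, j] = np.median(window)
    return out
```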
By interpolating and/or filtering the hole points in the optical image or depth image, noise can be effectively reduced, which benefits the subsequent segmentation of the optical image or depth image.
For the above step S130, the present invention further proposes a specific implementation. Please refer to Figure 2, which shows the sub-steps of step S130. As shown in the figure, step S130 includes:
S131: Inputting the optical image into a convolutional neural network.
In this step, the optical image is input into an already-trained convolutional neural network.
S132: Outputting, through the convolutional neural network, the classification confidence of each pixel in the optical image.
In this step, the convolutional neural network applies the softmax function to the output of its last layer and outputs the classification confidence of the classification result for each pixel in the optical image.
S133: Obtaining the pixels whose classification confidence is greater than or equal to a second preset threshold, to obtain the target mask.
In this step, the classification confidence of each pixel is compared with the second preset threshold, and the pixels whose confidence is greater than or equal to that threshold are retained, yielding the target mask of the corresponding region. For the building singulation method, the second preset threshold may be set to 0.3 to ensure the accuracy of the obtained target mask.
By determining the classification confidence of each pixel in the optical image through the convolutional neural network, and obtaining the pixels whose confidence is greater than or equal to the second preset threshold, the target mask is obtained, accurately segmenting the region of the optical image where the target object is located.
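Steps S132 and S133 (softmax confidence followed by thresholding) can be sketched as follows. The per-pixel class scores would come from the last layer of the trained network, which is assumed here and not shown; thresholding the target class's probability is one plausible reading of the text:

```python
import numpy as np

def confidence_mask(logits: np.ndarray, target_class: int,
                    threshold: float = 0.3) -> np.ndarray:
    """From per-pixel class scores of shape (H, W, C), compute softmax
    confidences and keep the pixels whose confidence for the target
    class is at least the (second preset) threshold."""
    # Numerically stable softmax over the class axis.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs[..., target_class] >= threshold
```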
For the above step S140, the present invention further proposes a specific implementation. Please refer to Figure 3, which shows the sub-steps of step S140. As shown in the figure, step S140 may use a region-growing-based clustering algorithm, which specifically includes:
Cluster marking S141: Assigning one pixel of the depth image to a seed region and marking it as a cluster.
In this step, a point may be manually selected via the computing device, assigned to the seed region and marked as a cluster, or the computing device may select a point at random and do the same.
Initial neighborhood pixel classification S142: Computing the absolute values of the differences between the depth value of that pixel and the depth values of its four adjacent pixels (above, below, left and right), and assigning the adjacent pixels whose absolute difference is less than or equal to a third preset threshold to the seed region, marking them as part of the cluster.
In the depth image there is a certain difference between the depth values of the region where the target object is located and those of the other regions; when the target object is a building, this difference is particularly pronounced. Therefore, if the pixel lies within the region of a target object that needs to be singulated, an adjacent pixel whose absolute depth difference from it is less than or equal to the third preset threshold is very likely also to lie within that region; if the pixel does not lie within the region of a target object that needs to be singulated, such an adjacent pixel is very likely also to lie outside that region. Hence, by marking the pixel and the adjacent pixels whose absolute depth difference from it is less than or equal to the third preset threshold as the same cluster, the regions of the depth image where the target objects are located can be grouped into one or a few clusters, and the other regions into one or a few other clusters.
For the singulation of buildings, the third preset threshold may be set to 10 to ensure the accuracy of pixel classification and marking.
Repeated neighborhood pixel classification S143: Performing the above step S142 on each pixel of the seed region other than the original pixel, stopping once the absolute differences between the depth values of all pixels on the inner edge of the seed region and the depth values of the adjacent pixels outside the seed region are all greater than the third preset threshold; all pixels in the seed region are then marked as the same cluster.
It should be noted that, in practice, this step may not be performed at all, may be performed only once, or may be performed in a loop. Specifically, when in step S142 the absolute differences between the depth value of the pixel and the depth values of all its adjacent pixels are all greater than the third preset threshold, no adjacent pixel is assigned to the seed region and the pixel is marked as a cluster on its own; in this case this step is skipped and the subsequent steps are carried out directly. When at least one adjacent pixel is assigned to the seed region in step S142, each such pixel is taken as a first adjacent pixel and step S142 is performed on it, computing the absolute differences in depth value between each first adjacent pixel and its own adjacent pixels (the second adjacent pixels). When the absolute depth differences between the first adjacent pixels and their second adjacent pixels are all greater than the third preset threshold, the original pixel and the first adjacent pixels are marked as the same cluster and this step stops; in that case this step has been performed only once. Otherwise, when at least one second adjacent pixel is assigned to the seed region, step S142 continues on those second adjacent pixels, computing the absolute depth differences between each of them and its own adjacent pixels, and so on in a loop, until the absolute differences between the depth values of all pixels on the inner edge of the seed region and the depth values of the adjacent pixels outside the seed region are all greater than the third preset threshold; at that point all pixels in the seed region are marked as the same cluster.
Traversing the remaining pixels S144: Repeatedly performing the above steps S141, S142 and S143 in turn on pixels of the depth image that have not been marked as clusters, stopping once all pixels in the depth image have been marked as clusters, to obtain initial target clusters.
It should be noted that in this step, each time step S141 is repeated on a pixel of the depth image that has not been marked as a cluster, that pixel is marked as a new cluster, so that when all pixels of the depth image have been marked as clusters, multiple initial target clusters are obtained. Among these different initial target clusters there are both target clusters that need to be singulated and background clusters that do not.
In an optional manner, the above step S144 is performed preferentially on pixels at the outer edge of the seed region. By preferentially performing step S144 on pixels at the outer edge of the seed region, the marking of pixels of the depth image as clusters proceeds outward step by step from the first marked pixel, which helps improve the efficiency of clustering the depth image.
By segmenting the depth image with a region-growing-based clustering algorithm, both the regions corresponding to target objects and the regions corresponding to the background are marked as clusters, yielding the initial target clusters and achieving an accurate division into target and non-target regions.
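Steps S141 through S144 together form a flood-fill style region-growing pass over the depth image. A compact sketch under that reading (queue-based rather than recursive, an implementation choice the text does not mandate):

```python
from collections import deque
import numpy as np

def region_grow_clusters(depth: np.ndarray, threshold: float = 10) -> np.ndarray:
    """Label every pixel of the depth image with a cluster id.  A new
    seed (S141) grows by absorbing 4-neighbors whose absolute depth
    difference is <= threshold (S142/S143); unlabeled pixels start
    new clusters until the image is covered (S144)."""
    h, w = depth.shape
    labels = np.full((h, w), -1, dtype=int)
    next_label = 0
    for si in range(h):
        for sj in range(w):
            if labels[si, sj] != -1:
                continue                     # already in some cluster
            labels[si, sj] = next_label      # S141: new seed region
            queue = deque([(si, sj)])
            while queue:                     # S142/S143: grow the region
                i, j = queue.popleft()
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= ni < h and 0 <= nj < w and labels[ni, nj] == -1
                            and abs(float(depth[i, j]) - float(depth[ni, nj])) <= threshold):
                        labels[ni, nj] = next_label
                        queue.append((ni, nj))
            next_label += 1                  # S144: next unlabeled pixel
    return labels
```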
Further, with continued reference to Figure 3, the above step S140 also includes:
Cluster filtering S145: Obtaining the initial target clusters whose area is greater than or equal to a fourth preset threshold, to obtain the target clusters.
When the target object is a building, since a building roof is roughly planar and has a certain area, the region of a building will be segmented into one cluster of fairly large area, while the background region outside the buildings, whose depth values are uneven, will be segmented into multiple small-area clusters. Based on this, obtaining the initial target clusters whose area is greater than or equal to the fourth preset threshold locates the clusters corresponding to building regions fairly accurately and filters out the clusters corresponding to the background. The fourth preset threshold may generally be set to 300 so that the clusters corresponding to the background region can be filtered reliably.
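Step S145 then reduces to a simple area filter. A sketch assuming the clusters are represented as an integer label map (one label per cluster, one pixel per array element — an assumed representation):

```python
import numpy as np

def filter_clusters(labels: np.ndarray, min_area: int = 300) -> set:
    """Keep only the cluster labels whose pixel count (area) is at
    least min_area, discarding the small background clusters."""
    ids, counts = np.unique(labels, return_counts=True)
    return {int(i) for i, c in zip(ids, counts) if c >= min_area}
```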
According to another aspect of embodiments of the present invention, a target object singulation apparatus is provided. Please refer to Figure 4, which shows the structure of the target object singulation apparatus provided by an embodiment. As shown in the figure, the target object singulation apparatus 200 includes an acquisition unit 210, an image conversion unit 220, a first image segmentation unit 230, a second image segmentation unit 240, a computation unit 250 and a determination unit 260. The acquisition unit 210 is configured to acquire three-dimensional scene data. The image conversion unit 220 is configured to convert the three-dimensional scene data into an optical image and a depth image from a top-down viewing angle. The first image segmentation unit 230 is configured to perform semantic segmentation on the optical image to obtain a target mask. The second image segmentation unit 240 is configured to perform clustering segmentation on the depth image to obtain a target cluster. The computation unit 250 is configured to compute the ratio of the intersection area to the union area between the target mask and the target cluster. The determination unit 260 is configured to determine the target cluster or target mask corresponding to a ratio greater than or equal to the first preset threshold, and to determine the region corresponding to the determined target cluster or target mask in the three-dimensional scene data as a singulated target object.
In an optional manner, the numbers of target masks and target clusters are both plural, and there is a one-to-one correspondence between at least some of the target masks and at least some of the target clusters. The computation unit 250 is configured to separately compute the ratio of the intersection area to the union area between each target cluster and its corresponding target mask. The determination unit 260 is configured to determine the target cluster or target mask corresponding to each ratio greater than or equal to the first preset threshold, and to determine the region corresponding to each such target cluster or target mask in the three-dimensional scene data as one singulated target object.
Referring again to Figure 4, in an optional manner, the target object singulation apparatus 200 further includes a noise reduction unit 270, which is configured to perform interpolation and/or filtering on the optical image and the depth image.
In an optional manner, the first image segmentation unit 230 is configured to input the optical image into a convolutional neural network, to output through the convolutional neural network the classification confidence of each pixel in the optical image, and to obtain the pixels whose classification confidence is greater than or equal to the second preset threshold, thereby obtaining the target mask.
In an optional manner, the second image segmentation unit 240 is configured for cluster marking: assigning one pixel of the depth image to a seed region and marking it as a cluster; for initial neighborhood pixel classification: computing the absolute differences between the depth value of that pixel and the depth values of its four adjacent pixels (above, below, left and right), and assigning the adjacent pixels whose absolute difference is less than or equal to the third preset threshold to the seed region and marking them as part of the cluster; for repeated neighborhood pixel classification: performing the above initial neighborhood pixel classification on each pixel of the seed region other than the original pixel, stopping once the absolute differences between the depth values of all pixels on the inner edge of the seed region and the depth values of the adjacent pixels outside the seed region are all greater than the third preset threshold, whereupon all pixels in the seed region are marked as the same cluster; and for traversing the remaining pixels: repeatedly performing the above cluster marking, initial neighborhood pixel classification and repeated neighborhood pixel classification on pixels of the depth image that have not been marked as clusters, stopping once all pixels of the depth image have been marked as clusters, to obtain the initial target clusters.
In an optional manner, the second image segmentation unit 240 performs the repeated neighborhood pixel classification preferentially on pixels at the outer edge of the seed region.
In an optional manner, the second image segmentation unit 240 is further configured for cluster filtering: obtaining the initial target clusters whose area is greater than or equal to the fourth preset threshold, to obtain the target clusters.
According to another aspect of embodiments of the present invention, a computing device is also provided. Please refer to Figure 5, which shows the structure of the computing device provided by an embodiment; the specific embodiments of the present invention do not limit the specific implementation of the computing device.
As shown in Figure 5, the computing device may include: a processor 402, a communications interface 404, a memory 406, and a communication bus 408.
The processor 402, the communications interface 404 and the memory 406 communicate with one another through the communication bus 408. The communications interface 404 is configured to communicate with network elements of other devices, such as clients or other servers. The processor 402 is configured to execute a program 410, and may specifically perform the relevant steps of the above embodiments of the target object singulation method.
Specifically, the program 410 may include program code, and the program code includes computer-executable instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computing device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs together with one or more ASICs.
The memory 406 is configured to store the program 410. The memory 406 may include high-speed RAM, and may also include non-volatile memory, for example at least one disk memory.
The program 410 may specifically be invoked by the processor 402 to cause the computing device to perform the following operations:
acquiring three-dimensional scene data;
converting the three-dimensional scene data into an optical image and a depth image from a top-down viewing angle;
performing semantic segmentation on the optical image to obtain a target mask;
performing clustering segmentation on the depth image to obtain a target cluster;
computing the ratio of the intersection area to the union area between the target mask and the target cluster;
determining the target cluster or target mask corresponding to a ratio greater than or equal to a first preset threshold, and determining the region corresponding to the determined target cluster or target mask in the three-dimensional scene data as a singulated target object.
According to another aspect of embodiments of the present invention, a computer-readable storage medium is also provided. The storage medium stores at least one executable instruction which, when run on a computing device, causes the computing device to perform the target object singulation method of any of the above method embodiments.
The executable instruction may specifically be used to cause the computing device to perform the following operations:
acquiring three-dimensional scene data;
converting the three-dimensional scene data into an optical image and a depth image from a top-down viewing angle;
performing semantic segmentation on the optical image to obtain a target mask;
performing clustering segmentation on the depth image to obtain a target cluster;
computing the ratio of the intersection area to the union area between the target mask and the target cluster;
determining the target cluster or target mask corresponding to a ratio greater than or equal to a first preset threshold, and determining the region corresponding to the determined target cluster or target mask in the three-dimensional scene data as a singulated target object.
The algorithms or displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings herein. From the above description, the structure required to construct such systems is apparent. Moreover, embodiments of the present invention are not directed at any particular programming language. It should be understood that the content of the present invention described herein may be implemented using a variety of programming languages, and that the above descriptions of specific languages are intended to disclose the best mode of carrying out the invention.
Numerous specific details are set forth in the description provided here. It will be understood, however, that embodiments of the present invention may be practised without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that in the above description of exemplary embodiments of the present invention, the various features of the embodiments are sometimes grouped together in a single embodiment, figure or description thereof in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim.
Those skilled in the art will understand that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may likewise be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several apparatuses, several of these apparatuses may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any ordering; these words may be interpreted as names. Unless otherwise specified, the steps of the above embodiments should not be understood as limiting the order of execution.

Claims (10)

  1. A target object singulation method, characterized by comprising:
    obtaining three-dimensional scene data;
    converting the three-dimensional scene data into an optical image and a depth image from a top-down perspective;
    performing semantic segmentation processing on the optical image to obtain a target object mask;
    performing cluster segmentation processing on the depth image to obtain target object clusters;
    calculating the ratio of the intersection area to the union area between the target object mask and the target object cluster;
    determining the target object cluster or the target object mask corresponding to a ratio greater than or equal to a first preset threshold, and determining the region corresponding to the determined target object cluster or target object mask in the three-dimensional scene data as a singulated target object.
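The matching step of claim 1 — the ratio of intersection area to union area between a mask and a cluster — is the standard intersection-over-union (IoU) measure on binary images. A minimal NumPy sketch is given below; the value of `FIRST_THRESHOLD` is a hypothetical placeholder, since the patent does not fix the "first preset threshold".

```python
import numpy as np

def iou(mask: np.ndarray, cluster: np.ndarray) -> float:
    """Intersection-over-union between two boolean images of equal shape."""
    mask = mask.astype(bool)
    cluster = cluster.astype(bool)
    union = np.logical_or(mask, cluster).sum()
    if union == 0:
        return 0.0
    inter = np.logical_and(mask, cluster).sum()
    return float(inter) / float(union)

# Toy example: a 3x3 mask and a 3x3 cluster overlapping in a 2x2 block.
mask = np.zeros((4, 4), dtype=bool)
mask[0:3, 0:3] = True            # 9 pixels
cluster = np.zeros((4, 4), dtype=bool)
cluster[1:4, 1:4] = True         # 9 pixels, 4 of them shared with the mask
FIRST_THRESHOLD = 0.25           # hypothetical value for the "first preset threshold"
ratio = iou(mask, cluster)       # 4 / 14
is_singulated = ratio >= FIRST_THRESHOLD
```

A mask/cluster pair passing this test is then mapped back to its region in the three-dimensional scene data, which is the part the sketch does not cover.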
  2. The target object singulation method according to claim 1, characterized in that there are multiple target object masks and multiple target object clusters, and at least some of the target object masks have a one-to-one correspondence with at least some of the target object clusters;
    the calculating the ratio of the intersection area to the union area between the target object mask and the target object cluster comprises:
    calculating, for each target object cluster, the ratio of the intersection area to the union area between that cluster and its corresponding target object mask;
    the determining the target object cluster or the target object mask corresponding to a ratio greater than or equal to the first preset threshold, and determining the region corresponding to the determined target object cluster or target object mask in the three-dimensional scene data as a singulated target object, comprises:
    determining the target object cluster or the target object mask corresponding to each ratio greater than or equal to the first preset threshold, and determining the region corresponding to each determined target object cluster or target object mask in the three-dimensional scene data as one singulated target object, respectively.
  3. The target object singulation method according to claim 1, characterized in that after converting the three-dimensional scene data into the optical image and the depth image from a top-down perspective, the method further comprises:
    performing interpolation processing and/or filtering processing on the optical image and the depth image.
  4. The target object singulation method according to claim 1, characterized in that the performing semantic segmentation processing on the optical image to obtain the target object mask comprises:
    inputting the optical image into a convolutional neural network;
    outputting, through the convolutional neural network, the classification confidence of each pixel in the optical image;
    obtaining the pixels whose classification confidence is greater than or equal to a second preset threshold, to obtain the target object mask.
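The final step of claim 4 reduces to thresholding a per-pixel confidence map. The sketch below assumes the CNN has already produced a 2-D array of confidences for the target class; `SECOND_THRESHOLD` is a hypothetical placeholder for the "second preset threshold".

```python
import numpy as np

SECOND_THRESHOLD = 0.5  # hypothetical value for the "second preset threshold"

def mask_from_confidence(confidence: np.ndarray,
                         threshold: float = SECOND_THRESHOLD) -> np.ndarray:
    """Keep the pixels whose classification confidence reaches the threshold."""
    return confidence >= threshold

# A 2x3 confidence map as a segmentation network might emit for the target class.
conf = np.array([[0.9, 0.4, 0.7],
                 [0.2, 0.6, 0.1]])
mask = mask_from_confidence(conf)
# mask == [[True, False, True], [False, True, False]]
```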
  5. The target object singulation method according to claim 1, characterized in that the performing cluster segmentation processing on the depth image to obtain target object clusters comprises:
    cluster marking: assigning one of the pixels in the depth image to a seed region and marking it as a cluster;
    initial neighborhood pixel classification: calculating the absolute value of the difference between the depth value of the pixel and the depth values of its four adjacent pixels (above, below, left, and right), and assigning the adjacent pixels whose absolute difference is less than or equal to a third preset threshold to the seed region, marking them as the cluster;
    repeated neighborhood pixel classification: performing the neighborhood pixel classification on each of the other pixels in the seed region, stopping when the absolute differences between the depth values of all pixels on the inner edge of the seed region and the depth values of their adjacent pixels outside the seed region all exceed the third preset threshold, whereupon all pixels in the seed region are marked as the same cluster;
    traversing remaining pixels: repeatedly performing the cluster marking, the initial neighborhood pixel classification, and the repeated neighborhood pixel classification in sequence on pixels of the depth image not yet marked as a cluster, stopping when all pixels in the depth image have been marked, to obtain initial target object clusters.
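The four steps of claim 5 describe a region-growing segmentation: seed a cluster, flood it through 4-neighbors whose depth difference stays within a tolerance, then restart from any unmarked pixel. A minimal sketch under that reading is shown below; `THIRD_THRESHOLD` is a hypothetical placeholder for the "third preset threshold", and the breadth-first queue is one of several valid traversal orders.

```python
from collections import deque
import numpy as np

THIRD_THRESHOLD = 0.1  # hypothetical depth-difference tolerance

def grow_clusters(depth: np.ndarray, tol: float = THIRD_THRESHOLD) -> np.ndarray:
    """Label map where 4-neighbors are joined when |depth difference| <= tol."""
    h, w = depth.shape
    labels = np.zeros((h, w), dtype=int)       # 0 = not yet marked
    current = 0
    for sy in range(h):                        # traverse remaining pixels
        for sx in range(w):
            if labels[sy, sx]:
                continue
            current += 1                       # cluster marking: new seed region
            labels[sy, sx] = current
            frontier = deque([(sy, sx)])
            while frontier:                    # repeated neighborhood classification
                y, x = frontier.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny, nx]
                            and abs(depth[ny, nx] - depth[y, x]) <= tol):
                        labels[ny, nx] = current
                        frontier.append((ny, nx))
    return labels

depth = np.array([[1.00, 1.05, 5.00],
                  [1.02, 1.08, 5.05]])
labels = grow_clusters(depth)
# Two clusters: the ~1.0 block on the left and the ~5.0 column on the right.
```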
  6. The target object singulation method according to claim 5, characterized in that the repeated neighborhood pixel classification is performed preferentially on the pixels at the outer edge of the seed region.
  7. The target object singulation method according to claim 5, characterized in that after the repeated neighborhood pixel classification, the method further comprises:
    cluster filtering: obtaining the initial target object clusters whose area is greater than or equal to a fourth preset threshold, to obtain the target object clusters.
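The cluster filtering of claim 7 is a pixel-count cutoff over the labeled clusters. A sketch continuing from an integer label map is given below; `FOURTH_THRESHOLD` is a hypothetical placeholder for the "fourth preset threshold".

```python
import numpy as np

FOURTH_THRESHOLD = 3  # hypothetical minimum cluster area, in pixels

def filter_clusters(labels: np.ndarray,
                    min_area: int = FOURTH_THRESHOLD) -> list:
    """Keep only the cluster ids whose pixel count reaches the area threshold."""
    ids, counts = np.unique(labels[labels > 0], return_counts=True)
    return [int(i) for i, c in zip(ids, counts) if c >= min_area]

labels = np.array([[1, 1, 2],
                   [1, 1, 2]])
kept = filter_clusters(labels)   # cluster 1 has area 4, cluster 2 only 2
# kept == [1]
```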
  8. A target object singulation apparatus, characterized by comprising:
    an obtaining unit, configured to obtain three-dimensional scene data;
    an image conversion unit, configured to convert the three-dimensional scene data into an optical image and a depth image from a top-down perspective;
    a first image segmentation unit, configured to perform semantic segmentation processing on the optical image to obtain a target object mask;
    a second image segmentation unit, configured to perform cluster segmentation processing on the depth image to obtain target object clusters;
    a calculation unit, configured to calculate the ratio of the intersection area to the union area between the target object mask and the target object cluster;
    a determination unit, configured to determine the target object cluster or the target object mask corresponding to a ratio greater than or equal to a first preset threshold, and determine the region corresponding to the determined target object cluster or target object mask in the three-dimensional scene data as a singulated target object.
  9. A computing device, characterized by comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other through the communication bus;
    the memory is configured to store at least one executable instruction, the executable instruction causing the processor to execute the target object singulation method according to any one of claims 1-7.
  10. A computer storage medium, characterized in that at least one executable instruction is stored in the storage medium, the executable instruction causing a processor to execute the target object singulation method according to any one of claims 1-7.
PCT/CN2023/089948 2022-05-23 2023-04-21 Target object separation method and apparatus, device, and storage medium WO2023226654A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210560168.2 2022-05-23
CN202210560168.2A CN114648640B (en) 2022-05-23 2022-05-23 Target object monomer method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023226654A1 true WO2023226654A1 (en) 2023-11-30

Family

ID=81997653

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/089948 WO2023226654A1 (en) 2022-05-23 2023-04-21 Target object separation method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114648640B (en)
WO (1) WO2023226654A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118015197A (en) * 2024-04-08 2024-05-10 北京师范大学珠海校区 Live-action three-dimensional logic singulation method and device and electronic equipment

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114648640B (en) * 2022-05-23 2022-09-06 深圳市其域创新科技有限公司 Target object monomer method, device, equipment and storage medium
CN115187619A (en) * 2022-09-13 2022-10-14 深圳市其域创新科技有限公司 Mesh data segmentation method, device, equipment and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108470339A (en) * 2018-03-21 2018-08-31 华南理工大学 A kind of visual identity of overlapping apple and localization method based on information fusion
CN112132845A (en) * 2020-08-13 2020-12-25 当家移动绿色互联网技术集团有限公司 Three-dimensional model unitization method and device, electronic equipment and readable medium
CN112967301A (en) * 2021-04-08 2021-06-15 北京华捷艾米科技有限公司 Self-timer image matting method and device
CN114648640A (en) * 2022-05-23 2022-06-21 深圳市其域创新科技有限公司 Target object monomer method, device, equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8514225B2 (en) * 2011-01-07 2013-08-20 Sony Computer Entertainment America Llc Scaling pixel depth values of user-controlled virtual object in three-dimensional scene
CN113379826A (en) * 2020-03-10 2021-09-10 顺丰科技有限公司 Method and device for measuring volume of logistics piece



Also Published As

Publication number Publication date
CN114648640A (en) 2022-06-21
CN114648640B (en) 2022-09-06

Similar Documents

Publication Publication Date Title
WO2023226654A1 (en) Target object separation method and apparatus, device, and storage medium
CN109493407B (en) Method and device for realizing laser point cloud densification and computer equipment
CN111832655B (en) Multi-scale three-dimensional target detection method based on characteristic pyramid network
CN111640125B (en) Aerial photography graph building detection and segmentation method and device based on Mask R-CNN
CN111242041B (en) Laser radar three-dimensional target rapid detection method based on pseudo-image technology
CN109325484B (en) Flower image classification method based on background prior significance
JP2021532442A (en) Target detection method and device, smart operation method, device and storage medium
CN111145174A (en) 3D target detection method for point cloud screening based on image semantic features
CN104134234A (en) Full-automatic three-dimensional scene construction method based on single image
WO2023193401A1 (en) Point cloud detection model training method and apparatus, electronic device, and storage medium
WO2022041437A1 (en) Plant model generating method and apparatus, computer equipment and storage medium
WO2022017131A1 (en) Point cloud data processing method and device, and intelligent driving control method and device
CN110176064B (en) Automatic identification method for main body object of photogrammetric generation three-dimensional model
CN113761999A (en) Target detection method and device, electronic equipment and storage medium
CN113192200B (en) Method for constructing urban real scene three-dimensional model based on space-three parallel computing algorithm
CN114612835A (en) Unmanned aerial vehicle target detection model based on YOLOv5 network
CN106326810A (en) Road scene identification method and equipment
US20220004740A1 (en) Apparatus and Method For Three-Dimensional Object Recognition
WO2022206414A1 (en) Three-dimensional target detection method and apparatus
CN110633640A (en) Method for identifying complex scene by optimizing PointNet
CN112767405A (en) Three-dimensional mesh model segmentation method and system based on graph attention network
CN115032648A (en) Three-dimensional target identification and positioning method based on laser radar dense point cloud
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN110969641A (en) Image processing method and device
WO2024037562A1 (en) Three-dimensional reconstruction method and apparatus, and computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23810727

Country of ref document: EP

Kind code of ref document: A1