WO2021147113A1 - Method for identifying plane semantic categories and image data processing apparatus - Google Patents


Info

Publication number
WO2021147113A1
Authority
WO
WIPO (PCT)
Prior art keywords
plane
semantic
image data
category
categories
Prior art date
Application number
PCT/CN2020/074040
Other languages
English (en)
Chinese (zh)
Inventor
马超群
陈平
方晓鑫
Original Assignee
华为技术有限公司
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2020/074040
Priority to CN202080001308.1A
Publication of WO2021147113A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition

Definitions

  • the embodiments of the present application relate to the field of image processing technology, and in particular, to a method for recognizing planar semantic categories and an image data processing device.
  • Augmented reality is a technology that calculates the position and angle of the camera image in real time and adds corresponding images, videos, and 3D models.
  • the goal of this technology is to overlay the virtual world on the real world on the screen and allow the two to interact.
  • plane detection, as an important function in augmented reality, provides perception of the basic three-dimensional environment of the real world, so that developers can place virtual objects according to the detected planes to achieve the augmented reality effect.
  • three-dimensional space plane detection is an important and basic function: after a plane is detected, the anchor point of an object can be further determined, so that the object can be rendered at the determined anchor point. It is therefore an important function in augmented reality that provides perception of the basic three-dimensional environment of the real world and enables developers to place virtual objects according to the detected planes to achieve the augmented reality effect.
  • however, the planes detected by most augmented reality algorithms only provide location information; these algorithms cannot identify the plane category of each plane.
  • recognizing the plane category of a plane can help developers improve the realism and interest of augmented reality applications.
  • the embodiments of the present application provide a method for recognizing planar semantic categories and an image data processing device to improve the accuracy of planar semantic category recognition.
  • an embodiment of the present application provides a method for recognizing planar semantic categories, including: an image data processing device obtains image data to be processed including N pixels, where N is a positive integer.
  • the image data processing device determines a semantic segmentation result of the image data to be processed, where the semantic segmentation result includes the target plane category corresponding to at least some of the N pixels.
  • the image data processing device obtains a first dense semantic map according to the semantic segmentation result, where the first dense semantic map includes at least one target plane category corresponding to at least one first three-dimensional point in the first three-dimensional point cloud, and the at least one first three-dimensional point corresponds to at least one pixel point among the at least some of the pixels.
  • the image data processing device performs plane semantic category recognition according to the first dense semantic map, and obtains the plane semantic category of one or more planes included in the image data to be processed.
  • the embodiment of the present application provides a method for recognizing planar semantic categories.
  • the method obtains the semantic segmentation result of the image data to be processed. Since the semantic segmentation result includes the target plane category to which each of the N pixels included in the image data to be processed belongs, the subsequent processing based on semantic segmentation can improve the accuracy of plane semantic recognition.
  • the image data processing device obtains the first dense semantic map according to the semantic segmentation result, and then uses the first dense semantic map to recognize the plane semantic category to obtain the plane semantic categories of the image data to be processed, which can enhance the accuracy of plane semantic recognition.
  • the image data processing device obtains the first dense semantic map according to the semantic segmentation result, including: the image data processing device obtains the second dense semantic map according to the semantic segmentation result and the depth image corresponding to the image data to be processed.
  • the image data processing device uses the second dense semantic map as the first dense semantic map.
  • the image data processing device obtains the first dense semantic map according to the semantic segmentation result, including: the image data processing device obtains the second dense semantic map according to the semantic segmentation result.
  • the image data processing device uses one or more second three-dimensional points in the second three-dimensional point cloud in the second dense semantic map to update the historical dense semantic map to obtain the first dense semantic map.
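  • A minimal Python sketch (not part of the patent) of how such an update of a historical dense semantic map with newly labelled second three-dimensional points could look; the voxel size, the dictionary representation, and the overwrite rule are assumptions, since the patent does not prescribe the fusion scheme.

```python
import numpy as np

def update_dense_semantic_map(history, new_points, new_categories, voxel_size=0.05):
    """Merge newly labelled 3-D points into a historical dense semantic map.

    history: dict mapping a voxel index (i, j, k) to a target plane category.
    new_points: K x 3 array of second three-dimensional points (world frame).
    new_categories: length-K array of target plane categories for those points.
    The latest observation simply overwrites the voxel's category here; this
    fusion rule is an assumption, not taken from the patent.
    """
    for point, category in zip(new_points, new_categories):
        key = tuple(np.floor(point / voxel_size).astype(int))
        history[key] = int(category)
    return history
```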
  • the image data processing apparatus judging whether the current state of the image data processing apparatus is a motion state includes: the image data processing apparatus obtains second image data, which is different from the image data to be processed.
  • the image data processing device determines whether the state of the image data processing device is a motion state according to the pose of the first device corresponding to the image data to be processed and the pose of the second device corresponding to the second image data.
  • the second image data is adjacent to the image data to be processed and is the frame preceding the image data to be processed.
  • the image data processing apparatus determining that the current state is the motion state includes: when the difference between the pose of the first device and the pose of the second device is less than or equal to a first threshold, determining that the current state is the motion state.
  • the image data processing device determining that the current state is a motion state includes: the image data processing device acquires second image data shot by the camera, where the second image data is adjacent to the image data to be processed and is the frame preceding the image data to be processed; and the image data processing device determines that the state of the image data processing device is a motion state according to the first device pose corresponding to the image data to be processed, the second device pose corresponding to the second image data, and the frame difference between the second image data and the image data to be processed.
  • the image data processing apparatus determining, according to the first device pose corresponding to the image data to be processed, the second device pose corresponding to the second image data, and the frame difference between the second image data and the image data to be processed, that the state of the image data processing device is a motion state includes: when the difference between the first device pose corresponding to the image data to be processed and the second device pose corresponding to the second image data is less than or equal to the first threshold and the frame difference between the second image data and the image data to be processed is greater than the second threshold, the state of the image data processing device is a motion state.
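  • For illustration only, a minimal Python sketch of the motion state check just described; using translation distance as the pose difference and the specific threshold values are assumptions, since the claims only require comparing the pose difference with a first threshold and the frame difference with a second threshold.

```python
import numpy as np

def is_motion_state(pose_curr, pose_prev, frame_idx_curr, frame_idx_prev,
                    first_threshold=0.05, second_threshold=1):
    """Return True when the image data processing device is in the motion state.

    pose_curr, pose_prev: 4x4 camera-to-world matrices for the image data to be
    processed and for the second image data (the preceding frame).
    The pose difference is measured here as the translation distance only, which
    is an assumption; the patent only speaks of "the difference" between poses.
    """
    pose_diff = np.linalg.norm(pose_curr[:3, 3] - pose_prev[:3, 3])
    frame_diff = frame_idx_curr - frame_idx_prev
    # Motion state per the description above: pose difference <= first threshold
    # and frame difference > second threshold.
    return pose_diff <= first_threshold and frame_diff > second_threshold
```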
  • the method provided in the embodiment of the present application further includes: the image data processing apparatus performs an optimization operation on the semantic segmentation result according to the image data to be processed and the depth information included in the depth image corresponding to the image data to be processed, where the optimization operation is used to correct noise and erroneous parts in the semantic segmentation result. This can make subsequent semantic recognition more accurate.
  • the image data processing apparatus determining the semantic segmentation result of the image data to be processed includes: the image data processing apparatus determines the probability of each plane category among one or more plane categories corresponding to any one of the at least some pixels.
  • the image data processing device uses the plane category with the highest probability among the one or more plane categories corresponding to any one pixel as the target plane category corresponding to that pixel, to obtain the semantic segmentation result of the image data to be processed. That is, the probability of the target plane category corresponding to any pixel is the largest among the probabilities of the one or more plane categories corresponding to that pixel. This can improve the accuracy of semantic recognition.
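  • As an illustration of this per-pixel selection, a small Python sketch (not from the patent) that collapses per-pixel plane category probabilities, for example produced by a segmentation network, into the target plane category of each pixel:

```python
import numpy as np

def semantic_segmentation_result(class_probs):
    """Select the target plane category for every pixel.

    class_probs: H x W x C array, where class_probs[y, x, c] is the probability
    that pixel (y, x) belongs to plane category c (e.g. a softmax output).
    Returns an H x W map of target plane category indices, i.e. for each pixel
    the category whose probability is the largest.
    """
    return np.argmax(class_probs, axis=-1)
```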
  • the image data processing device determines the probability of each plane category among one or more plane categories corresponding to any one of the at least some pixels, including: the image data processing device performs semantic segmentation on the image data to be processed according to a neural network to obtain the probability of each plane category among the one or more plane categories corresponding to any one of the at least some pixels.
  • the image data processing device recognizing the plane semantic category according to the first dense semantic map and obtaining the plane semantic category of the one or more planes included in the image data to be processed includes: the image data processing device determines the plane equation of each of the one or more planes according to the image data to be processed.
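  • To make the notion of a plane equation concrete, a standard least-squares fit is sketched below in Python; the patent itself leaves plane detection to existing techniques (for example, the SLAM module), so fitting the equation from a set of 3-D points is only an assumed illustration.

```python
import numpy as np

def fit_plane(points):
    """Fit a plane a*x + b*y + c*z + d = 0 (unit normal) to 3-D points.

    points: K x 3 array of points assumed to lie on one plane.
    Uses the singular value decomposition of the centred points; the direction
    of least variance is the plane normal.
    """
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    a, b, c = vt[-1]                      # unit normal
    d = -float(vt[-1] @ centroid)
    return float(a), float(b), float(c), d
```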
  • the image data processing device performs the following steps on any one of the one or more planes to obtain the plane semantic category of the one or more planes: the image data processing device determines, according to the plane equation of the plane and the first dense semantic map, one or more target plane categories corresponding to the plane and the confidence of the one or more target plane categories; and selects, from the one or more target plane categories, the target plane category with the highest confidence as the semantic plane category of the plane. That is, the semantic plane category of any plane is the target plane category with the highest confidence among the one or more target plane categories corresponding to that plane. Selecting the target plane category with the highest confidence can enhance the accuracy of plane semantic recognition.
  • the orientation of one or more target plane categories corresponding to any plane is consistent with the orientation of any plane. That is, the orientation of one or more target plane categories corresponding to each plane is consistent with the respective orientation of each plane. In this way, the target plane categories that are inconsistent with the plane orientation can be filtered out, and the accuracy of plane semantic recognition can be enhanced.
  • the image data processing device determining, according to the plane equation of any one plane and the first dense semantic map, one or more target plane categories corresponding to the plane and the confidence of the one or more target plane categories includes: the image data processing device determines M first three-dimensional points from the first dense semantic map according to the plane equation of the plane, where the distance between each of the M first three-dimensional points and the plane is less than a third threshold and M is a positive integer; determines the one or more target plane categories corresponding to the M first three-dimensional points as the one or more target plane categories corresponding to the plane, where the orientation of the one or more target plane categories is consistent with the orientation of the plane; and counts, for each target plane category among the one or more target plane categories, the proportion of the corresponding three-dimensional points among the M first three-dimensional points to obtain the confidence of the one or more target plane categories.
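  • The following Python sketch illustrates the counting scheme just described for a single plane; the distance threshold, the category-to-orientation table, and the dot-product test against an assumed up axis are illustrative assumptions rather than definitions from the patent.

```python
import numpy as np

# Hypothetical category table; real category IDs and orientations depend on
# the segmentation model actually used.
CATEGORY_ORIENTATION = {0: "horizontal",   # e.g. floor
                        1: "horizontal",   # e.g. table top
                        2: "vertical"}     # e.g. wall

def plane_semantic_category(plane, points, point_categories,
                            dist_threshold=0.03, up_axis=(0.0, 0.0, 1.0)):
    """Assign a semantic category and confidence to one detected plane.

    plane: (a, b, c, d) with unit normal n = (a, b, c), i.e. n . x + d = 0.
    points: K x 3 array of 3-D points of the dense semantic map.
    point_categories: length-K array of per-point target plane categories.
    A minimal sketch of the counting scheme described above; the thresholds and
    the orientation test are assumptions.
    """
    normal = np.asarray(plane[:3])
    plane_orient = "horizontal" if abs(np.dot(normal, up_axis)) > 0.7 else "vertical"

    # The M first 3-D points: distance to the plane below the third threshold.
    dist = np.abs(points @ normal + plane[3])
    near = point_categories[dist < dist_threshold]
    if near.size == 0:
        return None, 0.0

    # Keep only categories whose orientation matches the plane orientation,
    # then use each category's share of the M points as its confidence.
    best_cat, best_conf = None, 0.0
    for cat in np.unique(near):
        if CATEGORY_ORIENTATION.get(int(cat)) != plane_orient:
            continue
        conf = float(np.count_nonzero(near == cat)) / near.size
        if conf > best_conf:
            best_cat, best_conf = int(cat), conf
    return best_cat, best_conf
```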
  • the method provided in the embodiment of the present application further includes: the image data processing device updates the confidence of the one or more target plane categories corresponding to any one of the planes according to at least one of Bayes' theorem or a voting mechanism.
  • updating the confidence of the one or more plane categories corresponding to any plane over the video sequence based on Bayes' theorem and a voting mechanism makes the final plane semantic category of each plane more stable.
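  • Below is a minimal sketch of one possible Bayes-style fusion of per-frame confidences for a single plane; the uniform prior, the floor value for unseen categories, and the renormalisation are assumptions, since the patent only states that Bayes' theorem and/or a voting mechanism are used.

```python
def update_confidence(prior, observation, categories):
    """Fuse a new per-frame confidence observation into the running estimate.

    prior, observation: dicts mapping plane category -> confidence for one plane.
    categories: iterable of all candidate plane categories for this plane.
    Multiplies prior and observation per category and renormalises, which is a
    simple recursive Bayes update; a voting mechanism could instead accumulate
    per-category counts over the video sequence.
    """
    categories = list(categories)
    default_prior = 1.0 / max(len(categories), 1)
    fused = {c: prior.get(c, default_prior) * max(observation.get(c, 0.0), 1e-3)
             for c in categories}
    total = sum(fused.values()) or 1.0
    return {c: value / total for c, value in fused.items()}
```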
  • the method provided in the embodiment of the present application further includes: the image data processing device determines whether the current state of the image data processing device is a motion state, and when the current state is the motion state, the image data processing device obtains the first dense semantic map according to the semantic segmentation result. By judging whether the device is in a motion state and obtaining the first dense semantic map according to the semantic segmentation result only in the motion state, the amount of data calculated by the image data processing device can be reduced, thereby reducing computing resource usage and improving the performance of the semantic map generation algorithm.
  • the image data to be processed is image data after correction.
  • the method provided in the embodiment of the present application further includes: the image data processing apparatus obtains the first image data taken by the camera.
  • the image data processing device corrects the first image data according to the device pose corresponding to the first image data to obtain the image data to be processed.
  • an embodiment of the present application provides an image data processing device.
  • the image data processing device includes a semantic segmentation module, a semantic map module, and a semantic clustering module.
  • the semantic segmentation module is used to obtain, from the camera, image data to be processed including N pixels, where N is a positive integer.
  • the semantic segmentation module is also used to determine the semantic segmentation result of the image data to be processed, where the semantic segmentation result includes the target plane category corresponding to at least some of the N pixels.
  • the semantic map module is configured to obtain a first dense semantic map according to the semantic segmentation result, where the first dense semantic map includes at least one target plane category corresponding to at least one first three-dimensional point in the first three-dimensional point cloud, and the at least one first three-dimensional point corresponds to at least one pixel point among the at least some of the pixels.
  • the semantic clustering module is used to recognize the plane semantic category according to the first dense semantic map to obtain the plane semantic category of one or more planes included in the image data to be processed.
  • the embodiment of the present application provides an image data processing device, which obtains the semantic segmentation result of the image data to be processed. Because the semantic segmentation result includes the target plane category to which each of the N pixels included in the image data to be processed belongs, semantic segmentation can improve the accuracy of plane semantic recognition.
  • the image data processing device obtains the first dense semantic map according to the semantic segmentation result, and then uses the first dense semantic map to recognize the plane semantic category to obtain the plane semantic categories of the image data to be processed, which can enhance the accuracy of plane semantic recognition.
  • the semantic map module is used to obtain the first dense semantic map according to the semantic segmentation result, including: the semantic map module is used to obtain the second dense semantic map according to the semantic segmentation result.
  • the semantic map module is used to use the second dense semantic map as the first dense semantic map.
  • the semantic map module is used to obtain the first dense semantic map according to the semantic segmentation result, including: the semantic map module is used to obtain the second dense semantic map according to the semantic segmentation result.
  • the semantic map module is used to update the historical dense semantic map by using one or more second three-dimensional points in the second three-dimensional point cloud in the second dense semantic map to obtain the first dense semantic map.
  • the image data processing device further includes a simultaneous localization and mapping (SLAM) module, which is used to calculate the device pose (such as the camera pose) of the image data. The semantic map module being used to determine whether the current state of the image data processing device is a motion state includes: the semantic map module is used to obtain second image data provided by the camera, where the second image data is different from the image data to be processed.
  • the semantic map module is used to determine whether the state of the image data processing device is a motion state according to the first device pose corresponding to the image data to be processed provided by the SLAM module and the second device pose corresponding to the second image data provided by the SLAM module.
  • the second image data is adjacent to the image data to be processed and is the frame preceding the image data to be processed.
  • the semantic map module being used to determine that the current state is a motion state includes: when the difference between the pose of the first device and the pose of the second device is less than or equal to the first threshold, the semantic map module is used to determine that the current state is a motion state;
  • the semantic map module being used to determine that the current state is a motion state includes: the semantic map module is used to obtain second image data taken by the camera, where the second image data is adjacent to the image data to be processed and is the frame preceding the image data to be processed; and the semantic map module determines that the current state of the image data processing device is a motion state according to the first device pose corresponding to the image data to be processed provided by the SLAM module, the second device pose corresponding to the second image data provided by the SLAM module, and the frame difference between the second image data and the image data to be processed.
  • the semantic map module being used to determine, according to the first device pose corresponding to the image data to be processed, the second device pose corresponding to the second image data, and the frame difference between the second image data and the image data to be processed, that the current state of the image data processing device is a motion state includes: when the difference between the first device pose corresponding to the image data to be processed and the second device pose corresponding to the second image data is less than or equal to the first threshold and the frame difference between the second image data and the image data to be processed is greater than the second threshold, the semantic map module is used to determine that the current state of the image data processing device is a motion state.
  • the semantic segmentation module is further configured to perform an optimization operation on the semantic segmentation result according to the image data to be processed and the depth information included in the depth image corresponding to the image data to be processed, where the optimization operation is used to correct noise and errors in the semantic segmentation result.
  • the semantic segmentation module being used to determine the semantic segmentation result of the image data to be processed includes: the semantic segmentation module is used to determine the probability of each plane category among one or more plane categories corresponding to any one of the at least some pixels, and to use the plane category with the highest probability among the one or more plane categories corresponding to that pixel as the target plane category corresponding to that pixel, so as to obtain the semantic segmentation result of the image data to be processed. That is, the probability of the target plane category corresponding to any one of the at least some pixels included in the semantic segmentation result is the largest among the probabilities of the one or more plane categories corresponding to that pixel.
  • the semantic segmentation module is used to perform semantic segmentation on the image data to be processed according to a neural network to obtain the probability of each plane category among one or more plane categories corresponding to any one of the at least some pixels.
  • the semantic clustering module being used to recognize the plane semantic category according to the first dense semantic map and obtain the plane semantic category of the one or more planes included in the image data to be processed includes: the semantic clustering module is used to determine the plane equation of each of the one or more planes according to the image data to be processed.
  • the semantic clustering module is also used to perform the following steps on any one of the one or more planes to obtain the plane semantic category of the one or more planes: the semantic clustering module determines, according to the plane equation of the plane and the first dense semantic map, one or more target plane categories corresponding to the plane and the confidence of the one or more target plane categories; and the semantic clustering module selects, from the one or more target plane categories, the target plane category with the highest confidence as the semantic plane category of the plane.
  • the orientation of one or more target plane categories corresponding to each plane is consistent with the respective orientation of each plane. That is, the orientation of one or more target plane categories corresponding to any plane is consistent with the orientation of the any plane.
  • the semantic clustering module being used to determine, according to the plane equation of any plane and the first dense semantic map, one or more target plane categories corresponding to the plane and the confidence of the one or more target plane categories includes: the semantic clustering module is configured to determine M first three-dimensional points from the first dense semantic map according to the plane equation of the plane, where the distance between each of the M first three-dimensional points and the plane is less than a third threshold, the orientation of the target plane categories corresponding to the M first three-dimensional points is consistent with the orientation of the plane, and M is a positive integer; the M first three-dimensional points correspond to the one or more plane categories; and the semantic clustering module counts, for each plane category among the one or more plane categories, the proportion of the corresponding three-dimensional points among the M first three-dimensional points to obtain the confidence of the one or more plane categories.
  • after the semantic clustering module counts the proportion of the three-dimensional points corresponding to each plane category among the M first three-dimensional points to obtain the confidence of the one or more plane categories, the semantic clustering module is further used to update the confidence of the one or more target plane categories corresponding to any one of the planes according to at least one of Bayes' theorem or a voting mechanism.
  • the semantic map module is used to determine whether the current state of the image data processing device is a motion state. When it is determined that the current state is the motion state, the semantic map module is used to obtain the first dense semantic map according to the semantic segmentation result.
  • the image data to be processed is image data after correction.
  • before the semantic segmentation module is used to obtain the image data to be processed, the semantic segmentation module is also used to obtain the first image data taken by the camera.
  • the semantic segmentation module is used to correct the first image data according to the device pose corresponding to the first image data provided by the SLAM module to obtain the image data to be processed.
  • the SLAM module, the semantic clustering module, and the semantic map module run on the central processing unit (CPU), and the semantic segmentation part of the semantic segmentation module can run on the NPU. The functions of the semantic segmentation module other than semantic segmentation run on the CPU.
  • embodiments of the present application provide a computer-readable storage medium storing instructions; when the instructions are executed, the method described in any one of the first aspect is implemented.
  • an embodiment of the present application provides an image data processing device, including: a first processor and a second processor, where the first processor is configured to obtain image data to be processed including N pixels, N is a positive integer.
  • the second processor is configured to determine the semantic segmentation result of the image data to be processed, wherein the semantic segmentation result includes the target plane category corresponding to at least some of the N pixels;
  • the first processor is configured to obtain a first dense semantic map according to the semantic segmentation result, where the first dense semantic map includes at least one target plane category corresponding to at least one first three-dimensional point in the first three-dimensional point cloud, and the at least one first three-dimensional point corresponds to at least one pixel point among the at least some of the pixels;
  • the first processor is configured to recognize the plane semantic category according to the first dense semantic map to obtain the plane semantic category of one or more planes included in the image data to be processed.
  • the first processor is specifically configured to obtain a second dense semantic map according to the semantic segmentation result and the depth image corresponding to the image data to be processed; the first processor is specifically configured to use the second dense semantic map as the first dense semantic map, or the first processor is specifically configured to update the historical dense semantic map by using one or more second three-dimensional points in the second three-dimensional point cloud in the second dense semantic map to obtain the first dense semantic map.
  • the second processor being used to determine the semantic segmentation result of the image data to be processed includes: performing an optimization operation on the semantic segmentation result according to the image data to be processed and the depth information included in the depth image corresponding to the image data to be processed, where the optimization operation is used to correct noise and erroneous parts in the semantic segmentation result.
  • before the second processor is configured to determine the semantic segmentation result of the image data to be processed, the second processor is also configured to determine the probability of each plane category among one or more plane categories corresponding to any one of the at least some pixels, and to use the plane category with the highest probability among the one or more plane categories corresponding to that pixel as the target plane category corresponding to that pixel, so as to obtain the semantic segmentation result of the image data to be processed. That is, the probability of the target plane category corresponding to any pixel is the largest among the probabilities of the one or more plane categories corresponding to that pixel. This can improve the accuracy of semantic recognition.
  • the second processor is configured to perform semantic segmentation on the image data to be processed according to a neural network to obtain the probability of each plane category among one or more plane categories corresponding to any one of the at least some pixels.
  • the first processor is used to determine the plane equation of each of the one or more planes; the first processor is also used to perform the following steps on any one of the one or more planes to obtain the plane semantic category of the one or more planes: the first processor determines, according to the plane equation of the plane and the first dense semantic map, one or more target plane categories corresponding to the plane and the confidence of the one or more target plane categories; and the first processor selects, from the one or more target plane categories, the target plane category with the highest confidence as the semantic plane category of the plane. That is, the semantic plane category of any plane is the target plane category with the highest confidence among the one or more target plane categories corresponding to that plane.
  • the orientation of one or more target plane categories corresponding to any plane is consistent with the orientation of the any plane.
  • the first processor is specifically configured to determine M first three-dimensional points from the first dense semantic map according to the plane equation of any one of the planes, where the distance between each of the M first three-dimensional points and the plane is less than a third threshold and M is a positive integer; to determine the one or more target plane categories corresponding to the M first three-dimensional points as the one or more target plane categories corresponding to the plane, where the orientation of the one or more target plane categories is consistent with the orientation of the plane; and to count, for each target plane category among the one or more target plane categories, the proportion of the corresponding three-dimensional points among the M first three-dimensional points to obtain the confidence of the one or more target plane categories.
  • after the first processor counts the proportion of the three-dimensional points corresponding to each target plane category among the M first three-dimensional points to obtain the confidence of the one or more target plane categories, the first processor is further configured to update the confidence of the one or more target plane categories corresponding to any one of the planes according to at least one of Bayes' theorem or a voting mechanism.
  • the first processor is configured to determine whether the current state is the motion state; and when it is determined that the current state is the motion state, obtain the first dense semantic map according to the semantic segmentation result.
  • the first processor may be a CPU or a DSP.
  • the second processor may be an NPU.
  • an embodiment of the present application provides an image data processing device, including one or more processors, where the one or more processors are configured to execute instructions stored in a memory to perform the method described in any one of the first aspect.
  • an embodiment of the present application provides a computer program product including instructions; when the instructions are executed, the method described in any one of the first aspect is implemented.
  • FIG. 1 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the application.
  • FIG. 2 is a schematic diagram of a software architecture applicable to a method for identifying planar semantic categories provided by an embodiment of the application;
  • FIG. 3 is a schematic flowchart of a method for recognizing planar semantic categories according to an embodiment of this application
  • FIG. 4 is a schematic flowchart of another method for recognizing planar semantic categories according to an embodiment of this application.
  • FIG. 5 is a schematic diagram of the first image data before and after processing obtained by the image data processing device provided by the embodiment of the application;
  • FIG. 6 is a schematic diagram of a semantic segmentation result provided by an embodiment of this application.
  • FIG. 7 is a schematic diagram of a coordinate mapping provided by an embodiment of this application.
  • FIG. 8 is a schematic flowchart of a state determination process provided by an embodiment of the application.
  • FIG. 9 is a calculation flow of plane confidence provided by an embodiment of the application.
  • FIG. 10 is a schematic flow chart of performing filtering on semantic segmentation results according to an embodiment of this application.
  • FIG. 11 is a schematic diagram of another process of performing filtering on semantic segmentation results according to an embodiment of the application.
  • FIG. 12 is a schematic diagram of a planar semantic result provided by an embodiment of this application.
  • FIG. 13 is a schematic structural diagram of an image data processing device provided by an embodiment of the application.
  • at least one item (piece) refers to one of these items or any combination of these items, including a single item (piece) or any combination of a plurality of items (pieces).
  • at least one of a, b, or c can mean: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c can be singular or plural.
  • the method for identifying planar semantic categories provided in the embodiments of the present application can be applied to various image data processing apparatuses equipped with a time of flight (TOF) camera, and the image data processing apparatus may be an electronic device.
  • electronic devices may include, but are not limited to, personal computers, server computers, handheld or laptop devices, mobile devices (such as mobile phones, tablet computers, personal digital assistants, media players, etc.), consumer electronic devices, minicomputers, mainframe computers, mobile robots, drones, etc.
  • the electronic device in the embodiment of the present application may be a device with AR function, for example, a device with AR glasses function, which can be applied to scenarios such as AR automatic measurement, AR decoration, and AR interaction.
  • the image data processing device can use the plane semantic category recognition method provided in this embodiment of the application to obtain the plane category recognition result of the image data to be processed.
  • the image data processing device may alternatively send the image data to be processed to another device capable of performing the plane semantic category recognition process, such as a server or a terminal device; the server or terminal device performs the plane semantic category recognition process, and the image data processing device then receives the plane category recognition result from the other device.
  • the electronic device 100 may include a display device 110, a processor 120 and a memory 130.
  • the memory 130 may be used to store software programs and data, and the processor 120 may execute various functional applications and data processing of the electronic device 100 by running the software programs and data stored in the memory 130.
  • the memory 130 may mainly include a storage program area and a storage data area.
  • the storage program area may store an operating system and an application program (such as an image capture function) required by at least one function; the storage data area may store data created during use of the electronic device 100 (such as audio data, text information, and image data), etc.
  • the memory 130 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
  • the processor 120 is the control center of the electronic device 100. It uses various interfaces and lines to connect the various parts of the entire electronic device, and executes the various functions of the electronic device 100 and processes data by running or executing the software programs and/or data stored in the memory 130, thereby monitoring the electronic device as a whole.
  • the processor 120 may include one or more processing units.
  • the processor 120 may include a central processing unit (CPU), an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the NPU is a neural-network (NN) computing processor. By learning from the structure of biological neural networks, for example, the transfer mode between human brain neurons, it can quickly process input information and can also continuously self-learn.
  • applications such as intelligent cognition of the electronic device 100, for example image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
  • the processor 120 may include one or more interfaces.
  • Interfaces can include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus, which includes a serial data line (SDA) and a serial clock line (SCL).
  • the processor 120 may include multiple sets of I2C buses.
  • the processor 120 may be coupled to the touch sensor, charger, flashlight, image acquisition device 160, etc., respectively through different I2C bus interfaces.
  • the processor 120 may couple the touch sensor through an I2C interface, so that the processor 120 communicates with the touch sensor through the I2C bus interface, so as to realize the touch function of the electronic device 100.
  • the I2S interface can be used for audio communication.
  • the processor 120 may include multiple sets of I2S buses.
  • the processor 120 may be coupled with the audio module through an I2S bus to implement communication between the processor 120 and the audio module.
  • the audio module can transmit audio signals to the WiFi module 190 through the I2S interface, so as to realize the function of answering calls through the Bluetooth headset.
  • the PCM interface can also be used for audio communication to sample, quantize and encode analog signals.
  • the audio module and the WiFi module 190 may be coupled through a PCM bus interface.
  • the audio module may also transmit audio signals to the WiFi module 190 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • the UART interface is usually used to connect the processor 120 and the WiFi module 190.
  • the processor 120 communicates with the Bluetooth module in the WiFi module 190 through the UART interface to realize the Bluetooth function.
  • the audio module can transmit audio signals to the WiFi module 190 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
  • the MIPI interface can be used to connect the processor 120 with peripheral devices such as the display device 110 and the image acquisition device 160.
  • the MIPI interface includes a camera serial interface (CSI) for the image acquisition device 160, a display serial interface (DSI), and so on.
  • the processor 120 and the image acquisition device 160 communicate through a CSI interface to implement the shooting function of the electronic device 100.
  • the processor 120 communicates with the display screen through the DSI interface to realize the display function of the electronic device 100.
  • the GPIO interface can be configured through software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 120 with the image capture device 160, the display device 110, the WiFi module 190, the audio module, the sensor module, and so on.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface is an interface that complies with the USB standard specifications, and can be a Mini USB interface, a Micro USB interface, and a USB Type C interface.
  • the USB interface can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect earphones and play audio through earphones. This interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiment of the present invention is merely a schematic description, and does not constitute a structural limitation of the electronic device 100.
  • the electronic device 100 may also adopt an interface connection manner different from that in the foregoing embodiment, or a combination of multiple interface connection manners.
  • the electronic device 100 also includes an image capture device 160 for capturing images or videos.
  • the image capturing device 160 includes one or more cameras for capturing image data and a TOF camera for capturing depth images.
  • a camera is used to collect video graphics sequence (Video Graphics Array, VGA) or image data and send it to the CPU and GPU.
  • the camera can be an ordinary camera or a focusing camera.
  • the electronic device 100 may also include an input device 140 for receiving inputted digital information, character information, or contact touch operations/non-contact gestures, and generating signal inputs related to user settings and function control of the electronic device 100.
  • the display device 110 includes a display panel 111, which is used to display information input by the user or information provided to the user, as well as the various menu interfaces of the electronic device 100. In the embodiment of the present application, it is mainly used to display the image data to be processed obtained by the camera or the sensor in the electronic device 100.
  • the display panel 111 may be configured in the form of a liquid crystal display (LCD) or an organic light-emitting diode (OLED).
  • the electronic device 100 may also include one or more sensors 170, such as an image sensor, an infrared sensor, a laser sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, an ambient light sensor, a fingerprint sensor, a touch sensor, a temperature sensor, a bone conduction sensor, an inertial measurement unit (IMU), etc., where the image sensor may be a time of flight (TOF) sensor, a structured light sensor, and the like.
  • the inertial measurement unit is a device that measures the three-axis attitude angle (or angular velocity) and acceleration of an object.
  • an IMU contains three single-axis accelerometers and three single-axis gyroscopes.
  • the accelerometer detects the acceleration signal of the object in the independent three-axis of the carrier coordinate system.
  • the gyroscope detects the angular velocity signal of the carrier relative to the navigation coordinate system, measures the angular velocity and acceleration of the object in three-dimensional space, and calculates the posture of the object.
  • the image sensor may be a device in the image acquisition device 160 or an independent device for acquiring image data.
  • the electronic device 100 may also include a power supply 150 for supplying power to other modules.
  • the electronic device 100 may also include a radio frequency (RF) circuit 180 for network communication with wireless network devices, and may also include a WiFi module 190 for WiFi communication with other devices, for example, for acquiring images or data transmitted by other devices.
  • the electronic device 100 may also include other possible functional modules such as a flashlight, a Bluetooth module, an external interface, a button, a motor, etc., which will not be described here.
  • FIG. 2 shows a software architecture applicable to a planar semantic category recognition method provided by an embodiment of the present application.
  • the software architecture is applied to the electronic device 100 shown in FIG. 1, and the architecture includes a semantic segmentation module 202, a semantic map module 203, and a semantic clustering module 204.
  • the software architecture may also include a simultaneous localization and mapping (SLAM) module 201.
  • the SLAM module 201, the semantic map module 203, and the semantic clustering module 204 run on the CPU of the electronic device as described in FIG. 1.
  • part of the functions in the SLAM module 201 can be deployed on a digital signal processor (DSP), and part of the functions in the semantic segmentation module 202 runs on the NPU of the electronic device described in FIG. 1.
  • the functions of the semantic segmentation module 202 other than those running on the NPU of the electronic device described in FIG. 1 run on the CPU. Refer to the subsequent descriptions for the specific functions that run on the NPU.
  • the SLAM module 201 uses the video graphics sequence including one or more frames of image data provided by the camera (that is, the camera corresponding to the image acquisition device 160 of the electronic device described in FIG. 1), the depth information or depth image provided by the TOF camera, the IMU data provided by the IMU, and the correlation between image data frames, combined with the principles of visual geometry, to calculate the device pose (for example, if the device is a camera, the device pose may refer to the camera pose), that is, the rotation and translation of the camera relative to the first frame. The SLAM module 201 also detects planes and outputs the device pose and the normal parameters and boundary points of the planes.
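  • For clarity, the per-frame outputs just described can be thought of as a small record such as the following hypothetical Python structure; the field names and the 4x4 pose representation are assumptions, not definitions from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import numpy as np

@dataclass
class SlamFrameOutput:
    """Hypothetical container for what the SLAM module 201 provides per frame."""
    pose: np.ndarray  # 4x4 rotation + translation of the camera relative to the first frame
    planes: List[Tuple[np.ndarray, float, np.ndarray]] = field(default_factory=list)
    # each plane entry: (normal (a, b, c), offset d, boundary points as an N x 3 array)
```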
  • the IMU data includes accelerometer and gyroscope measurements.
  • the depth information includes the distance between each pixel in the image data and the camera that captured the image data.
  • the semantic segmentation module 202 implements semantic segmentation data enhancement based on SLAM technology, and is divided into pre-processing, AI processing, and post-processing.
  • the input of the pre-processing is the original image data (for example, RGB image) provided by the camera and the device pose obtained by the SLAM module 201.
  • the original image data is corrected according to the device pose and output as the corrected image data.
  • in this way, the rotation-invariance requirements on the semantic segmentation model can be reduced, and the recognition rate can be improved.
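  • One way such a pose-based correction could be realised is to rotate the image so that the gravity direction projected into the image points downward; the sketch below (gravity alignment with a scipy rotation, and the assumed camera axis convention) is an illustration only, since the patent does not fix the exact correction.

```python
import numpy as np
from scipy.ndimage import rotate

def rectify_image(image, rotation_cam_to_world, gravity_world=(0.0, 0.0, -1.0)):
    """Rotate the image so the projected gravity direction points "down".

    rotation_cam_to_world: 3x3 rotation part of the device pose from SLAM.
    Assumes a camera frame with x right, y down, z forward; the sign of the
    applied angle depends on the actual image and camera conventions.
    """
    gravity_cam = rotation_cam_to_world.T @ np.asarray(gravity_world)
    # Roll angle between the in-image gravity component and the image +y axis.
    roll_deg = np.degrees(np.arctan2(gravity_cam[0], gravity_cam[1]))
    return rotate(image, angle=-roll_deg, reshape=False, order=1)
```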
  • AI processing performs semantic segmentation based on a neural network and runs on the NPU. The input is the corrected image data, and the output is, for each pixel included in the corrected image data, a probability distribution over one or more plane categories (that is, the probability that each pixel belongs to each of the one or more plane categories). If the plane category with the highest probability is selected for each pixel, the pixel-level semantic segmentation result is obtained.
  • the aforementioned neural network may be a convolutional neural network (CNN), a deep neural network (DNN), or a recurrent neural network (RNN).
  • the input of post-processing is the original image data provided by the camera, the depth information, and the semantic segmentation result output by the AI processing.
  • the semantic segmentation result is filtered mainly according to the original image data and depth information, and the output is the optimized semantic segmentation result.
  • the accuracy and edges of the segmentation after post-processing are better.
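  • As an illustration of such filtering, a depth-guided majority vote over a small window is sketched below in Python; the window size, the depth tolerance, and the voting rule itself are assumptions, since the patent only states that the semantic segmentation result is filtered using the original image data and depth information.

```python
import numpy as np

def filter_segmentation(labels, depth, win=2, depth_tol=0.05):
    """Suppress isolated label noise with a depth-guided majority vote.

    labels: H x W array of target plane categories.
    depth: H x W depth image in metres.
    Only neighbours at a similar depth vote, so labels do not bleed across
    depth discontinuities (typically object or plane boundaries).
    """
    h, w = labels.shape
    out = labels.copy()
    for y in range(win, h - win):
        for x in range(win, w - win):
            nb_labels = labels[y - win:y + win + 1, x - win:x + win + 1]
            nb_depth = depth[y - win:y + win + 1, x - win:x + win + 1]
            mask = np.abs(nb_depth - depth[y, x]) < depth_tol
            votes = nb_labels[mask]
            if votes.size:
                values, counts = np.unique(votes, return_counts=True)
                out[y, x] = values[np.argmax(counts)]
    return out
```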
  • post-processing is not a necessary technique in the embodiment and may not be executed.
  • the pre-processing and post-processing may run on the CPU or other processors instead of the NPU, which is not limited in this embodiment.
  • the input of the semantic map module 203 is the optimized semantic segmentation result or the unoptimized semantic segmentation result, the device pose provided by the SLAM module 201, the depth information provided by the TOF, and the original image data provided by the camera.
  • the semantic map module 203 mainly generates a dense semantic map based on SLAM technology. The dense semantic map is generated from the optimized semantic segmentation result or the unoptimized semantic segmentation result, the device pose provided by the SLAM module 201, the depth information provided by the TOF camera, and the original image data provided by the camera.
  • the process includes converting the original two-dimensional image data into a three-dimensional dense semantic map.
  • the two-dimensional RGB pixels in the two-dimensional original image data are converted into three-dimensional points in the three-dimensional space, so that each pixel includes depth information in addition to the RGB information.
  • the process of this two-dimensional to three-dimensional conversion can refer to the description in the prior art, which will not be repeated here.
  • the target plane category of each pixel is used as the target plane category of the three-dimensional point corresponding to that pixel, so that the target plane categories of multiple pixels are transformed into the target plane categories of multiple three-dimensional points. Therefore, the dense semantic map includes the target plane categories of multiple three-dimensional points.
  • the target plane category of any three-dimensional point corresponds to the target plane category of the two-dimensional pixel point corresponding to the three-dimensional point.
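  • Below is a minimal Python sketch of this two-dimensional to three-dimensional conversion using a pinhole camera model; the intrinsic matrix K and the 4x4 pose format are assumptions, and the patent defers the actual conversion to known techniques.

```python
import numpy as np

def build_dense_semantic_map(depth, labels, K, T_cam_to_world):
    """Back-project labelled 2-D pixels into labelled 3-D points.

    depth: H x W depth image (metres); labels: H x W target plane categories;
    K: 3x3 camera intrinsics; T_cam_to_world: 4x4 device pose from SLAM.
    Returns (points as an N x 3 array in world coordinates, length-N categories).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)   # N x 4 homogeneous
    pts_world = (T_cam_to_world @ pts_cam.T).T[:, :3]        # N x 3
    return pts_world, labels[valid]
```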
  • when post-processing is performed in the embodiment, the input of the semantic map module 203 is the optimized semantic segmentation result; when post-processing is not performed, the input of the semantic map module 203 is the unoptimized semantic segmentation result.
  • the semantic clustering module 204 performs planar semantic recognition based on the dense semantic map.
  • this application provides a method for identifying plane semantic categories and an image data processing device, wherein the method can enable the image data processing device to detect the plane semantic category of one or more planes included in the image data.
  • the method and the image data processing device are based on the same inventive concept. Since the method and the computing device have similar principles for solving the problem, the implementation of the image data processing device and the method can be referred to each other, and the repetition will not be repeated.
  • Figure 3 shows a planar semantic category recognition method provided by an embodiment of the present application.
  • the method is applied to an image data processing device.
  • the method includes the following steps. Step 301: The semantic segmentation module 202 obtains the image data to be processed.
  • the image data to be processed includes N pixels, and N is a positive integer.
  • the image data to be processed may be taken by the camera of the image data processing device and provided to the semantic segmentation module 202, or it may be obtained by the semantic segmentation module 202 from the image library used for storing image data in the image data processing device. It may also be sent by other devices, which is not limited in the embodiment of the present application.
  • the image data to be processed may be a two-dimensional image.
  • the image data to be processed may be a color photo or a black and white photo, which is not limited in the embodiment of the present application.
  • the N pixels may be all pixels in the image data to be processed, or may be some pixels in the image data to be processed.
  • the N pixels may be pixels belonging to the plane category in the image data to be processed, excluding the pixels of the non-planar category.
  • a pixel of a non-planar category refers to a pixel that does not belong to any recognized plane category, and at this time, the pixel is considered to not belong to a pixel on any plane.
  • Step 302: The semantic segmentation module 202 determines the semantic segmentation result of the image data to be processed.
  • the semantic segmentation result includes the target plane category corresponding to at least some of the N pixels included in the image data to be processed.
  • the at least part of the pixels may be pixels of one or more planes included in the image data to be processed.
  • the target plane category corresponding to at least some of the N pixels may refer to the target plane category corresponding to some of the N pixels, or may refer to the target plane category corresponding to all of the N pixels.
  • the image data processing device in the embodiment of the present application can independently determine the semantic segmentation result of the image data to be processed, and at this time, the image data processing device may have a module (for example, NPU) that determines the semantic segmentation result of the image data to be processed.
  • the image data processing device in the embodiment of the present application can also send the image data to be processed to a device that has the function of determining the semantic segmentation result of the image data to be processed, so that that device determines the semantic segmentation result. The image data processing apparatus then obtains the semantic segmentation result of the image data to be processed from that device. In the embodiment of the present application, by determining the semantic segmentation result of the image data to be processed, the image data processing apparatus can detect one or more planes included in the image data to be processed.
  • Step 303 The semantic map module 203 obtains a first dense semantic map according to the semantic segmentation result.
  • the first dense semantic map includes at least one target plane category corresponding to at least one first three-dimensional point in the first three-dimensional point cloud.
  • One first three-dimensional point corresponds to at least one pixel point in the at least part of the pixel points.
  • the purpose of step 303 in the embodiment of this application is that the semantic map module 203 uses the plane category of each pixel in the two-dimensional space to update the plane category of the three-dimensional point corresponding to that pixel in the three-dimensional space, that is, the plane category of the pixel is used as the target plane category of the corresponding three-dimensional point.
  • the semantic map module 203 may use at least one target plane category corresponding to the three-dimensional point cloud corresponding to all pixels in the semantic segmentation result as the first dense semantic map. Through step 303, the performance of the semantic map generation algorithm can be improved.
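As an illustration of this 2D-to-3D category mapping, the following minimal sketch (not part of the original disclosure) assigns each three-dimensional point the target plane category of its corresponding pixel; the array layout, the function name, and the use of -1 for non-planar or unmatched entries are illustrative assumptions.

```python
import numpy as np

def build_dense_semantic_map(pixel_categories, pixel_to_point, num_points):
    """Assign each 3D point the target plane category of its corresponding 2D pixel.

    pixel_categories: (H, W) int array, target plane category per pixel (-1 = non-planar).
    pixel_to_point:   (H, W) int array, index of the 3D point each pixel corresponds to (-1 = none).
    num_points:       number of 3D points in the first three-dimensional point cloud.
    """
    point_categories = np.full(num_points, -1, dtype=int)
    valid = (pixel_to_point >= 0) & (pixel_categories >= 0)
    # The pixel's plane category becomes the target plane category of its 3D point.
    point_categories[pixel_to_point[valid]] = pixel_categories[valid]
    return point_categories
```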
  • Step 304: The semantic clustering module 204 performs plane semantic category recognition according to the first dense semantic map, and obtains the plane semantic category of one or more planes included in the image data to be processed.
  • the embodiment of the present application provides a method for recognizing planar semantic categories.
  • the method obtains the semantic segmentation result of the image data to be processed. Since the semantic segmentation result includes the target plane category to which each pixel of at least some of the N pixels included in the image data to be processed belongs, the subsequent processing based on this semantic segmentation can improve the accuracy of plane semantic recognition.
  • the image data processing device obtains the first dense semantic map according to the semantic segmentation result and then uses the first dense semantic map to recognize the plane semantic categories of the image data to be processed, which can enhance the accuracy and stability of plane semantic recognition.
  • step 303 in the embodiment of the present application can be implemented in the following manner: the semantic map module 203 determines whether the current state of the image data processing device is a motion state. When it is determined that the current state is a motion state, the first dense semantic map is obtained according to the semantic segmentation result. By judging whether it is in the motion state, the first dense semantic map is obtained according to the semantic segmentation result in the motion state, which can reduce the amount of calculation.
  • when it is determined that the current state is not a motion state, the image data processing device uses the historical dense semantic map as the first dense semantic map.
  • the image data to be processed in the embodiment of the present application is image data after correction.
  • the semantic segmentation module 202 corrects the image data, or uses image data that has already been corrected, before performing semantic segmentation on the image data to be processed, which can reduce the requirement for rotation invariance in the semantic segmentation model and improve the recognition rate.
  • the method provided in the embodiment of the present application may further include, before step 301: step 305, the semantic segmentation module 202 obtains the first image data shot by the first device.
  • the image data processing apparatus may control the first device to capture the first image data, and send the captured first image data to the semantic segmentation module 202.
  • the first image data can also be obtained by the semantic segmentation module 202 from a memory of the image data processing apparatus in which it is pre-stored, or the semantic segmentation module 202 can obtain, from other devices (for example, an SLR camera or a DV), the first image data taken by the first device.
  • the first device may be a camera built in the image data processing device, or a photographing device connected to the image data processing device.
  • step 301 can be implemented by the following step 3011: step 3011, the semantic segmentation module 202 corrects the first image data according to the first device pose of the first device corresponding to the first image data to obtain the image data to be processed.
  • each image data in the embodiment of the present application may correspond to a device pose.
  • the semantic segmentation module 202 may correct the first image data according to the pose of the device that shot the first image data.
  • the semantic segmentation module 202 can independently determine that the first image data has not been corrected.
  • alternatively, the image data processing apparatus receives an operation instruction, input by the user for the first image data, indicating that the first image data is to be corrected. In this way, the image data processing device determines that the first image data has not been corrected, and then uses the semantic segmentation module 202 to correct the first image data.
  • the pose of the device corresponding to the image data in the embodiment of the present application refers to the pose of the device that captured the image data when the image data was captured.
  • the same device may correspond to different device poses at different times. It can be understood that, if the first image data is image data that has already been corrected, the process of correcting the image to be processed can be omitted.
  • (a) of FIG. 5 shows the first image data obtained by the image data processing device.
  • the first image data has not been corrected.
  • the image data processing device may correct the first image data according to the device pose of the device that took the first image data, and the corrected image data is as shown in (b) of FIG. 5.
  • in the method provided in this embodiment of the present application, step 302 may be implemented through the following steps 3021 and 3022:
  • Step 3021 the semantic segmentation module 202 determines one or more plane categories corresponding to any one pixel in at least some of the pixels and the probability of each plane category in the one or more plane categories.
  • step 3021 in the embodiment of the present application can be specifically implemented in the following manner: the semantic segmentation module 202 performs semantic segmentation on the image data to be processed according to a neural network, and obtains, for any one pixel in the at least part of the pixels, one or more plane categories and the probability of each plane category in the one or more plane categories.
  • for the specific segmentation process, reference may be made to the prior art, which is not limited in this embodiment.
  • Step 3022: the semantic segmentation module 202 uses the plane category with the highest probability among the one or more plane categories corresponding to any one pixel as the target plane category of that pixel, so as to obtain the semantic segmentation result of the image data to be processed. In other words, the probability of the target plane category corresponding to any one pixel is the largest among the probabilities of the one or more plane categories corresponding to that pixel.
  • any pixel in the embodiment of the present application may correspond to one or more plane categories, and any pixel may correspond to the probability of belonging to each plane category of the one or more plane categories.
  • the sum of the probabilities of one or more plane categories corresponding to any pixel is equal to 1.
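A minimal sketch of step 3022 (not part of the original disclosure): given per-pixel probabilities over the plane categories, the target plane category of each pixel is simply the category with the highest probability. The array shape and function name are illustrative assumptions.

```python
import numpy as np

def semantic_segmentation_result(category_probs):
    """Pick, for every pixel, the plane category with the highest probability.

    category_probs: (H, W, C) array of per-pixel probabilities over C plane
    categories; each pixel's probabilities sum to 1.
    Returns an (H, W) array of target plane category indices.
    """
    return np.argmax(category_probs, axis=-1)
```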
  • semantic segmentation processing may be performed on the image data to be processed. It is understandable that the purpose of semantic segmentation is to assign a category label to each pixel in the image data to be processed.
  • the image data to be processed is composed of many pixels, and semantic segmentation is to group the pixels according to the different semantic meanings expressed in the image. That is, the so-called semantic segmentation is to segment the image data to be processed into regions with different semantics, and mark the plane category to which each region belongs, such as cars, trees, or faces. Semantic segmentation combines the two technologies of segmentation and target recognition, and can segment the image into regions with advanced semantic content. For example, through semantic segmentation, an image data can be segmented into three different semantic regions of "cow”, "grass” and "sky".
  • Figure 6 (a) shows a piece of image data to be processed provided by an embodiment of the present application, and Figure 6 (b) shows a schematic diagram of that image data after semantic segmentation processing. From (b) in Figure 6, it can be seen that the image data to be processed is divided into four different semantic regions: "ground", "table", "wall", and "chair".
  • the semantic segmentation module 202 may use a semantic segmentation model to determine the probability of one or more plane categories to which each pixel of the N pixels belongs.
  • each pixel point may correspond to one or more plane categories, and the sum of the probabilities of all plane categories corresponding to each pixel point is equal to 1.
  • the probability of the target plane category corresponding to any one of the N pixels is the highest probability among the probabilities of one or more plane categories corresponding to the any one pixel.
  • for example, the plane categories of the one or more planes included in the image data to be processed are ground, table, chair, wall, and so on.
  • through step 302, the image data processing device can obtain the target plane categories to which pixel 1 to pixel 4 belong, as shown in Table 1.
  • the semantic segmentation model in the embodiment of the present application may use mobileNet v2 as the encoding network, or may be implemented by MaskRCNN. It should be understood that, in the embodiments of the present application, any other model that can perform semantic segmentation may also be used to obtain the semantic segmentation result.
  • the embodiment of the present application uses mobileNet v2 as the encoding network for semantic segmentation as an example for description, but this does not constitute a limitation on the semantic segmentation method, and this is not repeated below.
  • the mobileNet v2 model has the advantages of small size, fast speed, and high accuracy, which meets the requirements of mobile phone platforms and enables semantic segmentation to reach a frame rate of more than 5fps.
  • step 302 in the embodiment of the present application can be implemented in the following manner: the semantic segmentation module 202 determines the semantic segmentation result of the image data to be processed according to the one or more plane categories corresponding to each of at least some of the N pixels and the probability of each of those plane categories. That is, the semantic segmentation module 202 determines, for each pixel of the at least some pixels, the plane category with the highest probability as the target plane category of that pixel, so as to obtain the semantic segmentation result of the image data to be processed.
  • the method provided in this embodiment of the present application may further include, after step 302 and before step 303: step 306, the semantic segmentation module 202 performs an optimization operation on the semantic segmentation result according to the image data to be processed and the depth information included in the depth image corresponding to the image data to be processed, where the optimization operation is used to correct noise in the semantic segmentation result and the erroneous parts caused by the segmentation process.
  • for example, the target plane category of pixel A in the semantic segmentation result is table, but in fact the target plane category of pixel A should be ground, so the optimization operation can change the target plane category of pixel A from table to ground.
  • a certain pixel point B is not segmented, and the target plane category of the pixel point B can be determined by performing an optimization operation. For the specific algorithm implementation of the optimization operation, reference may be made to the prior art for details, and details are not described in this embodiment.
  • the depth information in the embodiment of the present application includes the distance between each pixel and the device that captures the image data to be processed.
  • the purpose of the optimization operation on the semantic segmentation result in the embodiment of the present application is to optimize and repair the semantic segmentation result.
  • the depth information can be used to filter and modify the semantic segmentation results, avoiding wrong segmentation and unsegmentation in the semantic segmentation results.
  • FIG. 10 and FIG. 11 For the detailed process of optimizing the semantic segmentation result, please refer to the description of FIG. 10 and FIG. 11 below, which will not be repeated here.
  • that the semantic map module 203 in this embodiment of the application determines whether the current state of the image data processing device is a motion state (in step 303) can be implemented in the following manner: the semantic map module 203 obtains second image data taken by the camera, and determines whether the current state of the image data processing device is a motion state based on the difference between the first device pose corresponding to the image data to be processed and the second device pose corresponding to the second image data, and on the frame difference between the second image data and the image data to be processed.
  • when the difference between the first device pose corresponding to the image data to be processed and the second device pose corresponding to the second image data is less than or equal to the first threshold, and the frame difference between the second image data and the image data to be processed meets the corresponding condition, the semantic map module 203 determines that the current state of the image data processing device is a motion state.
  • the second image data is adjacent to the image data to be processed and is located in the previous frame of the image data to be processed. Refer to Figure 8 for the specific process.
  • otherwise, the image data processing device determines that the current state of the image data processing device is a static state.
  • the image data processing device can directly use the historical dense semantic map as the first dense semantic map, and perform subsequent processing.
  • the historical dense semantic map in the embodiment of the present application may be stored in the image data processing device, or of course, may also be obtained from other equipment by the image data processing device, which is not limited in the embodiment of the present application.
  • the historical dense semantic map is the semantic image result generated and saved in history. After each frame of new image data arrives, the historical dense semantic map will be updated.
  • the historical dense semantic map is a dense semantic map corresponding to the previous frame of the dense semantic map corresponding to one frame of image data, or a synthesis of the dense semantic maps corresponding to the previous few frames of images.
  • step 303 of the embodiment of the present application can be implemented in the following manner: the semantic map module 203 obtains a second dense semantic map according to the semantic segmentation result and the depth image corresponding to the image data to be processed.
  • the semantic map module 203 directly uses the second dense semantic map as the first dense semantic map. That is, each time the second dense semantic map is calculated, it is directly used as the first dense semantic map for the subsequent calculation.
  • the depth image corresponding to the image data to be processed in the embodiment of the present application refers to an image that has the same size as the image data to be processed and whose element value is the depth value of the scene point corresponding to the image point in the image data to be processed.
  • the image data to be processed is acquired by the image acquisition device shown in FIG. 2, and the depth image corresponding to the image data to be processed is acquired by the TOF camera shown in the figure.
  • methods such as TOF camera, structured light, laser scanning, etc. may be used to obtain depth information, thereby obtaining a depth image.
  • any other method (or camera) for obtaining a depth image may also be used to obtain a depth image.
  • the TOF camera to obtain the depth image is used as an example for description, but this does not cause a restriction on the way of obtaining the depth image, and will not be repeated in the following.
  • the point cloud is a three-dimensional concept, while the pixels in the depth image are a two-dimensional concept.
  • because the image coordinates of a point can be converted into world coordinates in three-dimensional space, the point cloud in three-dimensional space can be recovered from the depth image.
  • the principle of visual geometry can be used to convert image coordinates into world coordinates. According to the principle of visual geometry, the process of mapping a three-dimensional point M (Xw, Yw, Zw) in a world coordinate system to a point m (u, v) on the image is shown in Figure 7.
  • in Figure 7, the dashed Xc axis is obtained by translating the solid Xc axis, and the dashed Yc axis is obtained by translating the solid Yc axis.
  • u, v are arbitrary coordinate points in the image coordinate system.
  • f is the focal length of the camera
  • dx and dy are the pixel sizes in the x and y directions, respectively
  • u 0 and v 0 are the center coordinates of the image, respectively.
  • Xw, Yw, Zw represent the three-dimensional coordinate points in the world coordinate system.
  • Zc represents the Z-axis value of the camera coordinates, that is, the distance from the target to the camera.
  • R and T are the 3x3 rotation matrix and 3x1 translation matrix of the external parameter matrix, respectively.
  • the depth map can be restored to a point cloud in the camera coordinate system, that is, the rotation matrix R is taken as the identity matrix and the translation vector T is 0, so that each pixel (u, v) with depth value Zc can be back-projected using the intrinsic parameters f, dx, dy, u0, and v0.
  • Xc, Yc, Zc are the three-dimensional point coordinates in the camera coordinate system.
  • Zc represents the value on the depth map.
  • the depth unit currently obtained by TOF is millimeter (mm), so the coordinates of the three-dimensional points in the camera coordinate system can be calculated; then, using the device pose R and T calculated by the SLAM module, the point cloud data converted to the world coordinate system can be obtained.
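A minimal sketch (not part of the original disclosure) of this back-projection, using the intrinsic parameters f, dx, dy, u0, v0 and the device pose R, T described above; the pose convention (world point = R · camera point + T) and the function name are assumptions.

```python
import numpy as np

def depth_to_world_points(depth_mm, f, dx, dy, u0, v0, R, T):
    """Back-project a TOF depth map (in millimeters) into 3D points,
    first in the camera coordinate system and then in the world coordinate system."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))          # pixel coordinates (u, v)
    zc = depth_mm.astype(np.float64) / 1000.0               # millimeters -> meters
    xc = (u - u0) * zc * dx / f                              # Xc from the pinhole model
    yc = (v - v0) * zc * dy / f                              # Yc from the pinhole model
    pts_cam = np.stack([xc, yc, zc], axis=-1).reshape(-1, 3)
    # Camera coordinate system -> world coordinate system with the device pose.
    return pts_cam @ np.asarray(R).T + np.asarray(T).reshape(1, 3)
```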
  • the three-dimensional points in this embodiment are three-dimensional pixels, that is, the two-dimensional pixels involved in steps 301 and 302 are converted into three-dimensional pixels.
  • alternatively, step 303 of the embodiment of the present application can be implemented in the following manner: the semantic map module 203 obtains the second dense semantic map according to the semantic segmentation result and the depth image corresponding to the image data to be processed.
  • for how the multiple three-dimensional points included in the second dense semantic map are obtained by combining the multiple two-dimensional pixel points with the depth image, reference may be made to the prior art.
  • the semantic map module 203 uses one or more second three-dimensional points in the second three-dimensional point cloud in the second dense semantic map to update the historical dense semantic map to obtain the first dense semantic map. Different from directly using the second dense semantic map as the first dense semantic map, a part of all the three-dimensional points in the second three-dimensional point cloud can be used for the update.
  • the update may not be for all three-dimensional points in the second dense semantic map, but only replace the target plane category of the corresponding three-dimensional point of the historical dense semantic map with the probability of the target plane category of some three-dimensional points in the second dense semantic map. Probability. Therefore, the update may be an update to a part of the dense semantic map, instead of directly using the second dense semantic map as the first dense semantic map.
  • the semantic map module 203 uses one or more second three-dimensional points in the second three-dimensional point cloud in the second dense semantic map to update the probabilities of the target plane categories of the three-dimensional points, in the historical dense semantic map, that correspond to the one or more second three-dimensional points, so as to obtain the first dense semantic map.
  • that the semantic map module 203 uses one or more second three-dimensional points in the second three-dimensional point cloud to update the corresponding three-dimensional points in the historical dense semantic map means, for example, replacing the probability of the target plane category of a three-dimensional point A in the historical dense semantic map with the probability of the target plane category of the three-dimensional point A in the second dense semantic map.
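A minimal sketch (not part of the original disclosure) of this partial update, assuming the per-point category probabilities of both maps are stored as arrays indexed by point; the names and layout are illustrative.

```python
import numpy as np

def update_historical_map(hist_probs, new_probs, updated_point_ids):
    """Replace, only for the selected 3D points, the plane-category probabilities
    stored in the historical dense semantic map with those from the second dense
    semantic map; the result is used as the first dense semantic map.

    hist_probs, new_probs: (num_points, C) arrays of per-point category probabilities.
    updated_point_ids:     indices of the second 3D points used for the update.
    """
    first_map = hist_probs.copy()
    first_map[updated_point_ids] = new_probs[updated_point_ids]
    return first_map
```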
  • step 304 in the embodiment of the present application can be specifically implemented in the following manner: step 3041, the semantic clustering module 204 determines the plane equation of each of the one or more planes.
  • the semantic clustering module 204 performs plane fitting on the three-dimensional point cloud data of each pixel to obtain a plane equation.
  • the semantic clustering module 204 can use the RANSAC method or the SVD equation solving method to perform plane fitting on the three-dimensional point cloud data of each pixel to obtain a plane equation.
  • the image data processing device in the embodiment of the present application can determine the respective area of each plane and the orientation of each plane.
  • the normal vector is used to indicate the orientation of the plane.
  • in the embodiment of the present application, the orientation of the plane may also be expressed as the direction in which the plane faces.
  • the semantic clustering module 204 performs the following steps 3042 and 3043 on any one of the one or more planes to obtain the plane semantic categories of the one or more planes: Step 3042, the semantic clustering module 204 determines, according to the plane equation of the plane and the first dense semantic map, one or more target plane categories corresponding to the plane and the confidence of the one or more target plane categories.
  • step 3042 in the embodiment of the present application can be implemented in the following manner: the semantic clustering module 204 determines M first three-dimensional points from the first dense semantic map according to the plane equation of the plane, where the distance between each of the M first three-dimensional points and the plane is less than a third threshold and M is a positive integer; the semantic clustering module 204 determines the one or more target plane categories corresponding to the M first three-dimensional points as the one or more target plane categories corresponding to the plane, where the orientation of the one or more target plane categories is consistent with the orientation of the plane; and the semantic clustering module 204 counts the ratio, among the M first three-dimensional points, of the number of three-dimensional points corresponding to each of the one or more target plane categories, so as to obtain the confidence of the one or more target plane categories.
  • the embodiment of the present application does not limit the specific value of the third threshold, and it can be set as needed in the actual process.
  • the M first three-dimensional points determined from the first dense semantic map can be regarded as three-dimensional points belonging to any plane. Since the plane category to which each of the M first three-dimensional points belongs can be determined, the plane categories to which different three-dimensional points belong may be the same or different. For example, the plane category of three-dimensional point A among the M first three-dimensional points is "Ground", and the plane category of the three-dimensional point B among the M first three-dimensional points is "table", so the M first three-dimensional points can be obtained according to the plane category to which each of the M first three-dimensional points belongs Corresponding one or more plane categories.
  • the plane category of each three-dimensional point in the M first three-dimensional points may be the target plane category of the two-dimensional pixel points corresponding to the three-dimensional point mentioned in the previous embodiment.
  • step 3022 can be used to obtain the target plane category of each pixel and use it as the plane category of the three-dimensional point corresponding to each pixel, so that one or more target plane categories corresponding to the M first three-dimensional points can be obtained.
  • for example, among the M first three-dimensional points, the plane category of N1 three-dimensional points is "ground", that is, the number of three-dimensional points whose plane category is "ground" is N1; the plane category of N2 three-dimensional points is "table", that is, the number of three-dimensional points whose plane category is "table" is N2; and the plane category of N3 three-dimensional points is "wall", that is, the number of three-dimensional points whose plane category is "wall" is N3, where N1+N2+N3 is less than or equal to M and N1, N2, and N3 are positive integers.
  • the proportion of the three-dimensional points of the plane category "ground" among the M first three-dimensional points is N1/M, the proportion of the plane category "table" is N2/M, and the proportion of the plane category "wall" is N3/M.
  • the confidences of the one or more plane categories of the plane are therefore N1/M, N2/M, and N3/M. If N2/M is greater than N1/M and N2/M is greater than N3/M, the semantic plane category of the plane is "table".
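A minimal sketch (not part of the original disclosure) of this confidence computation; the category encoding and the example counts N1, N2, N3 are illustrative.

```python
import numpy as np

def plane_category_confidences(point_categories, num_categories):
    """Confidence of each candidate plane category for one plane: the share of the
    M first 3D points (those close to the plane) that carry that category."""
    m = len(point_categories)
    counts = np.bincount(point_categories, minlength=num_categories)
    return counts / m

# Hypothetical counts: N1 = 30 "ground" points, N2 = 50 "table" points, N3 = 20 "wall" points, M = 100.
cats = np.array([0] * 30 + [1] * 50 + [2] * 20)
conf = plane_category_confidences(cats, 3)       # -> [0.3, 0.5, 0.2]
best_category = int(np.argmax(conf))             # category 1 ("table") has the highest confidence
```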
  • Step 3043 The semantic clustering module 204 selects the target plane category with the highest confidence among the one or more target plane categories as the semantic plane category of any one of the planes.
  • for example, the semantic clustering module 204 can determine that the semantic plane category of plane A is ground.
  • any plane may correspond to one or more target plane categories, but not all of the target plane categories corresponding to a plane necessarily have the same orientation as that plane; that is, a plane may correspond to target plane categories whose orientation is consistent with that of the plane as well as target plane categories whose orientation is inconsistent with it, and a target plane category whose orientation is inconsistent with the plane has a low probability of being the semantic plane category of the plane. Based on this, in order to simplify the subsequent calculation process and reduce calculation errors, in a possible implementation manner, the orientation of the one or more target plane categories corresponding to any plane in the embodiment of the present application is consistent with the orientation of that plane.
  • the one or more target plane categories are plane categories selected by the image data processing device from all target plane categories corresponding to any one plane and consistent with the orientation of the any one plane.
  • the one or more target plane categories may be all plane categories of all target plane categories corresponding to any one plane, or may be part of the plane categories, which is not limited in the embodiment of the present application. All target plane categories corresponding to any plane in the embodiment of the present application can be regarded as all target plane categories corresponding to the M first three-dimensional points.
  • for example, suppose plane a faces downwards, the plane category (ground) faces upwards, the plane category (table) faces downwards, and the plane category (ceiling) faces downwards. Therefore, when calculating the confidences of the one or more plane categories to which plane a belongs, the confidence that plane a belongs to the plane category (ground) can be eliminated. This not only reduces the calculation burden of the image data processing device, but also improves the calculation accuracy.
  • the semantic clustering module 204 counts the proportion, among the M first three-dimensional points, of the number of three-dimensional points corresponding to each target plane category in the one or more target plane categories, and thereby obtains the confidence of each of the one or more target plane categories.
  • the method provided in the embodiment of the present application further includes: the semantic clustering module 204 updates one or more corresponding planes according to at least one of the Bayes theorem or the voting mechanism. The confidence of each target plane category.
  • the semantic clustering module 204 performs plane fitting on the three-dimensional point cloud data to obtain a plane equation of the form Ax + By + Cz + D = 0.
  • A, B, C, and D are the plane equation parameters that need to be solved, and the optimal plane equation parameters are solved using multiple points.
  • the specific fitting scheme can refer to the prior art.
  • the outermost point of all the points involved in the calculation will be the boundary point of the plane.
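As a concrete illustration of solving the plane equation Ax + By + Cz + D = 0 from multiple points, the following minimal sketch (not part of the original disclosure) uses an SVD least-squares fit, one of the options mentioned above; the function name is an assumption.

```python
import numpy as np

def fit_plane_svd(points):
    """Least-squares fit of A*x + B*y + C*z + D = 0 to a set of 3D points.

    The unit normal (A, B, C) is the right singular vector associated with the
    smallest singular value of the centered point matrix; D follows from the centroid.
    """
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                       # (A, B, C)
    d = -float(normal @ centroid)         # D such that the centroid lies on the plane
    return normal, d
```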
  • the semantic clustering module 204 counts and filters the M first three-dimensional points whose distance from the plane is less than the third threshold from the first dense semantic map based on the plane equation, orientation and area of the detected plane.
  • the first three-dimensional point corresponds to one or more target plane categories.
  • the semantic clustering module 204 regularizes the numbers of three-dimensional points of the various plane categories in the one or more target plane categories as the confidences of those plane categories, that is, it counts the ratio of the number of three-dimensional points included in each target plane category to the number of all the three-dimensional points (the M first three-dimensional points). The confidences of the various plane categories recorded last time are then updated based on Bayes' theorem and the voting mechanism, and the plane category with the highest current confidence is selected as the plane semantic category, which can enhance the accuracy and stability of plane semantic recognition.
  • specifically, the semantic clustering module 204 uses Bayes' theorem and the voting mechanism to take into account the confidences, calculated before the current moment, that a plane belongs to the various plane categories, so as to revise and update the confidences, calculated at the current moment, that the plane belongs to the various plane categories.
  • regarding the voting mechanism, for example, suppose the maximum number of votes under the voting mechanism is MAX_VOTE_COUNT and the initial number of votes is 0. If the plane category of a three-dimensional point C in the current frame is consistent with the plane category of the three-dimensional point C in the previous frame, the number of votes corresponding to the three-dimensional point C is increased by 1, and the plane category probability prob of the three-dimensional point C is updated to slide between the average and the maximum of the two values; otherwise, the number of votes is reduced by 1 and the plane category probability prob is updated to 80% of its value.
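A minimal sketch (not part of the original disclosure) of this per-point voting update; MAX_VOTE_COUNT, the exact "slide between average and maximum" rule, and the 80% decay factor follow the description above, but the precise forms written here are assumptions.

```python
MAX_VOTE_COUNT = 10   # assumed cap on the number of votes

def vote_update(votes, prob, same_category_as_previous_frame, new_prob):
    """Update the vote count and plane-category probability of one 3D point.

    If the point keeps the same plane category as in the previous frame, it gains a
    vote and its probability is pulled towards a value between the average and the
    maximum of the old and new probabilities; otherwise it loses a vote and its
    probability decays to 80% of its value.
    """
    if same_category_as_previous_frame:
        votes = min(votes + 1, MAX_VOTE_COUNT)
        prob = 0.5 * ((prob + new_prob) / 2.0 + max(prob, new_prob))
    else:
        votes = max(votes - 1, 0)
        prob = 0.8 * prob
    return votes, prob
```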
  • step 304 can be specifically implemented as described in FIG. 9: step 901, the semantic clustering module 204 executes a plane detection step to obtain one or more planes included in the image data to be processed. Since the semantic clustering module 204 calculates the plane semantic category of each of the one or more planes in the same manner and on the same principle, the following steps take the process of calculating the plane semantic category of a first plane as an example; the term "first" is not limiting.
  • Step 902 The semantic clustering module 204 obtains the plane equation of the first plane.
  • Step 903 The semantic clustering module 204 calculates the area of the first plane.
  • Step 904 The semantic clustering module 204 calculates the orientation of the first plane.
  • Step 905 The semantic clustering module 204 counts, from the three-dimensional points of the various plane categories in the first dense semantic map, the M three-dimensional points whose distance from the first plane is less than the third threshold.
  • Step 906 The semantic clustering module 204 determines whether the orientation of each target plane category in the one or more target plane categories corresponding to the M three-dimensional points is consistent or the same as the orientation of the first plane.
  • Step 907 If the orientations of the various target plane categories are consistent with the orientation of the first plane, the semantic clustering module 204 determines, according to the area of the first plane, whether the number of three-dimensional points included in each target plane category per unit plane area meets a threshold.
  • Step 908 If it is determined according to the plane area that the number of three-dimensional points included in each target plane category per unit plane area meets the threshold, the semantic clustering module 204 performs regularization processing on the number of three-dimensional points included in each target plane category, that is, it calculates the proportion of the three-dimensional points included in each target plane category among the M first three-dimensional points, to obtain the confidence that the first plane belongs to the one or more target plane categories.
  • Step 909 The semantic clustering module 204 performs a Bayesian probability update between the previously recorded confidences and the currently calculated confidences that the first plane belongs to the various target plane categories.
  • Step 910 The semantic clustering module 204 uses the target plane category with the highest current confidence of the first plane as the plane category of the first plane.
  • if the orientations of the various target plane categories are not consistent with the orientation of the first plane, the semantic clustering module 204 determines that the process stops. In addition, if the semantic clustering module 204 determines according to the area of the first plane that the number of three-dimensional points included in the various target plane categories per unit plane area does not meet the threshold, the image data processing apparatus also determines that the process stops.
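A minimal sketch (not part of the original disclosure) of one way the Bayesian update in step 909 could combine the previously recorded confidences with the currently calculated ones; treating the two as independent evidence and renormalizing is an assumption, not the patent's specific formula.

```python
import numpy as np

def bayes_update(prev_conf, curr_conf, eps=1e-9):
    """Fuse the last recorded per-category confidences with the currently
    calculated ones: multiply them as independent evidence and renormalize."""
    fused = np.asarray(prev_conf) * np.asarray(curr_conf) + eps
    return fused / fused.sum()
```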
  • the semantic segmentation module 202 performs an optimization operation on the semantic segmentation result according to the image data to be processed and the depth information of the image data to be processed, including the random sample consensus (RANSAC)-based ground estimation described in FIG. 10.
  • the floor, as an important part of the scene, has the following characteristics: the floor is a plane with a large area; the floor is an important reference for SLAM initialization; the ground is easier to detect and recognize than other semantic targets; objects in the scene are mostly located on the ground; and the heights of objects in the scene are mostly measured relative to the ground. Therefore, it is very necessary to segment the ground first and obtain its plane equation.
  • the RANSAC algorithm is also known as the random sampling consensus estimation method. It is a robust estimation method, which is more suitable for the estimation of the plane with a large area such as the ground.
  • the semantic segmentation result of the deep neural network is relied on here.
  • the pixels with the ground (FLOOR) semantic label are extracted, and the point cloud data formed from their depth information is obtained, so as to realize the RANSAC-based estimation of the ground equation. The specific steps are shown in Figure 10:
  • the ground equation can also be estimated by using AI.
  • Step 1011 The semantic segmentation module 202 obtains the P three-dimensional points included in the ground on the basis of the semantic segmentation processing.
  • if the remaining number of iterations M of the RANSAC algorithm satisfies M > 0, the image data processing device randomly selects l (for example, l is 3) three-dimensional points from the P three-dimensional points as sampling points and estimates a plane equation from them (for example, by singular value decomposition, SVD); otherwise, it skips to step 1016 for execution.
  • Step 1013 The semantic segmentation module 202 substitutes the three-dimensional coordinates q of each of the P three-dimensional points into the estimated plane equation, and obtains the scalar distance d from each three-dimensional point to the plane. If d is less than a preset threshold, the three-dimensional point is considered an interior point, and the number k of interior points is counted.
  • Step 1014 The semantic segmentation module 202 compares the number k of interior points in this iteration with the optimal number of interior points K. If k is less than or equal to K, the semantic segmentation module 202 reduces the remaining number of iterations M of the RANSAC algorithm by 1 and jumps back to step 1011; otherwise, it continues to execute.
  • Step 1016 The semantic segmentation module 202 uses K optimal interior points to re-estimate the plane equation, that is, establishes an overdetermined equation composed of K equations, and uses SVD to find the global optimal plane equation.
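A minimal sketch (not part of the original disclosure) of this RANSAC-based ground-equation estimation over the FLOOR-labelled 3D points; the iteration count, inlier threshold, sample size, and function names are illustrative assumptions, not values from the patent.

```python
import numpy as np

def fit_plane(points):
    """SVD plane fit: returns a unit normal n and offset d with n.p + d = 0."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    return normal, -float(normal @ centroid)

def ransac_ground_plane(floor_points, iterations=100, inlier_threshold=0.02, sample_size=3):
    """RANSAC estimate of the ground plane from the FLOOR-labelled 3D points,
    following the sample / distance / interior-point-count / refit structure above."""
    best_inliers = np.zeros(len(floor_points), dtype=bool)
    best_normal, best_d = fit_plane(floor_points)             # fallback hypothesis
    for _ in range(iterations):
        idx = np.random.choice(len(floor_points), sample_size, replace=False)
        normal, d = fit_plane(floor_points[idx])               # plane from the sampling points
        dist = np.abs(floor_points @ normal + d)               # scalar distance to the plane
        inliers = dist < inlier_threshold
        if inliers.sum() > best_inliers.sum():                 # keep the hypothesis with the most interior points
            best_inliers, best_normal, best_d = inliers, normal, d
    if best_inliers.sum() >= sample_size:                      # re-estimate from all optimal interior points
        best_normal, best_d = fit_plane(floor_points[best_inliers])
    return best_normal, best_d, best_inliers
```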
  • the semantic seed is combined with depth information to grow the segmentation area and modify the segmentation result.
  • the number of pixels in each semantic segmentation category is used as an indicator of the priority of region growing, so that categories with larger numbers of pixels are grown first; however, the ground has the highest priority, that is, the ground region is grown first, and region growing is then performed for the other categories.
  • the region growing algorithm relies on the degree of similarity between seed points and their neighboring points to merge adjacent points with higher similarity, and it continues to grow outward until no neighboring points satisfying the similarity condition remain to be merged.
  • here, a typical 8-neighborhood is selected for region growing, and the similarity condition is expressed using both depth and color information, so that under-segmented regions can be better corrected.
  • the so-called seed point is the initial point of region growth
  • region growth uses a method similar to Breadth-First-Search (BFS) to spread out and grow. The specific steps are shown in Figure 11:
  • Step 1101 The semantic segmentation module 202 traverses the priority list of semantic segmentation categories, and pushes the plane category with high priority onto the stack (push into the seed point stack) for region growth.
  • the seed point stack of the currently pushed category contains K seed points, and the coordinates of the two-dimensional pixel corresponding to each seed point are denoted (i, j).
  • the so-called priority list is established from the statistics of the segmentation result, ordered from largest to smallest according to the number of pixels of each plane category.
  • Step 1102 If the seed point stack is not empty, the semantic segmentation module 202 pops the last seed point s_K(i, j) from the stack and deletes it from the stack, and determines whether the category of its neighboring point p(i+m, j+n) is OTHER; if so, execution continues, otherwise it skips to step 1101.
  • Step 1103 The semantic segmentation module 202 computes the similarity distance d between the seed point s_K and the neighboring point p. If the similarity distance d is less than a given threshold, execution continues; otherwise it jumps to step 1101. The similarity distance d is expressed in terms of the depth and color differences between the two points.
  • Step 1104 The semantic segmentation module 202 pushes the neighboring point p satisfying the similarity condition into the seed point stack.
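A minimal sketch (not part of the original disclosure) of this seed-stack region growing over an 8-neighborhood; the OTHER label value, the form of the similarity distance (a sum of depth and color differences), and the function signature are assumptions.

```python
import numpy as np

OTHER = -1   # assumed label for not-yet-segmented pixels

def grow_region(labels, depth, color, seeds, category, dist_threshold):
    """Grow one semantic category from its seed pixels into neighboring OTHER
    pixels whose depth/color similarity distance is below the threshold.

    labels: (H, W) int array of per-pixel categories, modified in place.
    depth:  (H, W) float array; color: (H, W, 3) float array.
    seeds:  list of (i, j) seed pixels of this category.
    """
    h, w = labels.shape
    stack = list(seeds)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    while stack:
        i, j = stack.pop()                       # pop the last seed point from the stack
        for m, n in offsets:                     # visit the 8-neighborhood
            p, q = i + m, j + n
            if not (0 <= p < h and 0 <= q < w) or labels[p, q] != OTHER:
                continue
            # Similarity distance combining depth and color differences (assumed form).
            d = abs(depth[i, j] - depth[p, q]) + np.linalg.norm(color[i, j] - color[p, q])
            if d < dist_threshold:
                labels[p, q] = category          # merge the neighbor into the region
                stack.append((p, q))             # and use it as a new seed point
```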
  • the image data processing apparatus and the like include hardware structures and/or software modules corresponding to the respective functions.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software-driven hardware depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
  • the embodiment of the present application may divide the functional units according to the foregoing method example image data processing apparatus.
  • each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • FIG. 2 shows a possible structural schematic diagram of the image data processing device involved in the foregoing embodiment.
  • the image data processing device includes: a semantic segmentation module 202, The semantic map module 203 and the semantic clustering module 204.
  • the semantic segmentation module 202 is used to support the image data processing apparatus to execute steps 301 and 302 in the above-mentioned embodiment.
  • the semantic map module 203 is used to support the image data processing device to execute step 303 in the foregoing embodiment.
  • the semantic clustering module 204 is used to support the image data processing apparatus to perform step 304 in the above-mentioned embodiment.
  • the semantic segmentation module 202 is further configured to support the image data processing apparatus to execute step 305 in the foregoing embodiment.
  • the semantic segmentation module 202 is used to support the image data processing device to execute step 3011 in the foregoing embodiment.
  • the semantic segmentation module 202 is used to support the image data processing device to execute step 306, step 3021, and step 3022 in the above-mentioned embodiment.
  • the semantic clustering module 204 is used to support the image data processing apparatus to perform step 3041, step 3042, and step 3043 in the foregoing embodiment.
  • the semantic clustering module 204 is also used to support the image data processing device to execute steps 901 to 910 in the foregoing embodiment.
  • the device can be implemented in the form of software and stored in a storage medium.
  • FIG. 13 shows a schematic diagram of a possible hardware structure of the image data processing device involved in the above-mentioned embodiment.
  • the image data processing device includes: a first processor 1301 and a second processor 1302.
  • the image data processing apparatus may further include a communication interface 1303, a memory 1304, and a bus 1305.
  • the communication interface 1303 may include an input interface 13031 and an output interface 13032.
  • the first processor 1301 and the second processor 1302 may be the processor 120 shown in FIG. 1.
  • the first processor 1301 may be a DSP or a CPU.
  • the second processor 1302 may be an NPU.
  • the communication interface 1303 may be the input device 140 in FIG. 1.
  • the memory 1304 is used to store program codes and data of the image data processing device, and corresponds to the memory 130 in FIG. 1.
  • the bus 1305 may be built in the processor 120 shown in FIG. 1.
  • the first processor 1301 and the second processor 1302 are configured to perform part of the functions in the image data processing method described above.
  • the first processor 1301 is configured to support the image data processing apparatus to execute step 301 of the foregoing embodiment.
  • the second processor 1302 is used for the image data processing apparatus to execute step 302 of the foregoing embodiment.
  • the first processor 1301 is used for the image data processing apparatus to execute step 303 and step 304 of the foregoing embodiment.
  • the first processor 1301 is further configured to support the image data processing apparatus to execute step 305, step 3011, step 3041, step 3042, step 3043 in the foregoing embodiment.
  • the second processor 1302 is also configured to support the image data processing apparatus to execute step 306, step 3021, and step 3022 in the foregoing embodiment.
  • the first processor 1301 is further configured to support the image data processing apparatus to execute steps 901 to 910 in the foregoing embodiment.
  • the first processor 1301 or the second processor 1302 may be a single-processor structure, a multi-processor structure, a single-threaded processor, a multi-threaded processor, and so on. In some feasible embodiments, the first processor 1301 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • the second processor 1302 may be a neural network processor, which may implement or execute various exemplary logical blocks, modules, and circuits described in conjunction with the disclosure of this application.
  • the processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and so on.
  • Output interface 13032: this output interface is used to output the processing result of the above-mentioned image data processing method.
  • the processing result can be output directly by the processor, or it can be stored in the memory first and then output through the memory; in some feasible embodiments, there may be only one output interface, or there may be multiple output interfaces.
  • the processing result output by the output interface can be sent to the memory for storage, sent to another processing flow for further processing, sent to a display device for display, or sent to a player terminal for playback, and so on.
  • the memory 1304 can store the aforementioned image data to be processed and the related instructions for configuring the first processor or the second processor.
  • the memory may be a floppy disk; a hard disk such as a built-in hard disk or a removable hard disk; a magnetic disk; an optical disk; a magneto-optical disk such as a CD-ROM or DVD-ROM; a non-volatile storage device such as a RAM, ROM, PROM, EPROM, EEPROM, or flash memory; or any other form of storage medium known in the technical field.
  • Bus 1305: this bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in FIG. 13, but it does not mean that there is only one bus or one type of bus.
  • the embodiment of the present application also provides a computer-readable storage medium that stores instructions; when the instructions run on a device (for example, a single-chip microcomputer, a chip, or a computer), the device executes one or more of steps 301 to 3011 of the above-mentioned image data processing method. If each component module of the above-mentioned image data processing device is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in the computer-readable storage medium.
  • the embodiments of the present application also provide a computer program product containing instructions.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes a number of instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) or a processor therein to execute all or part of the steps of the methods in the various embodiments of the present application.
  • the computer program product includes one or more computer programs or instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, network equipment, user equipment, or other programmable devices.
  • the computer program or instruction may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer program or instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center that integrates one or more available media.
  • the usable medium may be a magnetic medium, such as a floppy disk, a hard disk, or a magnetic tape; an optical medium, such as a digital video disc (DVD); or a semiconductor medium, such as a solid state drive (SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for identifying plane semantic categories and an image data processing apparatus, which relate to the technical field of image processing and are used to accurately determine plane semantic categories. The method comprises: acquiring image data to be processed, the image data to be processed comprising N pixel points; determining a semantic segmentation result of the image data to be processed, the semantic segmentation result comprising target plane categories corresponding to at least some of the N pixel points; obtaining a first dense semantic map according to the semantic segmentation result, the first dense semantic map comprising at least one target plane category corresponding to at least one first three-dimensional point in a first three-dimensional point cloud, the first three-dimensional point(s) corresponding to at least one pixel point among the at least some pixel points; and performing plane semantic category identification according to the first dense semantic map to obtain plane semantic categories of one or more planes comprised in the image data to be processed. The method can improve the accuracy of plane semantic recognition.
PCT/CN2020/074040 2020-01-23 2020-01-23 Procédé d'identification de catégorie sémantique de plan et appareil de traitement de données d'image WO2021147113A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/074040 WO2021147113A1 (fr) 2020-01-23 2020-01-23 Procédé d'identification de catégorie sémantique de plan et appareil de traitement de données d'image
CN202080001308.1A CN113439275A (zh) 2020-01-23 2020-01-23 一种平面语义类别的识别方法以及图像数据处理装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/074040 WO2021147113A1 (fr) 2020-01-23 2020-01-23 Procédé d'identification de catégorie sémantique de plan et appareil de traitement de données d'image

Publications (1)

Publication Number Publication Date
WO2021147113A1 true WO2021147113A1 (fr) 2021-07-29

Family

ID=76992013

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/074040 WO2021147113A1 (fr) 2020-01-23 2020-01-23 Procédé d'identification de catégorie sémantique de plan et appareil de traitement de données d'image

Country Status (2)

Country Link
CN (1) CN113439275A (fr)
WO (1) WO2021147113A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4138390A1 (fr) * 2021-08-20 2023-02-22 Beijing Xiaomi Mobile Software Co., Ltd. Procédé de commande de caméra, processeur de signal d'images et dispositif avec contrôle temporal des paramètres d'acquisition d'images
WO2023051362A1 (fr) * 2021-09-30 2023-04-06 北京字跳网络技术有限公司 Procédé de traitement de zone d'image et dispositif
WO2023088177A1 (fr) * 2021-11-16 2023-05-25 华为技术有限公司 Procédé d'entrainement de modèle de réseau neuronal, et procédé et dispositif d'établissement de modèle tridimensionnel vectorisé

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114119998B (zh) * 2021-12-01 2023-04-18 成都理工大学 一种车载点云地面点提取方法及存储介质
CN115527028A (zh) * 2022-08-16 2022-12-27 北京百度网讯科技有限公司 地图数据处理方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145747A (zh) * 2018-07-20 2019-01-04 华中科技大学 一种水面全景图像语义分割方法
US20190287254A1 (en) * 2018-03-16 2019-09-19 Honda Motor Co., Ltd. Lidar noise removal using image pixel clusterings
CN110378349A (zh) * 2019-07-16 2019-10-25 北京航空航天大学青岛研究院 Android移动端室内场景三维重建及语义分割方法
CN110458805A (zh) * 2019-03-26 2019-11-15 华为技术有限公司 一种平面检测方法、计算设备以及电路系统
CN110633617A (zh) * 2018-06-25 2019-12-31 苹果公司 使用语义分割的平面检测

Also Published As

Publication number Publication date
CN113439275A (zh) 2021-09-24

Similar Documents

Publication Publication Date Title
WO2021147113A1 (fr) Procédé d'identification de catégorie sémantique de plan et appareil de traitement de données d'image
US10198823B1 (en) Segmentation of object image data from background image data
EP3786892B1 (fr) Procédé, dispositif et appareil de repositionnement dans un processus de suivi d'orientation de caméra, et support d'informations
US20200160087A1 (en) Image based object detection
US10719759B2 (en) System for building a map and subsequent localization
CN112189335B (zh) 用于低功率移动平台的cmos辅助内向外动态视觉传感器跟踪
US9406137B2 (en) Robust tracking using point and line features
CN109934065B (zh) 一种用于手势识别的方法和装置
US20170013195A1 (en) Wearable information system having at least one camera
US20200117936A1 (en) Combinatorial shape regression for face alignment in images
WO2018049801A1 (fr) Procédé de détection de doigt heuristique sur la base d'une carte de profondeur
CN112889068A (zh) 用于图像处理的神经网络对象识别的方法和系统
US20240104744A1 (en) Real-time multi-view detection of objects in multi-camera environments
US10827125B2 (en) Electronic device for playing video based on movement information and operating method thereof
US20190096073A1 (en) Histogram and entropy-based texture detection
CN109493349B (zh) 一种图像特征处理模块、增强现实设备和角点检测方法
US11688094B1 (en) Method and system for map target tracking
US20240177329A1 (en) Scaling for depth estimation
US20230377182A1 (en) Augmented reality device for obtaining depth information and method of operating the same
US20230162375A1 (en) Method and system for improving target detection performance through dynamic learning
CN116576866B (zh) 导航方法和设备
US20240153245A1 (en) Hybrid system for feature detection and descriptor generation
WO2023102873A1 (fr) Techniques améliorées pour un suivi de pose tridimensionnelle en temps réel à plusieurs personnes à l'aide d'une caméra unique
WO2021179905A1 (fr) Descripteur de caractéristique d'image robuste à flou animé
WO2024112458A1 (fr) Mise à l'échelle pour une estimation de profondeur

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20915567

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20915567

Country of ref document: EP

Kind code of ref document: A1