CN117078519A - Noise reduction and obstacle detection method and device for depth image and mobile robot - Google Patents

Noise reduction and obstacle detection method and device for depth image and mobile robot

Info

Publication number
CN117078519A
Authority
CN
China
Prior art keywords
depth image
image
obstacle detection
neural network
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210501030.5A
Other languages
Chinese (zh)
Inventor
吴伟 (Wu Wei)
孙志雄 (Sun Zhixiong)
陈超 (Chen Chao)
成波 (Cheng Bo)
李升波 (Li Shengbo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jizhijia Technology Co Ltd
Original Assignee
Beijing Jizhijia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jizhijia Technology Co Ltd filed Critical Beijing Jizhijia Technology Co Ltd
Priority to CN202210501030.5A priority Critical patent/CN117078519A/en
Publication of CN117078519A publication Critical patent/CN117078519A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Abstract

A noise reduction method and device for a depth image, and a mobile robot, are provided. The noise reduction method comprises: acquiring multi-modal images of the same scene, the multi-modal images comprising a depth image and at least one other modality image; respectively inputting the depth image and the at least one other modality image into a trained neural network, which predicts noise regions in the depth image based on a fusion of features from the different modality images; and removing the noise regions from the depth image based on the prediction result of the neural network to obtain and output an optimized depth image. By using the trained neural network to fuse the depth image with other image sources and identify the noise regions on the depth image, the method and device reduce the noise of the depth image and thereby optimize it.

Description

Noise reduction and obstacle detection method and device for depth image and mobile robot
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and apparatus for noise reduction and obstacle detection of a depth image, and a mobile robot.
Background
Currently, depth images acquired by a depth camera, especially indoor depth images, contain highlight areas under extreme illumination conditions, such as areas of direct light, ground reflections, or highly reflective materials, and these highlight areas are often accompanied by erroneous depth values, so they are called noise regions. Such a depth image cannot accurately reflect the depth information of the captured scene, and performing further operations on it, such as obstacle detection, easily leads to erroneous detection results and erroneous obstacle avoidance.
Accordingly, a noise reduction scheme for depth images is required to solve the above problems.
Disclosure of Invention
According to an aspect of the present application, there is provided a noise reduction method for a depth image, the method including: acquiring a multi-modal image acquired for the same scene, the multi-modal image comprising a depth image and at least one other modality image; respectively inputting the depth image and the at least one other modality image into a trained neural network, and predicting noise regions in the depth image by the neural network based on a fusion of features of the different modality images; and removing the noise regions from the depth image based on the prediction result of the neural network to obtain and output an optimized depth image.
In one embodiment of the application, the neural network utilizes a particular candidate anchor box in predicting the noise region, wherein the particular candidate anchor box is determined based on the source of the noise region in the depth image.
In one embodiment of the present application, when the source of the noise region in the depth image includes an artificial light source, the specific candidate anchor box includes at least one of a circular candidate anchor box, an elliptical candidate anchor box, and a rectangular candidate anchor box.
In one embodiment of the present application, predicting, by the neural network, a noise region in the depth image based on a fusion of features of different modality images includes: extracting features of the depth image and of the at least one other modality image by a backbone network module of the neural network; fusing the features extracted by the backbone network module by a fusion network module of the neural network; and predicting the noise region in the depth image by a head network module of the neural network based on the features fused by the fusion network module.
In one embodiment of the application, the backbone network module comprises a first backbone network module and a second backbone network module, wherein: the first backbone network module is used for encoding the depth image; the second backbone network module is used for encoding the other modality images; and the first backbone network module and the second backbone network module share weights.
In one embodiment of the application, the other modality image comprises at least one of an infrared image, a color image, and a gray scale image.
In one embodiment of the application, the scene comprises an indoor scene.
According to another aspect of the present application there is provided a noise reduction device for a depth image, the device comprising a memory and a processor, the memory having stored thereon a computer program for execution by the processor, which when executed by the processor causes the processor to perform a noise reduction method for a depth image as described above.
According to still another aspect of the present application, there is provided a method of detecting an obstacle, the method comprising: acquiring a depth image of a scene to be subjected to obstacle detection, wherein the depth image is an optimized depth image obtained according to the noise reduction method of the depth image; and performing obstacle detection on the scene based on the optimized depth image to obtain an obstacle detection result of the scene.
According to a further aspect of the present application there is provided an obstacle detection device comprising a memory and a processor, the memory having stored thereon a computer program for execution by the processor, which when executed by the processor causes the processor to perform the obstacle detection method as described above.
According to still another aspect of the present application, there is provided a mobile robot including an image acquisition device and an obstacle detection device, wherein: the image acquisition device is used for acquiring images of the area to be traversed by the mobile robot, the images including a depth image and at least one other modality image; and the obstacle detection device is used for performing obstacle detection based on the images acquired by the image acquisition device, so that obstacles can be avoided while the mobile robot moves, wherein the obstacle detection device is the obstacle detection device described above.
According to still another aspect of the present application, there is provided a storage medium having stored thereon a computer program to be executed by a processor, which when executed by the processor, causes the processor to perform the noise reduction method of a depth image or to perform the obstacle detection method as described above.
According to the noise reduction method and device for a depth image of the embodiments of the present application, noise regions on the depth image are identified by a trained neural network that fuses the depth image with other image sources, so the noise of the depth image can be reduced and the depth image optimized. According to the obstacle detection method and device of the embodiments of the present application, noise regions on the depth image are identified in the same way, removed, and the resulting optimized depth image is used for obstacle detection, which can effectively reduce or even avoid false obstacle detections and thereby improve the accuracy of obstacle detection. According to the mobile robot of the embodiments of the present application, obstacle detection is performed by the obstacle detection device, which likewise reduces or avoids false obstacle detections, improves the accuracy of obstacle detection, and thus improves the accuracy of obstacle avoidance.
Drawings
The above and other objects, features and advantages of the present application will become more apparent from the following detailed description of embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application, are incorporated in and constitute a part of this specification, serve to explain the application together with the embodiments, and do not constitute a limitation of the application. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 shows a schematic flow chart of a noise reduction method of a depth image according to an embodiment of the present application.
Fig. 2 shows a schematic block diagram of a noise reduction device of a depth image according to an embodiment of the present application.
Fig. 3 shows a schematic flow chart of an obstacle detection method according to an embodiment of the application.
Fig. 4 shows a schematic block diagram of the obstacle detecting apparatus according to the embodiment of the application.
Fig. 5 shows a schematic block diagram of a mobile robot according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, exemplary embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some, and not all, embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein. Based on the embodiments described in the present application, all other embodiments obtained by a person skilled in the art without inventive effort shall fall within the scope of the application.
First, a noise reduction method 100 of a depth image according to an embodiment of the present application is described with reference to fig. 1. As shown in fig. 1, the noise reduction method 100 of a depth image may include the steps of:
in step S110, a multi-modality image acquired for the same scene is acquired, the multi-modality image comprising a depth image and at least one other modality image.
In step S120, the depth image and at least one other modality image are respectively input to a trained neural network, and noise regions in the depth image are predicted by the neural network based on fusion of features of different modality images.
In step S130, noise regions in the depth image are removed based on the prediction result of the neural network, and an optimized depth image is obtained and output.
In the embodiments of the present application, the depth image and other image sources are fused to identify erroneous information (noise regions) on the depth image. Erroneous information (i.e., erroneous depth values) on a depth image, particularly of an indoor scene, often results from highly reflective or highlighted objects, including but not limited to lit light fixtures, floor tile reflections, and highly reflective columns. These erroneous depth values tend to differ significantly from the surrounding depth values (extremely far, creating holes in the depth image, or extremely near, creating peaks in it), and the corresponding regions have extremely strong intensity values in images of other modalities, such as infrared, color, or grayscale images. This information can therefore be modeled well by a deep neural network, so a multi-modal deep neural network is used to detect these regions.
Based on this, in the embodiments of the present application, the neural network can be trained on annotated images of different modalities of the same scene used as sample images. Specifically, for example, a depth image and other modality images (such as at least one of an infrared image, a color image, or a grayscale image) of scene A are collected; the noise regions in the depth image are annotated, and the regions of extremely strong intensity corresponding to those noise regions are annotated in the other modality images. Similarly, depth images and other modality images of further scenes, such as scenes B, C, D and so on, are collected and annotated. Finally, the annotated images are input as sample images into the neural network for training: the network fuses the features of the depth image and the other modality images of the same scene and predicts the noise regions in the depth image, and its parameters are continuously optimized according to the difference between the network's output (the prediction) and the annotation (the ground truth). When this difference satisfies a preset convergence condition, the trained neural network is obtained.
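For concreteness, the following is a minimal training sketch in PyTorch, assuming the annotated noise regions are rasterized into per-pixel masks. `NoiseRegionNet` and `dataset` are illustrative stand-ins, not names from this application, and the actual network described below predicts regions via candidate anchor boxes rather than a mask head.

```python
import torch
import torch.nn as nn

class NoiseRegionNet(nn.Module):
    """Stand-in fusion network: concatenates a depth map with one other
    modality and predicts a per-pixel noise-region logit map."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Conv2d(2, 16, 3, padding=1)  # 2 channels: depth + other modality
        self.head = nn.Conv2d(16, 1, 1)               # noise-region logits

    def forward(self, depth, other):
        x = torch.cat([depth, other], dim=1)
        return self.head(torch.relu(self.encode(x)))

# Dummy stand-in for the annotated multi-modal sample images.
dataset = [(torch.rand(1, 1, 64, 64),                  # depth image
            torch.rand(1, 1, 64, 64),                  # other modality image
            (torch.rand(1, 1, 64, 64) > 0.9).float())  # annotated noise mask
           ]

model = NoiseRegionNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

for depth, other, noise_mask in dataset:
    logits = model(depth, other)
    loss = criterion(logits, noise_mask)  # difference between prediction and annotation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # parameters optimized until a convergence condition is met
```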
With the trained neural network, in practical application, multi-modal images (i.e., images of multiple modalities) of the same scene are acquired; besides the depth image, the acquired images include at least one other modality image such as an infrared, color, or grayscale image (whichever modality or modalities were used in training: the modalities of the images acquired in practice must match those of the training samples). The images of the different modalities are input into the trained neural network, which extracts features from each modality, fuses the features, and predicts the noise regions in the depth image from the fused features; these noise regions are then removed to obtain the noise reduction result, i.e., the optimized depth image. In this way, the noise reduction method identifies noise regions on the depth image through a trained neural network that fuses the depth image with other image sources, reducing the noise of the depth image and thereby optimizing it.
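As a minimal sketch of the removal step, assuming the predicted noise regions arrive as axis-aligned pixel boxes and that invalid depth is marked with 0 (a common depth-map convention; the application does not fix either assumption):

```python
import numpy as np

def remove_noise_regions(depth: np.ndarray, boxes: list) -> np.ndarray:
    """Invalidate depth values inside each predicted noise box (x0, y0, x1, y1)."""
    optimized = depth.copy()
    for x0, y0, x1, y1 in boxes:
        optimized[y0:y1, x0:x1] = 0  # mark erroneous depth values as missing
    return optimized

# usage with a dummy depth map and one predicted noise region
depth = np.random.rand(480, 640).astype(np.float32)
optimized = remove_noise_regions(depth, [(100, 50, 180, 120)])
```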
In the embodiments of the present application, the neural network employed in step S120 may include a backbone network module (backbone), a fusion network module, and a head network module (head). On this basis, predicting noise regions in the depth image by the neural network based on a fusion of features of different modality images in step S120 may include: extracting features of the depth image and of the at least one other modality image by the backbone network module; fusing the extracted features by the fusion network module; and predicting the noise regions in the depth image by the head network module based on the fused features.
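A structural sketch of this backbone/fusion/head decomposition, in PyTorch, is shown below; the module names, layer sizes, and head output are illustrative assumptions, as the application does not fix a concrete architecture. Weight sharing between the two backbone passes (described next) is obtained here by applying a single backbone instance to both inputs.

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Stand-in encoder applied to a single-channel image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())

    def forward(self, x):
        return self.net(x)

class NoiseDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = TinyBackbone()    # one instance used for both inputs,
                                          # so the two passes share weights
        self.fuse = nn.Conv2d(64, 32, 1)  # fusion: concatenation + 1x1 conv
        self.head = nn.Conv2d(32, 5, 1)   # e.g. 4 box parameters + 1 score per location

    def forward(self, depth, other):
        f_depth = self.backbone(depth)    # encode the depth image
        f_other = self.backbone(other)    # encode the other modality image
        fused = torch.relu(self.fuse(torch.cat([f_depth, f_other], dim=1)))
        return self.head(fused)           # head predicts noise regions

# usage
out = NoiseDetector()(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64))  # (1, 5, 64, 64)
```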
In one example, the backbone network module of the neural network may further include a first backbone network module and a second backbone network module, wherein the first backbone network module encodes the depth image to obtain the features extracted from the depth image, the second backbone network module encodes the other modality images to obtain the features extracted from them, and the two modules share weights, i.e., their weights are kept consistent. Encoding the depth image and the other modality images with separate, weight-sharing backbone modules can improve encoding efficiency. In other examples, the same backbone network module may be used to encode the depth image and the other modality images in turn to derive their respective features.
In the embodiments of the present application, after finishing the encoding, the backbone network module feeds the two encodings to the fusion network module for fusion. When training the neural network, the fusion network module can require that the two information sources contain sufficiently different information (low similarity), which ensures that they provide complementary information; and because the cosine function used to measure this similarity is differentiable, the neural network can be trained end-to-end.
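One way to express this constraint is a cosine-similarity penalty between the two encodings, a hedged sketch of which follows (the loss weight and flattening scheme are assumptions):

```python
import torch
import torch.nn.functional as F

def complementarity_loss(f_depth: torch.Tensor, f_other: torch.Tensor) -> torch.Tensor:
    """Penalize high cosine similarity between the two modality encodings,
    pushing the two information sources to carry complementary content.
    Cosine similarity is differentiable, so this term trains end-to-end."""
    v1 = f_depth.flatten(1)  # (batch, features)
    v2 = f_other.flatten(1)
    return F.cosine_similarity(v1, v2, dim=1).mean()

# usage: total_loss = detection_loss + 0.1 * complementarity_loss(f_depth, f_other)
```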
In the embodiments of the present application, the neural network (specifically, its head network module) uses specific candidate anchor boxes when predicting the noise regions of the depth image, where the specific candidate anchor boxes are determined based on the source of the noise regions in the depth image. Rather than simply adopting conventional rectangular boxes as detection candidates, determining specific candidate anchor boxes according to the characteristics of the noise source is more conducive to accurately detecting the noise regions in the depth image.
In one example, the source of the noise regions in the depth image includes an artificial light source (i.e., the scene of the depth image may be an indoor scene). Since the highly reflective regions (noise regions) produced by artificial light sources are mostly close to circular, elliptical, or rectangular in shape, this prior knowledge can be exploited by specifying the candidate anchor boxes of the neural network as at least one of circular, elliptical, and rectangular candidate anchor boxes, which significantly improves the detection accuracy and precision for noise regions in the depth image.
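As an illustration only, the shape priors could be parameterized as below; the application specifies the anchor shapes but not their parameterization, so every field here is an assumption.

```python
from dataclasses import dataclass

@dataclass
class CircleAnchor:
    cx: float
    cy: float
    r: float      # radius

@dataclass
class EllipseAnchor:
    cx: float
    cy: float
    a: float      # semi-major axis
    b: float      # semi-minor axis
    theta: float  # rotation

@dataclass
class RectAnchor:
    cx: float
    cy: float
    w: float
    h: float

def anchors_for_cell(cx: float, cy: float) -> list:
    """Hypothetical per-cell anchor set biased toward artificial-light shapes."""
    return [CircleAnchor(cx, cy, 8.0),
            EllipseAnchor(cx, cy, 12.0, 6.0, 0.0),
            RectAnchor(cx, cy, 16.0, 16.0)]
```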
The noise reduction method 100 for a depth image according to the embodiments of the present application has been described above by way of example. Based on the above description, the method identifies noise regions on the depth image through a trained neural network that fuses the depth image with other image sources, and can thereby reduce the noise of the depth image and optimize the depth image.
The noise reduction device 200 for a depth image according to another aspect of the present application is described below with reference to fig. 2, which shows a schematic block diagram of the device. As shown in fig. 2, the noise reduction device 200 may include a memory 210 and a processor 220, where the memory 210 stores a computer program to be run by the processor 220, and the computer program, when run by the processor 220, causes the processor 220 to perform the noise reduction method 100 for a depth image according to the embodiments of the present application described above. Those skilled in the art can understand the specific operation of the noise reduction device 200 from the foregoing description; for brevity, the details are not repeated here, and only some of the main operations of the processor 220 are described.
In one embodiment of the application, the computer program, when executed by the processor 220, causes the processor 220 to perform the steps of: acquiring a multi-modal image acquired for the same scene, wherein the multi-modal image comprises a depth image and at least one other modal image; respectively inputting the depth image and at least one other mode image into a trained neural network, and predicting noise areas in the depth image by the neural network based on fusion of features of different mode images; and removing a noise region in the depth image based on a prediction result of the neural network to obtain and output an optimized depth image.
In one embodiment of the application, the neural network utilizes a particular candidate anchor box in predicting the noise region, wherein the particular candidate anchor box is determined based on the source of the noise region in the depth image.
In one embodiment of the present application, when the source of the noise region in the depth image includes an artificial light source, the specific candidate anchor box includes at least one of a circular candidate anchor box, an elliptical candidate anchor box, and a rectangular candidate anchor box.
In one embodiment of the present application, the computer program, when run by the processor 220, causes the processor 220 to perform the prediction of noise regions in the depth image by the neural network based on a fusion of features of different modality images, including: extracting features of the depth image and of the at least one other modality image by a backbone network module of the neural network; fusing the extracted features by a fusion network module of the neural network; and predicting the noise regions in the depth image by a head network module of the neural network based on the fused features.
In one embodiment of the application, the backbone network module comprises a first backbone network module and a second backbone network module, wherein: the first backbone network module is used for encoding the depth image; the second backbone network module is used for encoding the other modality images; and the first backbone network module and the second backbone network module share weights.
In one embodiment of the application, the other modality image includes at least one of an infrared image, a color image, and a gray scale image.
In one embodiment of the application, the scene comprises an indoor scene.
Based on the above description, the noise reduction device 200 for a depth image according to the embodiments of the present application identifies noise regions on the depth image through a trained neural network that fuses the depth image with other image sources, and can thereby reduce the noise of the depth image and optimize it.
An obstacle detection method provided according to another aspect of the present application is described below with reference to fig. 3. Fig. 3 shows a schematic flow chart of an obstacle detection method 300 according to an embodiment of the application. As shown in fig. 3, the obstacle detection method 300 may include the steps of:
in step S310, a multi-modality image acquired for the same scene is acquired, the multi-modality image comprising a depth image and at least one other modality image.
In step S320, the depth image and the at least one other modality image are respectively input to a trained neural network, and noise regions in the depth image are predicted by the neural network based on a fusion of features of the different modality images.
In step S330, the noise regions in the depth image are removed based on the prediction result of the neural network to obtain an optimized depth image.
in step S340, the scene is subjected to obstacle detection based on the optimized depth image, and an obstacle detection result of the scene is obtained.
In the embodiments of the present application, the depth image and other image sources are fused to identify erroneous information (noise regions) on the depth image, the noise regions are removed to obtain an optimized depth image (as in the noise reduction method 100 described above), and obstacle detection is then performed on the optimized depth image. Erroneous information (i.e., erroneous depth values) on a depth image, particularly of an indoor scene, often results from highly reflective or highlighted objects, including but not limited to lit light fixtures, floor tile reflections, and highly reflective columns; these erroneous depth values tend to differ significantly from the surrounding depth values (extremely far, creating holes in the depth image, or extremely near, creating peaks in it), and the corresponding regions have extremely strong intensity values in images of other modalities such as infrared, color, or grayscale images. This information can therefore be modeled well by a deep neural network, and a multi-modal deep neural network is used to detect these regions. After the detected noise regions are removed, an optimized depth image is obtained; performing obstacle detection on this depth image can effectively reduce or even avoid false obstacle detections and thereby improve the accuracy of obstacle detection.
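The overall flow of method 300 can be sketched as a composition of a noise-region predictor, the removal step, and any depth-based obstacle detector; all three callables below are hypothetical stand-ins injected by the caller, not components named by this application.

```python
from typing import Callable, List, Sequence, Tuple
import numpy as np

Box = Tuple[int, int, int, int]

def detect_with_denoising(
    depth: np.ndarray,
    other: np.ndarray,
    predict_noise_boxes: Callable[[np.ndarray, np.ndarray], Sequence[Box]],
    remove_noise_regions: Callable[[np.ndarray, Sequence[Box]], np.ndarray],
    detect_obstacles: Callable[[np.ndarray], List],
) -> List:
    boxes = predict_noise_boxes(depth, other)       # steps S310-S320
    optimized = remove_noise_regions(depth, boxes)  # step S330
    return detect_obstacles(optimized)              # step S340
```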
In the embodiments of the present application, the neural network employed in step S320 may include a backbone network module (backbone), a fusion network module, and a head network module (head). On this basis, predicting noise regions in the depth image by the neural network based on a fusion of features of different modality images in step S320 may include: extracting features of the depth image and of the at least one other modality image by the backbone network module; fusing the extracted features by the fusion network module; and predicting the noise regions in the depth image by the head network module based on the fused features.
In one example, the backbone network module of the neural network may further include a first backbone network module and a second backbone network module, wherein the first backbone network module encodes the depth image to obtain the features extracted from the depth image, the second backbone network module encodes the other modality images to obtain the features extracted from them, and the two modules share weights, i.e., their weights are kept consistent. Encoding the depth image and the other modality images with separate, weight-sharing backbone modules can improve encoding efficiency. In other examples, the same backbone network module may be used to encode the depth image and the other modality images in turn to derive their respective features.
In the embodiments of the present application, after finishing the encoding, the backbone network module feeds the two encodings to the fusion network module for fusion. When training the neural network, the fusion network module can require that the two information sources contain sufficiently different information (low similarity), which ensures that they provide complementary information; and because the cosine function used to measure this similarity is differentiable, the neural network can be trained end-to-end.
In the embodiments of the present application, the neural network (specifically, its head network module) uses specific candidate anchor boxes when predicting the noise regions of the depth image, where the specific candidate anchor boxes are determined based on the source of the noise regions in the depth image. Rather than simply adopting conventional rectangular boxes as detection candidates, determining specific candidate anchor boxes according to the characteristics of the noise source is more conducive to accurately detecting the noise regions in the depth image.
In one example, the source of the noise regions in the depth image includes an artificial light source (i.e., the scene of the depth image may be an indoor scene). Since the highly reflective regions (noise regions) produced by artificial light sources are mostly close to circular, elliptical, or rectangular in shape, this prior knowledge can be exploited by specifying the candidate anchor boxes of the neural network as at least one of circular, elliptical, and rectangular candidate anchor boxes, which significantly improves the detection accuracy and precision for noise regions in the depth image.
In one example, based on the optimized depth image and the intrinsic and extrinsic parameters of the camera that captured the depth image, the pose and size of the obstacles in the scene in three-dimensional space can be obtained, yielding the obstacle detection result.
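A hedged sketch of that back-projection with the pinhole camera model follows; fx, fy, cx, cy are the camera intrinsics, the extrinsics are modeled as a single 4x4 homogeneous transform, and all numeric values are illustrative assumptions.

```python
import numpy as np

def pixel_to_camera(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z (meters) into camera coordinates."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def camera_to_world(p_cam, T_world_cam):
    """Apply the camera extrinsics (4x4 homogeneous transform) to a 3D point."""
    return (T_world_cam @ np.append(p_cam, 1.0))[:3]

# usage: obstacle pixel at (320, 240) with 1.5 m depth, identity extrinsics
p_world = camera_to_world(pixel_to_camera(320, 240, 1.5, 600.0, 600.0, 320.0, 240.0), np.eye(4))
```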
The obstacle detection method 300 according to the embodiments of the present application has been described above by way of example. Based on the above description, the method identifies noise regions on the depth image through a trained neural network that fuses the depth image with other image sources, removes those regions, and uses the resulting optimized depth image for obstacle detection, which can effectively reduce or even avoid false obstacle detections and thereby improve the accuracy of obstacle detection.
An obstacle detecting apparatus 400 according to another aspect of the present application is described below with reference to fig. 4. Fig. 4 shows a schematic block diagram of the obstacle detecting apparatus 400 according to the embodiment of the present application. As shown in fig. 4, the obstacle detecting apparatus 400 according to an embodiment of the application may include a memory 410 and a processor 420, the memory 410 storing a computer program executed by the processor 420, which when executed by the processor 420, causes the processor 420 to perform the obstacle detecting method 300 according to the embodiment of the application described above. Those skilled in the art can understand the specific operation of the obstacle detecting device 400 according to the embodiment of the application in combination with the foregoing, and for brevity, the description is omitted here.
A mobile robot 500 provided in accordance with still another aspect of the present application is described below in conjunction with fig. 5. Fig. 5 shows a schematic block diagram of a mobile robot 500 according to an embodiment of the present application. As shown in fig. 5, a mobile robot 500 according to an embodiment of the present application may include an image acquisition device 510 and an obstacle detection device 520. The image acquisition device 510 is configured to acquire an image for an area to be moved of the mobile robot 500, where the image includes a depth image and at least one other modality image. The obstacle detection device 520 is configured to perform obstacle detection based on the image acquired by the image acquisition device 510, so as to avoid an obstacle during the movement of the mobile robot 500, where the obstacle detection device 520 may be the aforementioned obstacle detection device 400. Those skilled in the art will understand that the specific operation of the obstacle detecting device 520 is not described in detail herein for brevity.
In the embodiments of the present application, the image acquisition device 510 of the mobile robot 500 acquires multi-modal images of the same scene; besides the depth image, the acquired images include at least one other modality image, such as an infrared, color, or grayscale image. Here, the image acquisition device 510 may include a single camera capable of providing images of multiple modalities, or different cameras providing images of different modalities. The images of the different modalities are input into the obstacle detection device 520, which extracts features from each modality, fuses the features, predicts the noise regions in the depth image from the fused features, removes the noise regions to obtain an optimized depth image, performs obstacle detection on the optimized depth image, and avoids obstacles according to the detection results during movement.
As previously described, erroneous information (i.e., erroneous depth values) on a depth image, particularly of an indoor scene, is often caused by highly reflective or highlighted objects, including but not limited to lit light fixtures, floor tile reflections, and highly reflective columns; these erroneous depth values tend to differ significantly from the surrounding depth values (extremely far, creating holes in the depth image, or extremely near, creating peaks in it), and the corresponding regions have extremely strong intensity values in images of other modalities, such as infrared, color, or grayscale images. This information can therefore be modeled well by the multi-modal deep neural network (included in the obstacle detection device 520), which is used to detect these regions. After the detected noise regions are removed, an optimized depth image is obtained; performing obstacle detection on this depth image can effectively reduce or even avoid false obstacle detections, improving the accuracy of obstacle detection and thus the accuracy of obstacle avoidance.
Furthermore, according to an embodiment of the present application, there is also provided a storage medium on which program instructions are stored, which program instructions, when executed by a computer or a processor, are for performing the respective steps of the noise reduction method or the obstacle detection method of a depth image of an embodiment of the present application. The storage medium may include, for example, a memory card of a smart phone, a memory component of a tablet computer, a hard disk of a personal computer, read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, or any combination of the foregoing storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
Based on the above description, the noise reduction method and device for a depth image according to the embodiments of the present application identify noise regions on the depth image through a trained neural network that fuses the depth image with other image sources, so the noise of the depth image can be reduced and the depth image optimized. The obstacle detection method and device according to the embodiments of the present application identify the noise regions in the same way, remove them, and use the resulting optimized depth image for obstacle detection, which can effectively reduce or even avoid false obstacle detections and thereby improve the accuracy of obstacle detection. The mobile robot according to the embodiments of the present application performs obstacle detection with the obstacle detection device, which likewise reduces or avoids false obstacle detections, improves the accuracy of obstacle detection, and thus improves the accuracy of obstacle avoidance.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the above illustrative embodiments are merely illustrative and are not intended to limit the scope of the present application thereto. Various changes and modifications may be made therein by one of ordinary skill in the art without departing from the scope and spirit of the application. All such changes and modifications are intended to be included within the scope of the present application as set forth in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described device embodiments are merely illustrative, e.g., the division of the elements is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple elements or components may be combined or integrated into another device, or some features may be omitted or not performed.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in order to streamline the application and aid in understanding one or more of the various inventive aspects, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof in the description of exemplary embodiments of the application. However, the method of the present application should not be construed as reflecting the following intent: i.e., the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be combined in any combination, except combinations where the features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
Various component embodiments of the application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some of the modules according to embodiments of the present application may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present application can also be implemented as an apparatus program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present application may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.
The foregoing description is merely illustrative of specific embodiments of the present application, and the scope of the present application is not limited thereto; any person skilled in the art can readily conceive of variations or substitutions within the technical scope disclosed by the present application, and these shall all be covered by its protection scope. The protection scope of the application is therefore subject to the protection scope of the claims.

Claims (12)

1. A method of denoising a depth image, the method comprising:
acquiring a multi-modal image acquired for the same scene, the multi-modal image comprising a depth image and at least one other modality image;
respectively inputting the depth image and the at least one other modality image into a trained neural network, and predicting a noise region in the depth image by the neural network based on fusion of features of different modality images;
and removing the noise area in the depth image based on the prediction result of the neural network to obtain and output an optimized depth image.
2. The method of claim 1, wherein the neural network utilizes a particular candidate anchor box in predicting the noise region, wherein the particular candidate anchor box is determined based on a source of the noise region in the depth image.
3. The method of claim 2, wherein when the source of the noise region in the depth image comprises an artificial light source, the particular candidate anchor box comprises at least one of a circular candidate anchor box, an elliptical candidate anchor box, and a rectangular candidate anchor box.
4. A method according to any of claims 1-3, wherein predicting, by the neural network, noise regions in the depth image based on a fusion of features of different modality images, comprises:
extracting features of the depth image and the at least one other modality image by a backbone network module of the neural network;
fusing the features extracted by the backbone network module by a fusion network module of the neural network;
and predicting a noise area in the depth image by a head network module of the neural network based on the features fused by the fusion network module.
5. The method of claim 4, wherein the backbone network module comprises a first backbone network module and a second backbone network module, wherein:
the first backbone network module is used for encoding the depth image;
the second backbone network module is used for encoding the other modality images;
the first backbone network module and the second backbone network module share weights.
6. A method according to any of claims 1-3, wherein the other modality image comprises at least one of an infrared image, a color image and a gray scale image.
7. A method according to any of claims 1-3, wherein the scene comprises an indoor scene.
8. A noise reduction device for a depth image, characterized in that the device comprises a memory and a processor, the memory having stored thereon a computer program to be run by the processor, which computer program, when run by the processor, causes the processor to perform the noise reduction method for a depth image according to any of claims 1-7.
9. A method of detecting an obstacle, the method comprising:
acquiring a depth image of a scene to be subjected to obstacle detection, wherein the depth image is an optimized depth image obtained according to the noise reduction method of the depth image of any one of claims 1 to 7;
and performing obstacle detection on the scene based on the optimized depth image to obtain an obstacle detection result of the scene.
10. An obstacle detection device, characterized in that the device comprises a memory and a processor, the memory having stored thereon a computer program to be run by the processor, which, when run by the processor, causes the processor to perform the obstacle detection method as claimed in claim 9.
11. A mobile robot comprising an image acquisition device and an obstacle detection device, wherein:
the image acquisition device is used for acquiring images aiming at a region to be moved of the mobile robot, wherein the images comprise depth images and at least one other mode image;
the obstacle detection device is used for performing obstacle detection based on the image acquired by the image acquisition device and used for avoiding obstacles in the moving process of the mobile robot, wherein the obstacle detection device comprises the obstacle detection device of claim 10.
12. A storage medium having stored thereon a computer program to be run by a processor, which computer program, when run by the processor, causes the processor to perform the method of noise reduction of a depth image according to any one of claims 1-7 or to perform the method of obstacle detection according to claim 9.
CN202210501030.5A 2022-05-09 2022-05-09 Noise reduction and obstacle detection method and device for depth image and mobile robot Pending CN117078519A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210501030.5A CN117078519A (en) 2022-05-09 2022-05-09 Noise reduction and obstacle detection method and device for depth image and mobile robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210501030.5A CN117078519A (en) 2022-05-09 2022-05-09 Noise reduction and obstacle detection method and device for depth image and mobile robot

Publications (1)

Publication Number Publication Date
CN117078519A true CN117078519A (en) 2023-11-17

Family

ID=88718103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210501030.5A Pending CN117078519A (en) 2022-05-09 2022-05-09 Noise reduction and obstacle detection method and device for depth image and mobile robot

Country Status (1)

Country Link
CN (1) CN117078519A (en)


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination