CN117078519A - Noise reduction and obstacle detection method and device for depth image and mobile robot - Google Patents

Noise reduction and obstacle detection method and device for depth image and mobile robot

Info

Publication number
CN117078519A
Authority
CN
China
Prior art keywords
depth image
image
obstacle detection
neural network
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210501030.5A
Other languages
Chinese (zh)
Inventor
吴伟 (Wu Wei)
孙志雄 (Sun Zhixiong)
陈超 (Chen Chao)
成波 (Cheng Bo)
李升波 (Li Shengbo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jizhijia Technology Co Ltd
Original Assignee
Beijing Jizhijia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jizhijia Technology Co Ltd filed Critical Beijing Jizhijia Technology Co Ltd
Priority to CN202210501030.5A priority Critical patent/CN117078519A/en
Publication of CN117078519A publication Critical patent/CN117078519A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Abstract

A noise reduction method and device for a depth image, and a mobile robot, are provided. The noise reduction method comprises: acquiring multi-modal images of the same scene, the multi-modal images comprising a depth image and at least one other modality image; respectively inputting the depth image and the at least one other modality image into a trained neural network, which predicts noise regions in the depth image based on a fusion of features from the different modality images; and removing the noise regions from the depth image based on the prediction result of the neural network to obtain and output an optimized depth image. By using the trained neural network to fuse the depth image with other image sources and identify the noise regions on the depth image, the method and device reduce the noise of the depth image and thereby optimize it.

Description

Noise reduction and obstacle detection method and device for depth image and mobile robot
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and apparatus for noise reduction and obstacle detection of a depth image, and a mobile robot.
Background
Currently, depth images acquired by a depth camera, especially indoor depth images, contain highlight areas under extreme illumination conditions, such as areas of direct light, ground reflections, or highly reflective materials, and these highlight areas are often accompanied by erroneous depth values, so they are called noise regions. Such a depth image cannot accurately reflect the depth information of the captured scene, and performing further operations on it, such as obstacle detection, easily leads to erroneous detection results and erroneous obstacle avoidance.
Accordingly, a noise reduction scheme for depth images is required to solve the above problems.
Disclosure of Invention
According to an aspect of the present application, there is provided a noise reduction method for a depth image, the method including: acquiring a multi-modal image acquired for the same scene, the multi-modal image comprising a depth image and at least one other modality image; respectively inputting the depth image and the at least one other modality image into a trained neural network, and predicting noise regions in the depth image by the neural network based on a fusion of features of the different modality images; and removing the noise regions from the depth image based on the prediction result of the neural network to obtain and output an optimized depth image.
In one embodiment of the application, the neural network utilizes a particular candidate anchor box in predicting the noise region, wherein the particular candidate anchor box is determined based on the source of the noise region in the depth image.
In one embodiment of the present application, when the source of the noise region in the depth image includes an artificial light source, the specific candidate anchor box includes at least one of a circular candidate anchor box, an elliptical candidate anchor box, and a rectangular candidate anchor box.
In one embodiment of the present application, predicting, by the neural network, a noise region in the depth image based on a fusion of features of different modality images includes: extracting features of the depth image and of the at least one other modality image by a backbone network module of the neural network; fusing the features extracted by the backbone network module by a fusion network module of the neural network; and predicting the noise region in the depth image by a head network module of the neural network based on the features fused by the fusion network module.
In one embodiment of the application, the backbone network module comprises a first backbone network module and a second backbone network module, wherein: the first backbone network module is used for encoding the depth image; the second backbone network module is used for encoding the other modality images; and the first backbone network module and the second backbone network module share weights.
In one embodiment of the application, the other modality image comprises at least one of an infrared image, a color image, and a gray scale image.
In one embodiment of the application, the scene comprises an indoor scene.
According to another aspect of the present application there is provided a noise reduction device for a depth image, the device comprising a memory and a processor, the memory having stored thereon a computer program for execution by the processor, which when executed by the processor causes the processor to perform a noise reduction method for a depth image as described above.
According to still another aspect of the present application, there is provided a method of detecting an obstacle, the method comprising: acquiring a depth image of a scene to be subjected to obstacle detection, wherein the depth image is an optimized depth image obtained according to the noise reduction method of the depth image; and performing obstacle detection on the scene based on the optimized depth image to obtain an obstacle detection result of the scene.
According to a further aspect of the present application there is provided an obstacle detection device comprising a memory and a processor, the memory having stored thereon a computer program for execution by the processor, which when executed by the processor causes the processor to perform the obstacle detection method as described above.
According to still another aspect of the present application, there is provided a mobile robot including an image acquisition device and an obstacle detection device, wherein: the image acquisition device is used for acquiring images of the area to be traversed by the mobile robot, the images including a depth image and at least one other modality image; and the obstacle detection device is used for performing obstacle detection based on the images acquired by the image acquisition device, so that obstacles can be avoided while the mobile robot moves, wherein the obstacle detection device is the obstacle detection device described above.
According to still another aspect of the present application, there is provided a storage medium having stored thereon a computer program to be executed by a processor, which when executed by the processor, causes the processor to perform the noise reduction method of a depth image or to perform the obstacle detection method as described above.
According to the noise reduction method and device for a depth image of the embodiments of the present application, noise regions on the depth image are identified by a trained neural network that fuses the depth image with other image sources, so the noise of the depth image can be reduced and the depth image optimized. According to the obstacle detection method and device of the embodiments of the present application, noise regions on the depth image are identified in the same way, removed, and the resulting optimized depth image is used for obstacle detection, which can effectively reduce or even avoid false obstacle detections and thereby improve the accuracy of obstacle detection. According to the mobile robot of the embodiments of the present application, obstacle detection is performed by the obstacle detection device, which likewise reduces or avoids false obstacle detections, improves the accuracy of obstacle detection, and thus improves the accuracy of obstacle avoidance.
Drawings
The above and other objects, features and advantages of the present application will become more apparent from the following detailed description of embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application, are incorporated in and constitute a part of this specification, serve to explain the application together with the embodiments, and do not constitute a limitation of the application. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 shows a schematic flow chart of a noise reduction method of a depth image according to an embodiment of the present application.
Fig. 2 shows a schematic block diagram of a noise reduction device of a depth image according to an embodiment of the present application.
Fig. 3 shows a schematic flow chart of an obstacle detection method according to an embodiment of the application.
Fig. 4 shows a schematic block diagram of the obstacle detecting apparatus according to the embodiment of the application.
Fig. 5 shows a schematic block diagram of a mobile robot according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, exemplary embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some, and not all, embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein. Based on the embodiments described in the present application, all other embodiments obtained by a person skilled in the art without inventive effort shall fall within the scope of the application.
First, a noise reduction method 100 of a depth image according to an embodiment of the present application is described with reference to fig. 1. As shown in fig. 1, the noise reduction method 100 of a depth image may include the steps of:
in step S110, a multi-modality image acquired for the same scene is acquired, the multi-modality image comprising a depth image and at least one other modality image.
In step S120, the depth image and at least one other modality image are respectively input to a trained neural network, and noise regions in the depth image are predicted by the neural network based on fusion of features of different modality images.
In step S130, noise regions in the depth image are removed based on the prediction result of the neural network, and an optimized depth image is obtained and output.
In the embodiments of the present application, the depth image and other image sources are fused to identify erroneous information (noise regions) on the depth image. Erroneous information (i.e., erroneous depth values) on a depth image, particularly of an indoor scene, often results from highly reflective or highlighted objects, including but not limited to lit light fixtures, floor tile reflections, and highly reflective columns. These erroneous depth values tend to differ significantly from the surrounding depth values (extremely far, creating holes in the depth image, or extremely near, creating peaks in it), and the corresponding regions have extremely strong intensity values in images of other modalities, such as infrared, color, or grayscale images. This information can therefore be modeled well by a deep neural network, so a multi-modal deep neural network is used to detect these regions.
Based on this, in the embodiments of the present application, the neural network can be trained on annotated images of different modalities of the same scene used as sample images. Specifically, for example, a depth image and other modality images (such as at least one of an infrared image, a color image, or a grayscale image) of scene A are collected; the noise regions in the depth image are annotated, and the regions of extremely strong intensity corresponding to those noise regions are annotated in the other modality images. Similarly, depth images and other modality images of further scenes, such as scenes B, C, D and so on, are collected and annotated. Finally, the annotated images are input as sample images into the neural network for training: the network fuses the features of the depth image and the other modality images of the same scene and predicts the noise regions in the depth image, and its parameters are continuously optimized according to the difference between the network's output (the prediction) and the annotation (the ground truth). When this difference satisfies a preset convergence condition, the trained neural network is obtained.
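For concreteness, the following is a minimal training sketch in PyTorch, assuming the annotated noise regions are rasterized into per-pixel masks. `NoiseRegionNet` and `dataset` are illustrative stand-ins, not names from this application, and the actual network described below predicts regions via candidate anchor boxes rather than a mask head.

```python
import torch
import torch.nn as nn

class NoiseRegionNet(nn.Module):
    """Stand-in fusion network: concatenates a depth map with one other
    modality and predicts a per-pixel noise-region logit map."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Conv2d(2, 16, 3, padding=1)  # 2 channels: depth + other modality
        self.head = nn.Conv2d(16, 1, 1)               # noise-region logits

    def forward(self, depth, other):
        x = torch.cat([depth, other], dim=1)
        return self.head(torch.relu(self.encode(x)))

# Dummy stand-in for the annotated multi-modal sample images.
dataset = [(torch.rand(1, 1, 64, 64),                  # depth image
            torch.rand(1, 1, 64, 64),                  # other modality image
            (torch.rand(1, 1, 64, 64) > 0.9).float())  # annotated noise mask
           ]

model = NoiseRegionNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

for depth, other, noise_mask in dataset:
    logits = model(depth, other)
    loss = criterion(logits, noise_mask)  # difference between prediction and annotation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # parameters optimized until a convergence condition is met
```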
With the trained neural network, in practical application, multi-modal images (i.e., images of multiple modalities) of the same scene are acquired; besides the depth image, the acquired images include at least one other modality image such as an infrared, color, or grayscale image (whichever modality or modalities were used in training: the modalities of the images acquired in practice must match those of the training samples). The images of the different modalities are input into the trained neural network, which extracts features from each modality, fuses the features, and predicts the noise regions in the depth image from the fused features; these noise regions are then removed to obtain the noise reduction result, i.e., the optimized depth image. In this way, the noise reduction method identifies noise regions on the depth image through a trained neural network that fuses the depth image with other image sources, reducing the noise of the depth image and thereby optimizing it.
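As a minimal sketch of the removal step, assuming the predicted noise regions arrive as axis-aligned pixel boxes and that invalid depth is marked with 0 (a common depth-map convention; the application does not fix either assumption):

```python
import numpy as np

def remove_noise_regions(depth: np.ndarray, boxes: list) -> np.ndarray:
    """Invalidate depth values inside each predicted noise box (x0, y0, x1, y1)."""
    optimized = depth.copy()
    for x0, y0, x1, y1 in boxes:
        optimized[y0:y1, x0:x1] = 0  # mark erroneous depth values as missing
    return optimized

# usage with a dummy depth map and one predicted noise region
depth = np.random.rand(480, 640).astype(np.float32)
optimized = remove_noise_regions(depth, [(100, 50, 180, 120)])
```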
In the embodiments of the present application, the neural network employed in step S120 may include a backbone network module (backbone), a fusion network module, and a head network module (head). On this basis, predicting noise regions in the depth image by the neural network based on a fusion of features of different modality images in step S120 may include: extracting features of the depth image and of the at least one other modality image by the backbone network module; fusing the extracted features by the fusion network module; and predicting the noise regions in the depth image by the head network module based on the fused features.
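A structural sketch of this backbone/fusion/head decomposition, in PyTorch, is shown below; the module names, layer sizes, and head output are illustrative assumptions, as the application does not fix a concrete architecture. Weight sharing between the two backbone passes (described next) is obtained here by applying a single backbone instance to both inputs.

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Stand-in encoder applied to a single-channel image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())

    def forward(self, x):
        return self.net(x)

class NoiseDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = TinyBackbone()    # one instance used for both inputs,
                                          # so the two passes share weights
        self.fuse = nn.Conv2d(64, 32, 1)  # fusion: concatenation + 1x1 conv
        self.head = nn.Conv2d(32, 5, 1)   # e.g. 4 box parameters + 1 score per location

    def forward(self, depth, other):
        f_depth = self.backbone(depth)    # encode the depth image
        f_other = self.backbone(other)    # encode the other modality image
        fused = torch.relu(self.fuse(torch.cat([f_depth, f_other], dim=1)))
        return self.head(fused)           # head predicts noise regions

# usage
out = NoiseDetector()(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64))  # (1, 5, 64, 64)
```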
In one example, the backbone network module of the neural network may further include a first backbone network module and a second backbone network module, wherein the first backbone network module encodes the depth image to obtain the features extracted from the depth image, the second backbone network module encodes the other modality images to obtain the features extracted from them, and the two modules share weights, i.e., their weights are kept consistent. Encoding the depth image and the other modality images with separate, weight-sharing backbone modules can improve encoding efficiency. In other examples, the same backbone network module may be used to encode the depth image and the other modality images in turn to derive their respective features.
In the embodiments of the present application, after finishing the encoding, the backbone network module feeds the two encodings to the fusion network module for fusion. When training the neural network, the fusion network module can require that the two information sources contain sufficiently different information (low similarity), which ensures that they provide complementary information; and because the cosine function used to measure this similarity is differentiable, the neural network can be trained end-to-end.
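One way to express this constraint is a cosine-similarity penalty between the two encodings, a hedged sketch of which follows (the loss weight and flattening scheme are assumptions):

```python
import torch
import torch.nn.functional as F

def complementarity_loss(f_depth: torch.Tensor, f_other: torch.Tensor) -> torch.Tensor:
    """Penalize high cosine similarity between the two modality encodings,
    pushing the two information sources to carry complementary content.
    Cosine similarity is differentiable, so this term trains end-to-end."""
    v1 = f_depth.flatten(1)  # (batch, features)
    v2 = f_other.flatten(1)
    return F.cosine_similarity(v1, v2, dim=1).mean()

# usage: total_loss = detection_loss + 0.1 * complementarity_loss(f_depth, f_other)
```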
In the embodiments of the present application, the neural network (specifically, its head network module) uses specific candidate anchor boxes when predicting the noise regions of the depth image, where the specific candidate anchor boxes are determined based on the source of the noise regions in the depth image. Rather than simply adopting conventional rectangular boxes as detection candidates, determining specific candidate anchor boxes according to the characteristics of the noise source is more conducive to accurately detecting the noise regions in the depth image.
In one example, the source of the noise regions in the depth image includes an artificial light source (i.e., the scene of the depth image may be an indoor scene). Since the highly reflective regions (noise regions) produced by artificial light sources are mostly close to circular, elliptical, or rectangular in shape, this prior knowledge can be exploited by specifying the candidate anchor boxes of the neural network as at least one of circular, elliptical, and rectangular candidate anchor boxes, which significantly improves the detection accuracy and precision for noise regions in the depth image.
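As an illustration only, the shape priors could be parameterized as below; the application specifies the anchor shapes but not their parameterization, so every field here is an assumption.

```python
from dataclasses import dataclass

@dataclass
class CircleAnchor:
    cx: float
    cy: float
    r: float      # radius

@dataclass
class EllipseAnchor:
    cx: float
    cy: float
    a: float      # semi-major axis
    b: float      # semi-minor axis
    theta: float  # rotation

@dataclass
class RectAnchor:
    cx: float
    cy: float
    w: float
    h: float

def anchors_for_cell(cx: float, cy: float) -> list:
    """Hypothetical per-cell anchor set biased toward artificial-light shapes."""
    return [CircleAnchor(cx, cy, 8.0),
            EllipseAnchor(cx, cy, 12.0, 6.0, 0.0),
            RectAnchor(cx, cy, 16.0, 16.0)]
```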
The noise reduction method 100 for a depth image according to the embodiments of the present application has been described above by way of example. Based on the above description, the method identifies noise regions on the depth image through a trained neural network that fuses the depth image with other image sources, and can thereby reduce the noise of the depth image and optimize the depth image.
The noise reduction device 200 for a depth image according to another aspect of the present application is described below with reference to fig. 2, which shows a schematic block diagram of the device. As shown in fig. 2, the noise reduction device 200 may include a memory 210 and a processor 220, where the memory 210 stores a computer program to be run by the processor 220, and the computer program, when run by the processor 220, causes the processor 220 to perform the noise reduction method 100 for a depth image according to the embodiments of the present application described above. Those skilled in the art can understand the specific operation of the noise reduction device 200 from the foregoing description; for brevity, the details are not repeated here, and only some of the main operations of the processor 220 are described.
In one embodiment of the application, the computer program, when executed by the processor 220, causes the processor 220 to perform the steps of: acquiring a multi-modal image acquired for the same scene, wherein the multi-modal image comprises a depth image and at least one other modal image; respectively inputting the depth image and at least one other mode image into a trained neural network, and predicting noise areas in the depth image by the neural network based on fusion of features of different mode images; and removing a noise region in the depth image based on a prediction result of the neural network to obtain and output an optimized depth image.
In one embodiment of the application, the neural network utilizes a particular candidate anchor box in predicting the noise region, wherein the particular candidate anchor box is determined based on the source of the noise region in the depth image.
In one embodiment of the present application, when the source of the noise region in the depth image includes an artificial light source, the specific candidate anchor box includes at least one of a circular candidate anchor box, an elliptical candidate anchor box, and a rectangular candidate anchor box.
In one embodiment of the present application, the computer program, when run by the processor 220, causes the processor 220 to perform the prediction of noise regions in the depth image by the neural network based on a fusion of features of different modality images, including: extracting features of the depth image and of the at least one other modality image by a backbone network module of the neural network; fusing the extracted features by a fusion network module of the neural network; and predicting the noise regions in the depth image by a head network module of the neural network based on the fused features.
In one embodiment of the application, the backbone network module comprises a first backbone network module and a second backbone network module, wherein: the first backbone network module is used for encoding the depth image; the second backbone network module is used for encoding the other modality images; and the first backbone network module and the second backbone network module share weights.
In one embodiment of the application, the other modality image includes at least one of an infrared image, a color image, and a gray scale image.
In one embodiment of the application, the scene comprises an indoor scene.
Based on the above description, the noise reduction device 200 for a depth image according to the embodiments of the present application identifies noise regions on the depth image through a trained neural network that fuses the depth image with other image sources, and can thereby reduce the noise of the depth image and optimize it.
An obstacle detection method provided according to another aspect of the present application is described below with reference to fig. 3. Fig. 3 shows a schematic flow chart of an obstacle detection method 300 according to an embodiment of the application. As shown in fig. 3, the obstacle detection method 300 may include the steps of:
in step S310, a multi-modality image acquired for the same scene is acquired, the multi-modality image comprising a depth image and at least one other modality image.
In step S320, the depth image and the at least one other modality image are respectively input to a trained neural network, and noise regions in the depth image are predicted by the neural network based on a fusion of features of the different modality images.
In step S330, the noise regions in the depth image are removed based on the prediction result of the neural network to obtain an optimized depth image.
in step S340, the scene is subjected to obstacle detection based on the optimized depth image, and an obstacle detection result of the scene is obtained.
In the embodiments of the present application, the depth image and other image sources are fused to identify erroneous information (noise regions) on the depth image, the noise regions are removed to obtain an optimized depth image (as in the noise reduction method 100 described above), and obstacle detection is then performed on the optimized depth image. Erroneous information (i.e., erroneous depth values) on a depth image, particularly of an indoor scene, often results from highly reflective or highlighted objects, including but not limited to lit light fixtures, floor tile reflections, and highly reflective columns; these erroneous depth values tend to differ significantly from the surrounding depth values (extremely far, creating holes in the depth image, or extremely near, creating peaks in it), and the corresponding regions have extremely strong intensity values in images of other modalities such as infrared, color, or grayscale images. This information can therefore be modeled well by a deep neural network, and a multi-modal deep neural network is used to detect these regions. After the detected noise regions are removed, an optimized depth image is obtained; performing obstacle detection on this depth image can effectively reduce or even avoid false obstacle detections and thereby improve the accuracy of obstacle detection.
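The overall flow of method 300 can be sketched as a composition of a noise-region predictor, the removal step, and any depth-based obstacle detector; all three callables below are hypothetical stand-ins injected by the caller, not components named by this application.

```python
from typing import Callable, List, Sequence, Tuple
import numpy as np

Box = Tuple[int, int, int, int]

def detect_with_denoising(
    depth: np.ndarray,
    other: np.ndarray,
    predict_noise_boxes: Callable[[np.ndarray, np.ndarray], Sequence[Box]],
    remove_noise_regions: Callable[[np.ndarray, Sequence[Box]], np.ndarray],
    detect_obstacles: Callable[[np.ndarray], List],
) -> List:
    boxes = predict_noise_boxes(depth, other)       # steps S310-S320
    optimized = remove_noise_regions(depth, boxes)  # step S330
    return detect_obstacles(optimized)              # step S340
```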
In the embodiments of the present application, the neural network employed in step S320 may include a backbone network module (backbone), a fusion network module, and a head network module (head). On this basis, predicting noise regions in the depth image by the neural network based on a fusion of features of different modality images in step S320 may include: extracting features of the depth image and of the at least one other modality image by the backbone network module; fusing the extracted features by the fusion network module; and predicting the noise regions in the depth image by the head network module based on the fused features.
In one example, the backbone network module of the neural network may further include a first backbone network module and a second backbone network module, wherein the first backbone network module encodes the depth image to obtain the features extracted from the depth image, the second backbone network module encodes the other modality images to obtain the features extracted from them, and the two modules share weights, i.e., their weights are kept consistent. Encoding the depth image and the other modality images with separate, weight-sharing backbone modules can improve encoding efficiency. In other examples, the same backbone network module may be used to encode the depth image and the other modality images in turn to derive their respective features.
In the embodiments of the present application, after finishing the encoding, the backbone network module feeds the two encodings to the fusion network module for fusion. When training the neural network, the fusion network module can require that the two information sources contain sufficiently different information (low similarity), which ensures that they provide complementary information; and because the cosine function used to measure this similarity is differentiable, the neural network can be trained end-to-end.
In the embodiments of the present application, the neural network (specifically, its head network module) uses specific candidate anchor boxes when predicting the noise regions of the depth image, where the specific candidate anchor boxes are determined based on the source of the noise regions in the depth image. Rather than simply adopting conventional rectangular boxes as detection candidates, determining specific candidate anchor boxes according to the characteristics of the noise source is more conducive to accurately detecting the noise regions in the depth image.
In one example, the source of the noise regions in the depth image includes an artificial light source (i.e., the scene of the depth image may be an indoor scene). Since the highly reflective regions (noise regions) produced by artificial light sources are mostly close to circular, elliptical, or rectangular in shape, this prior knowledge can be exploited by specifying the candidate anchor boxes of the neural network as at least one of circular, elliptical, and rectangular candidate anchor boxes, which significantly improves the detection accuracy and precision for noise regions in the depth image.
In one example, based on the optimized depth image and the intrinsic and extrinsic parameters of the camera that captured the depth image, the pose and size of the obstacles in the scene in three-dimensional space can be obtained, yielding the obstacle detection result.
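A hedged sketch of that back-projection with the pinhole camera model follows; fx, fy, cx, cy are the camera intrinsics, the extrinsics are modeled as a single 4x4 homogeneous transform, and all numeric values are illustrative assumptions.

```python
import numpy as np

def pixel_to_camera(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z (meters) into camera coordinates."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def camera_to_world(p_cam, T_world_cam):
    """Apply the camera extrinsics (4x4 homogeneous transform) to a 3D point."""
    return (T_world_cam @ np.append(p_cam, 1.0))[:3]

# usage: obstacle pixel at (320, 240) with 1.5 m depth, identity extrinsics
p_world = camera_to_world(pixel_to_camera(320, 240, 1.5, 600.0, 600.0, 320.0, 240.0), np.eye(4))
```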
The obstacle detection method 300 according to the embodiments of the present application has been described above by way of example. Based on the above description, the method identifies noise regions on the depth image through a trained neural network that fuses the depth image with other image sources, removes those regions, and uses the resulting optimized depth image for obstacle detection, which can effectively reduce or even avoid false obstacle detections and thereby improve the accuracy of obstacle detection.
An obstacle detecting apparatus 400 according to another aspect of the present application is described below with reference to fig. 4. Fig. 4 shows a schematic block diagram of the obstacle detecting apparatus 400 according to the embodiment of the present application. As shown in fig. 4, the obstacle detecting apparatus 400 according to an embodiment of the application may include a memory 410 and a processor 420, the memory 410 storing a computer program executed by the processor 420, which when executed by the processor 420, causes the processor 420 to perform the obstacle detecting method 300 according to the embodiment of the application described above. Those skilled in the art can understand the specific operation of the obstacle detecting device 400 according to the embodiment of the application in combination with the foregoing, and for brevity, the description is omitted here.
A mobile robot 500 provided in accordance with still another aspect of the present application is described below in conjunction with fig. 5. Fig. 5 shows a schematic block diagram of a mobile robot 500 according to an embodiment of the present application. As shown in fig. 5, a mobile robot 500 according to an embodiment of the present application may include an image acquisition device 510 and an obstacle detection device 520. The image acquisition device 510 is configured to acquire an image for an area to be moved of the mobile robot 500, where the image includes a depth image and at least one other modality image. The obstacle detection device 520 is configured to perform obstacle detection based on the image acquired by the image acquisition device 510, so as to avoid an obstacle during the movement of the mobile robot 500, where the obstacle detection device 520 may be the aforementioned obstacle detection device 400. Those skilled in the art will understand that the specific operation of the obstacle detecting device 520 is not described in detail herein for brevity.
In the embodiments of the present application, the image acquisition device 510 of the mobile robot 500 acquires multi-modal images of the same scene; besides the depth image, the acquired images include at least one other modality image, such as an infrared, color, or grayscale image. Here, the image acquisition device 510 may include a single camera capable of providing images of multiple modalities, or different cameras providing images of different modalities. The images of the different modalities are input into the obstacle detection device 520, which extracts features from each modality, fuses the features, predicts the noise regions in the depth image from the fused features, removes the noise regions to obtain an optimized depth image, performs obstacle detection on the optimized depth image, and avoids obstacles according to the detection results during movement.
As previously described, erroneous information (i.e., erroneous depth values) on a depth image, particularly of an indoor scene, is often caused by highly reflective or highlighted objects, including but not limited to lit light fixtures, floor tile reflections, and highly reflective columns; these erroneous depth values tend to differ significantly from the surrounding depth values (extremely far, creating holes in the depth image, or extremely near, creating peaks in it), and the corresponding regions have extremely strong intensity values in images of other modalities, such as infrared, color, or grayscale images. This information can therefore be modeled well by the multi-modal deep neural network (included in the obstacle detection device 520), which is used to detect these regions. After the detected noise regions are removed, an optimized depth image is obtained; performing obstacle detection on this depth image can effectively reduce or even avoid false obstacle detections, improving the accuracy of obstacle detection and thus the accuracy of obstacle avoidance.
Furthermore, according to an embodiment of the present application, there is also provided a storage medium on which program instructions are stored, which program instructions, when executed by a computer or a processor, are for performing the respective steps of the noise reduction method or the obstacle detection method of a depth image of an embodiment of the present application. The storage medium may include, for example, a memory card of a smart phone, a memory component of a tablet computer, a hard disk of a personal computer, read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, or any combination of the foregoing storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
Based on the above description, the noise reduction method and device for a depth image according to the embodiments of the present application identify noise regions on the depth image through a trained neural network that fuses the depth image with other image sources, so the noise of the depth image can be reduced and the depth image optimized. The obstacle detection method and device according to the embodiments of the present application identify the noise regions in the same way, remove them, and use the resulting optimized depth image for obstacle detection, which can effectively reduce or even avoid false obstacle detections and thereby improve the accuracy of obstacle detection. The mobile robot according to the embodiments of the present application performs obstacle detection with the obstacle detection device, which likewise reduces or avoids false obstacle detections, improves the accuracy of obstacle detection, and thus improves the accuracy of obstacle avoidance.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the above illustrative embodiments are merely illustrative and are not intended to limit the scope of the present application thereto. Various changes and modifications may be made therein by one of ordinary skill in the art without departing from the scope and spirit of the application. All such changes and modifications are intended to be included within the scope of the present application as set forth in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described device embodiments are merely illustrative, e.g., the division of the elements is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple elements or components may be combined or integrated into another device, or some features may be omitted or not performed.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in order to streamline the application and aid in understanding one or more of the various inventive aspects, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof in the description of exemplary embodiments of the application. However, the method of the present application should not be construed as reflecting the following intent: i.e., the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be combined in any combination, except combinations where the features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
Various component embodiments of the application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some of the modules according to embodiments of the present application may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present application can also be implemented as an apparatus program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present application may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.
The foregoing description is merely illustrative of specific embodiments of the present application, and the scope of the present application is not limited thereto; any person skilled in the art can readily conceive of variations or substitutions within the technical scope disclosed by the present application, and these shall all be covered by its protection scope. The protection scope of the application is therefore subject to the protection scope of the claims.

Claims (12)

1. A method of denoising a depth image, the method comprising:
acquiring a multi-modal image acquired for the same scene, the multi-modal image comprising a depth image and at least one other modality image;
respectively inputting the depth image and the at least one other modality image into a trained neural network, and predicting a noise region in the depth image by the neural network based on fusion of features of different modality images;
and removing the noise area in the depth image based on the prediction result of the neural network to obtain and output an optimized depth image.
2. The method of claim 1, wherein the neural network utilizes a particular candidate anchor box in predicting the noise region, wherein the particular candidate anchor box is determined based on a source of the noise region in the depth image.
3. The method of claim 2, wherein when the source of the noise region in the depth image comprises an artificial light source, the particular candidate anchor box comprises at least one of a circular candidate anchor box, an elliptical candidate anchor box, and a rectangular candidate anchor box.
4. A method according to any of claims 1-3, wherein predicting, by the neural network, noise regions in the depth image based on a fusion of features of different modality images, comprises:
extracting features of the depth image and the at least one other modality image by a backbone network module of the neural network;
fusing the features extracted by the backbone network module by a fusion network module of the neural network;
and predicting a noise area in the depth image by a head network module of the neural network based on the features fused by the fusion network module.
5. The method of claim 4, wherein the backbone network module comprises a first backbone network module and a second backbone network module, wherein:
the first backbone network module is used for encoding the depth image;
the second backbone network module is used for encoding the other modality images;
the first backbone network module and the second backbone network module share weights.
6. A method according to any of claims 1-3, wherein the other modality image comprises at least one of an infrared image, a color image and a gray scale image.
7. A method according to any of claims 1-3, wherein the scene comprises an indoor scene.
8. A noise reduction device for a depth image, characterized in that the device comprises a memory and a processor, the memory having stored thereon a computer program to be run by the processor, which computer program, when run by the processor, causes the processor to perform the noise reduction method for a depth image according to any of claims 1-7.
9. A method of detecting an obstacle, the method comprising:
acquiring a depth image of a scene to be subjected to obstacle detection, wherein the depth image is an optimized depth image obtained according to the noise reduction method of the depth image of any one of claims 1 to 7;
and performing obstacle detection on the scene based on the optimized depth image to obtain an obstacle detection result of the scene.
10. An obstacle detection device, characterized in that the device comprises a memory and a processor, the memory having stored thereon a computer program to be run by the processor, which, when run by the processor, causes the processor to perform the obstacle detection method as claimed in claim 9.
11. A mobile robot comprising an image acquisition device and an obstacle detection device, wherein:
the image acquisition device is used for acquiring images aiming at a region to be moved of the mobile robot, wherein the images comprise depth images and at least one other mode image;
the obstacle detection device is used for performing obstacle detection based on the image acquired by the image acquisition device and used for avoiding obstacles in the moving process of the mobile robot, wherein the obstacle detection device comprises the obstacle detection device of claim 10.
12. A storage medium having stored thereon a computer program to be run by a processor, which computer program, when run by the processor, causes the processor to perform the method of noise reduction of a depth image according to any one of claims 1-7 or to perform the method of obstacle detection according to claim 9.
CN202210501030.5A 2022-05-09 2022-05-09 Noise reduction and obstacle detection method and device for depth image and mobile robot Pending CN117078519A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210501030.5A CN117078519A (en) 2022-05-09 2022-05-09 Noise reduction and obstacle detection method and device for depth image and mobile robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210501030.5A CN117078519A (en) 2022-05-09 2022-05-09 Noise reduction and obstacle detection method and device for depth image and mobile robot

Publications (1)

Publication Number Publication Date
CN117078519A true CN117078519A (en) 2023-11-17

Family

ID=88718103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210501030.5A Pending CN117078519A (en) 2022-05-09 2022-05-09 Noise reduction and obstacle detection method and device for depth image and mobile robot

Country Status (1)

Country Link
CN (1) CN117078519A (en)


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination