WO2021092805A1 - Multi-modal data fusion method and apparatus, and intelligent robot - Google Patents

Multi-modal data fusion method and apparatus, and intelligent robot Download PDF

Info

Publication number
WO2021092805A1
Authority
WO
WIPO (PCT)
Prior art keywords
data set
depth
distance
resolution
depth data
Application number
PCT/CN2019/118102
Other languages
French (fr)
Chinese (zh)
Inventor
朱森强
杨光雨
Original Assignee
中新智擎科技有限公司
Application filed by 中新智擎科技有限公司
Priority to PCT/CN2019/118102
Publication of WO2021092805A1

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 17/00 - Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S 17/88 - Lidar systems specially adapted for specific applications
    • G01S 17/93 - Lidar systems specially adapted for anti-collision purposes
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D 1/00 - Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D 1/02 - Control of position or course in two dimensions

Definitions

  • The embodiments of the present invention relate to the field of electronic information technology, and in particular to a multi-modal data fusion method and device, and an intelligent robot.
  • In the course of implementing the embodiments of the present invention, the inventors found at least the following problem in the related art: intelligent robots currently use a single-beam lidar for navigation and obstacle avoidance, which can only scan for obstacles in the plane of the lidar; obstacles below or above that plane cannot be detected, leaving a considerable visual blind zone.
  • In view of this defect, the purpose of the embodiments of the present invention is to provide a multi-modal data fusion method and device, and an intelligent robot.
  • An embodiment of the present invention provides a multi-modal data fusion method, applied to an intelligent robot provided with a depth camera and a lidar, the method including: collecting a 3D depth image of the external environment through the depth camera; collecting a laser data set of the external environment through the lidar; reading the 3D depth data set in the 3D depth image and converting it into a 2D depth data set; and performing fusion processing on the 2D depth data set and the laser data set.
  • The laser data set includes the one-dimensional coordinates of a linear arrangement of data points and the first distance of each data point; the 3D depth data set includes the X-axis and Y-axis coordinates of pixels distributed along the X axis and the Y axis, and the second distance of each pixel.
  • The step of reading the 3D depth data set in the 3D depth image and converting the 3D depth data set into a 2D depth data set specifically includes: selecting the smallest second distance among the second distances of the pixels sharing the same X-axis coordinate; and composing the 2D depth data set from the X-axis coordinates and the minimum second distance corresponding to each X-axis coordinate.
  • In some embodiments, this step further includes: acquiring the first resolution of the laser data set and the second resolution of the 2D depth data set; determining whether the second resolution is greater than the first resolution; if so, calculating the multiple of the second resolution to the first resolution; grouping the pixels of the 2D depth data set by that multiple; and, in each group, selecting the pixel corresponding to the minimum of the minimum second distances and deleting the other pixels, to obtain a corrected 2D depth data set.
  • The step of performing fusion processing on the 2D depth data set and the laser data set specifically includes: acquiring the overlapping area between the acquisition area of the lidar and the acquisition area of the depth camera; and replacing the data segment of the laser data set lying in the overlapping area with the corrected 2D depth data set.
  • The method further includes: if the second resolution is equal to the first resolution, replacing the data segment of the laser data set lying in the overlapping area with the 2D depth data set.
  • The method further includes: if the second resolution is less than the first resolution, acquiring, for each data point of the laser data set in the overlapping area, the corresponding pixel of the 3D depth image; determining whether the first distance of each data point is greater than the minimum second distance of the corresponding pixel; and, if so, replacing that first distance with the minimum second distance.
  • An embodiment of the present invention further provides a multi-modal data fusion device, applied to an intelligent robot provided with a depth camera and a lidar, the device including:
  • a first acquisition module, configured to collect a 3D depth image of the external environment through the depth camera;
  • a second acquisition module, configured to collect a laser data set of the external environment through the lidar;
  • a conversion module, configured to read the 3D depth data set in the 3D depth image and convert the 3D depth data set into a 2D depth data set;
  • a fusion module, configured to perform fusion processing on the 2D depth data set and the laser data set.
  • The laser data set includes the one-dimensional coordinates of a linear arrangement of data points and the first distance of each data point; the 3D depth data set includes the X-axis and Y-axis coordinates of pixels distributed along the X axis and the Y axis, and the second distance of each pixel.
  • The conversion module is further configured to select the smallest second distance among the second distances of the pixels sharing the same X-axis coordinate, and to compose the 2D depth data set from the X-axis coordinates and the minimum second distance corresponding to each X-axis coordinate.
  • The conversion module is further configured to acquire the first resolution of the laser data set and the second resolution of the 2D depth data set; to determine whether the second resolution is greater than the first resolution; if so, to calculate the multiple of the second resolution to the first resolution; to group the pixels of the 2D depth data set by that multiple; and, in each group, to select the pixel corresponding to the minimum of the minimum second distances and delete the other pixels, obtaining a corrected 2D depth data set.
  • An embodiment of the present invention further provides an intelligent robot, including:
  • at least one processor; and
  • a memory communicatively connected to the at least one processor, wherein
  • the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor so that the at least one processor can execute the method described in the first aspect above.
  • Embodiments of the present invention further provide a computer-readable storage medium storing computer-executable instructions for causing a computer to execute the method described in the first aspect above.
  • Embodiments of the present invention further provide a computer program product, including a computer program stored on a computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, cause the computer to execute the method described in the first aspect above.
  • The embodiment of the present invention thus provides a multi-modal data fusion method applied to an intelligent robot provided with a depth camera and a lidar. The method first collects a 3D depth image and a laser data set of the external environment through the depth camera and the lidar respectively, then reads the 3D depth data set in the 3D depth image and converts it into a 2D depth data set, and finally performs fusion processing on the 2D depth data set and the laser data set. An intelligent robot using this method can detect obstacles that are in front of the robot but in the visual blind zone of the lidar, improving the safety performance of the intelligent robot.
  • FIG. 1 is a schematic diagram of the structure of an intelligent robot to which the multi-modal data fusion method of an embodiment of the present invention is applied, and of the robot's collection areas;
  • FIG. 2 is a flowchart of a multi-modal data fusion method provided by an embodiment of the present invention;
  • FIG. 3 is an example diagram of the data in front of an intelligent robot collected by the depth camera and the lidar provided by an embodiment of the present invention;
  • FIG. 4 is a sub-flowchart of step 130 of the method shown in FIG. 2;
  • FIG. 5 is a sub-flowchart of steps 130 and 140 of the method shown in FIG. 2;
  • FIG. 6 is a schematic structural diagram of a multi-modal data fusion device provided by an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of the hardware structure of an intelligent robot that executes the above-mentioned multi-modal data fusion method provided by an embodiment of the present invention.
  • FIG. 1 shows the structure of the intelligent robot to which the multi-modal data fusion method of the embodiment of the present invention is applied, together with its collection areas. The intelligent robot 10 on the left side of FIG. 1 is capable of executing the multi-modal data fusion method described in the embodiments of the present invention, and is provided with a depth camera 11 and a lidar 12.
  • The depth camera 11 is a binocular camera that can collect a depth image of the area in front of the intelligent robot 10; the depth image contains the distance from the intelligent robot 10 to the obstacles in front of it.
  • It may be, for example, a common depth camera such as a Kinect v1, Fotonic, or ZED camera.
  • The lidar 12 is a device that obtains characteristic quantities such as the position and speed of an obstacle by emitting a laser beam at it; it can usually only detect obstacle distances within a plane at a certain height.
  • Normally, the field angle β of the lidar 12 is much larger than the field angle α of the depth camera 11, and β can typically reach or even exceed 180 degrees.
  • Because the depth camera 11 captures the image in front of the robot with a wide-angle lens, the image is usually distorted at both edges; preferably, therefore, the depth camera 11 of the embodiment of the present invention is a binocular camera whose field angle α is less than ninety degrees.
  • The embodiment of the present invention provides a multi-modal data fusion method that can be executed by the above intelligent robot 10. Please refer to FIG. 2, which shows the flowchart of a multi-modal data fusion method provided by the embodiment of the present invention.
  • The method includes, but is not limited to, the following steps:
  • Step 110: Collect a 3D depth image of the external environment through the depth camera.
  • In the embodiment of the present invention, as shown in FIG. 1, the depth camera 11 collects a 3D depth image S2 of the area in front of the intelligent robot 10 within the field angle α.
  • Each pixel of the 3D depth image S2 contains the straight-line distance from the robot to the obstacle it observes.
  • Step 120: Collect a laser data set of the external environment through the lidar.
  • In the embodiment of the present invention, as shown in FIG. 1, the lidar 12 collects a laser data set S1 of the area in front of the intelligent robot 10 within the field angle β; at each data collection point it records the straight-line distance to the obstacle ahead.
  • The laser data set S1 can be understood as a line carrying distance information, and the line contains multiple data collection points.
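  • To make the two data structures concrete, the following sketch (not part of the patent; the array shapes, variable names, and all numeric values are illustrative assumptions mirroring FIG. 3) represents the laser data set S1 and the 3D depth data set S2' as NumPy arrays:

```python
import numpy as np

# Laser data set S1: a line of 9 data points, each carrying a
# one-dimensional coordinate and a first distance (a1..a9).
laser_x = np.arange(9)                      # one-dimensional coordinates
laser_first_distance = np.array(
    [2.0, 2.1, 2.3, 1.8, 1.7, 1.9, 2.4, 2.2, 2.0])  # metres, made up

# 3D depth data set S2': a 6x6 grid of pixels indexed [Y, X], each
# carrying a second distance (b11..b66).
depth_second_distance = np.random.uniform(1.5, 3.0, size=(6, 6))
```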
  • Step 130: Read the 3D depth data set in the 3D depth image, and convert the 3D depth data set into a 2D depth data set.
  • Because the lidar collects data in a plane at a certain height while its field angle is larger than that of the depth camera, the depth camera's viewing range in the height direction exceeds the lidar's detection range in that direction, whereas the lidar's detection range in the horizontal direction exceeds the depth camera's horizontal viewing range. Moreover, the data obtained by the depth camera and the lidar share an overlapping area S.
  • In the embodiment of the present invention, in order to express the 3D depth data set along the detection direction of the laser data set S1, so that it can be fused with the laser data set to obtain more accurate obstacle detection information, the 3D depth data set must also be converted into a 2D depth data set along the detection direction of the laser data set S1.
  • Step 140: Perform fusion processing on the 2D depth data set and the laser data set.
  • In the embodiment of the present invention, in order to obtain the nearest distance between the obstacles in front of the intelligent robot and the robot itself, the 3D depth image and the laser data set are further fused.
  • Specifically, within the overlapping area S, the 2D depth data set converted from the 3D depth data set is fused with the laser data set.
  • The embodiment of the present invention thus provides a multi-modal data fusion method applied to a robot provided with a depth camera and a lidar.
  • The method first collects a 3D depth image and a laser data set of the external environment through the depth camera and the lidar respectively, then reads the 3D depth data set in the 3D depth image and converts it into a 2D depth data set, and finally fuses the 2D depth data set with the laser data set.
  • An intelligent robot using the method provided by the embodiment of the present invention can detect obstacles in front of the intelligent robot that lie in the blind spot of the lidar, thereby improving the safety performance of the intelligent robot.
  • FIG. 3 shows an example of the data sets in front of the robot collected by the depth camera and the lidar provided by an embodiment of the present invention.
  • The data sets are the laser data set S1 shown in FIG. 1 above and the 3D depth data set S2' contained in the 3D depth image S2.
  • The 3D depth data set S2' includes the X-axis and Y-axis coordinates of pixels distributed along the X axis and the Y axis, and the second distance of each pixel.
  • The laser data set S1 includes 9 data points distributed along the X axis together with the first distance of each data point (a1, a2, ..., a9), and each data point carries its one-dimensional coordinate.
  • The 3D depth data set S2' includes 36 pixels distributed along the X and Y axes together with the second distance of each pixel (b11, b12, ..., b16, b21, ..., b66), and each pixel carries its two-dimensional coordinates.
  • It can be understood that the system superimposes the laser data set S1 and the 3D depth data set S2' through the one-dimensional coordinate of each data point and the two-dimensional coordinates of each pixel, obtaining the superimposed picture of S1 and S2' shown in FIG. 3; specifically, the superposition is calibrated from the one-dimensional coordinates of the data points and the X-axis coordinates of the pixels.
  • It should be noted that the number of data points in the laser data set S1 and the number of pixels in the 3D depth data set S2' shown in FIG. 3 are not limited to those of the above embodiment; they are determined by the sampling frequency of the lidar and the resolution of the depth camera actually used.
  • Please also refer to FIG. 4, which shows a sub-flowchart of step 130 of the method in FIG. 2; step 130 includes:
  • Step 131: Select the smallest second distance among the second distances of the pixels sharing the same X-axis coordinate.
  • Step 132: Compose the 2D depth data set from the X-axis coordinates and the minimum second distance corresponding to each X-axis coordinate.
  • In the embodiment of the present invention, before the 2D depth data set and the laser data set are fused, the 3D depth data set must first be converted into a 2D depth data set.
  • Referring again to FIG. 3, the conversion proceeds as follows: for each column of the 3D depth data set S2' (all pixels sharing one X-axis coordinate), take the minimum of their second distances and assign it to the pixel at that X-axis coordinate in the overlapping area S; once this has been done for every column, the 2D depth data set is obtained.
  • For example, in FIG. 3, the minimum of the second distances of the first column of pixels, i.e. the minimum of the six values b11 to b61, is obtained and assigned to the pixel where b31 is located, and so on for the other columns.
  • The resulting 2D depth data set contains only the single row of pixels on the X axis where b31 lies, and each pixel in that row is assigned the minimum second distance of all pixels in its column, so that every pixel of the final 2D depth data set records the shortest distance to an obstacle anywhere along its column.
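  • A minimal sketch of this conversion (an illustration, not the patent's implementation; it assumes the 3D depth data set is a NumPy array indexed [Y, X] as above):

```python
import numpy as np

def depth_3d_to_2d(depth_second_distance: np.ndarray) -> np.ndarray:
    """Collapse a 3D depth data set to a 2D depth data set.

    For every X-axis coordinate (column), keep only the smallest second
    distance among the pixels sharing that coordinate, so each entry of
    the result is the shortest obstacle distance anywhere in that column.
    """
    return depth_second_distance.min(axis=0)

# Example: a 6x6 grid of second distances (b11..b66) collapses to a
# single row of 6 values, one per X-axis coordinate.
grid = np.random.uniform(1.5, 3.0, size=(6, 6))
depth_2d = depth_3d_to_2d(grid)  # shape (6,)
```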
  • FIG. 5 shows a sub-flowchart of steps 130 and 140 of the method in FIG. 2. Based on the methods shown in FIG. 2 and FIG. 3, the method further includes:
  • Step 133: Obtain the first resolution of the laser data set and the second resolution of the 2D depth data set.
  • Step 134: Determine whether the second resolution is greater than the first resolution. If yes, go to step 135; if not, go to step 143.
  • Step 135: Calculate the multiple of the second resolution to the first resolution.
  • Step 136: Group the pixels in the 2D depth data set according to the multiple.
  • Step 137: In each group, select the pixel corresponding to the minimum of the minimum second distances and delete the other pixels, to obtain a corrected 2D depth data set.
  • In the embodiment of the present invention, the first resolution may refer to the number of data points of the laser data set within the overlapping area, and the second resolution may refer to the number of pixels of the 2D depth data set within the overlapping area.
  • The multiple is obtained by comparing the number of data points and the number of pixels in the overlapping area; the pixels of the 2D depth data set are grouped according to this multiple, the minimum of the minimum second distances is taken within each group, and the set of these minima forms the corrected 2D depth data set.
  • For example, in FIG. 3, the 2D depth data set S2' has 6 pixels in the overlapping area S (from the pixel where b31 is located to the pixel where b36 is located), so the second resolution is 6; the laser data set S1 has 3 data points in the overlapping area S (data point a4 to data point a6), so the first resolution is 3. The multiple of the second resolution to the first resolution is therefore 2.
  • The pixels of the 2D depth data set are accordingly divided into groups of two; in each group, the pixel with the smallest of the minimum second distances is kept and the other pixels are deleted, yielding the corrected 2D depth data set.
  • In the example shown in FIG. 3, the corrected 2D depth data set contains three data points whose values are c4, c5, and c6.
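  • A minimal sketch of this correction for the integer-multiple case (illustrative only; the names and example values are assumptions):

```python
import numpy as np

def correct_2d_depth(depth_2d: np.ndarray, first_resolution: int) -> np.ndarray:
    """Downsample the 2D depth data set to the lidar's resolution.

    Assumes the second resolution (len(depth_2d)) is an integer multiple
    of the first resolution; adjacent pixels are grouped by that multiple
    and only the minimum of each group is kept.
    """
    multiple = len(depth_2d) // first_resolution
    return depth_2d.reshape(first_resolution, multiple).min(axis=1)

# FIG. 3 example: 6 pixels against 3 lidar points, multiple = 2; the
# three group minima play the role of c4, c5, c6.
depth_2d = np.array([2.0, 1.8, 2.5, 2.4, 1.6, 1.9])
corrected = correct_2d_depth(depth_2d, first_resolution=3)  # [1.8, 2.4, 1.6]
```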
  • The method further includes:
  • Step 141: Obtain the overlapping area between the acquisition area of the lidar and the acquisition area of the depth camera.
  • Step 142: With reference to the overlapping area, replace the data segment of the laser data set lying in the overlapping area with the corrected 2D depth data set.
  • It can be understood that the step of correcting the 2D depth data set aims to give the corrected 2D depth data set and the laser data set the same number of pixels/data points within the overlapping area, so that the corrected 2D depth data set can directly replace the data segment of the laser data set in the overlapping area to yield the fused data; otherwise, the resolution of the corrected 2D depth data set would not match the resolution of that data segment.
  • In other embodiments, the multiple of the second resolution to the first resolution may not be an integer.
  • In that case, the data points of the laser data set in the overlapping area are taken as the reference: the multiple is rounded up, and for each laser data point the minimum is taken over that many pixels of the 2D depth data set around the corresponding position, producing the corrected 2D depth data set.
  • For example, if the multiple is 2.4, it is rounded up to 3; the minimum over the 1st, 2nd, and 3rd pixels of the 2D depth data set becomes the first data point of the corrected 2D depth data set, the minimum over the 3rd, 4th, and 5th pixels becomes the second data point, and so on, so that the number of data points in the corrected 2D depth data set matches the number of data points of the laser data set in the overlapping area.
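  • One plausible reading of this rounding scheme, as a sketch (the exact window indexing is an assumption inferred from the 2.4-to-3 example above):

```python
import math
import numpy as np

def correct_2d_depth_noninteger(depth_2d: np.ndarray,
                                first_resolution: int) -> np.ndarray:
    """Variant of the correction for a non-integer resolution multiple.

    The window size is the multiple rounded up; the window for the i-th
    lidar data point starts near i * multiple, so neighbouring windows
    may overlap (multiple 2.4 -> windows over pixels 1-3, 3-5, ...).
    """
    multiple = len(depth_2d) / first_resolution
    window = math.ceil(multiple)
    corrected = np.empty(first_resolution)
    for i in range(first_resolution):
        start = round(i * multiple)
        corrected[i] = depth_2d[start:start + window].min()
    return corrected
```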
  • The method further includes:
  • Step 143: Determine whether the second resolution is equal to the first resolution. If yes, go to step 144; if not, go to step 145.
  • Step 144: Replace the data segment of the laser data set lying in the overlapping area with the 2D depth data set.
  • It can be understood that when the second resolution equals the first resolution, that is, when the number of data points of the laser data set in the overlapping area equals the number of pixels of the 2D depth data set there, the data segment of the laser data set in the overlapping area can be replaced directly with the 2D depth data set to obtain the fused data.
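  • The replacement itself amounts to splicing one array segment into another; a hedged sketch (the overlap index comes from calibration and is simply assumed known here):

```python
import numpy as np

def fuse_replace(laser_first_distance: np.ndarray,
                 overlap_start: int,
                 depth_segment: np.ndarray) -> np.ndarray:
    """Replace the laser data segment in the overlapping area with the
    (corrected) 2D depth data set of the same length."""
    fused = laser_first_distance.copy()
    fused[overlap_start:overlap_start + len(depth_segment)] = depth_segment
    return fused

# FIG. 3 example: laser points a4..a6 (indices 3..5) fall in the
# overlapping area S and are replaced by the corrected values c4..c6.
fused = fuse_replace(
    np.array([2.0, 2.1, 2.3, 1.8, 1.7, 1.9, 2.4, 2.2, 2.0]),
    overlap_start=3,
    depth_segment=np.array([1.8, 2.4, 1.6]))
```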
  • The method further includes:
  • Step 145: For each data point of the laser data set in the overlapping area, obtain the corresponding pixel of the 3D depth image.
  • Step 146: Determine whether the first distance of each data point is greater than the minimum second distance of the corresponding pixel; if yes, go to step 147; if not, retain the first distance of that data point.
  • Step 147: Replace the first distance of each data point whose first distance is greater than the minimum second distance with the minimum second distance of the corresponding pixel.
  • This branch covers the case in which the number of data points of the laser data set in the overlapping area is greater than the number of pixels of the 2D depth data set there.
  • Wherever a data point's first distance is not greater than the minimum second distance of its corresponding pixel, the first distance carried by that data point of the laser data set is retained, so that the distance value of each data point in the final fused data represents the shortest known distance from the robot to the obstacle ahead.
  • After the fusion processing, the minimum of the distance information of all data points in the fused data can be taken to obtain the shortest distance between the intelligent robot and the obstacles in front of it, together with the obstacle at that shortest distance.
  • The forward path of the robot can then be planned around this nearest obstacle, avoiding collisions between the robot and obstacles and ensuring the safety of both.
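  • A sketch of this per-point comparison and of extracting the shortest distance (the correspondence between laser points and depth pixels is assumed to come from the X-coordinate calibration described earlier):

```python
import numpy as np

def fuse_keep_nearest(laser_overlap: np.ndarray,
                      depth_for_each_point: np.ndarray) -> np.ndarray:
    """Case where the lidar is denser than the camera in the overlap:
    each laser first distance is compared with the minimum second distance
    of its corresponding pixel, and the smaller value is kept, so every
    fused point carries the shortest known obstacle distance."""
    return np.minimum(laser_overlap, depth_for_each_point)

fused_overlap = fuse_keep_nearest(np.array([1.8, 1.7, 1.9]),
                                  np.array([2.0, 1.5, 2.2]))
nearest_obstacle = fused_overlap.min()  # 1.5: the distance used for planning
```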
  • Correspondingly, an embodiment of the present invention further provides a multi-modal data fusion device.
  • Please refer to FIG. 6, which shows a schematic structural diagram of a multi-modal data fusion device provided by an embodiment of the present invention.
  • The device 200 is applied to an intelligent robot, and the intelligent robot is provided with a depth camera and a lidar.
  • The device 200 includes: a first acquisition module 210, a second acquisition module 220, a conversion module 230, and a fusion module 240, wherein:
  • the first acquisition module 210 is configured to collect a 3D depth image of the external environment through the depth camera;
  • the second acquisition module 220 is configured to collect a laser data set of the external environment through the lidar;
  • the conversion module 230 is configured to read a 3D depth data set in the 3D depth image, and convert the 3D depth data set into a 2D depth data set;
  • the fusion module 240 is configured to perform fusion processing on the 2D depth data set and the laser data set.
  • The laser data set includes the one-dimensional coordinates of a linear arrangement of data points and the first distance of each data point; the 3D depth data set includes the X-axis and Y-axis coordinates of pixels distributed along the X axis and the Y axis, and the second distance of each pixel.
  • The conversion module 230 is further configured to select the smallest second distance among the second distances of the pixels sharing the same X-axis coordinate, and to compose the 2D depth data set from the X-axis coordinates and the minimum second distance corresponding to each X-axis coordinate.
  • The conversion module 230 is further configured to obtain the first resolution of the laser data set and the second resolution of the 2D depth data set; to determine whether the second resolution is greater than the first resolution; if so, to calculate the multiple of the second resolution to the first resolution; to group the pixels of the 2D depth data set by that multiple; and, in each group, to select the pixel corresponding to the minimum of the minimum second distances and delete the other pixels, obtaining a corrected 2D depth data set.
  • The fusion module 240 is further configured to obtain the overlapping area between the acquisition area of the lidar and the acquisition area of the depth camera,
  • and to replace the data segment of the laser data set lying in the overlapping area with the corrected 2D depth data set.
  • The fusion module 240 is further configured to replace the data segment of the laser data set lying in the overlapping area with the 2D depth data set if the second resolution is equal to the first resolution.
  • The fusion module 240 is further configured to, if the second resolution is smaller than the first resolution, obtain for each data point of the laser data set in the overlapping area the corresponding pixel of the 3D depth image, determine whether the first distance of the data point is greater than the minimum second distance of that pixel, and, if so, replace the first distance with the minimum second distance.
  • An embodiment of the present invention further provides an intelligent robot. Please refer to FIG. 7, which shows the hardware structure of an intelligent robot capable of executing the multi-modal data fusion method described in FIGS. 2 to 5.
  • The intelligent robot 10 may be the intelligent robot 10 shown in FIG. 1.
  • The intelligent robot 10 includes: at least one processor 11; and a memory 12 communicatively connected with the at least one processor 11 (one processor 11 is taken as an example in FIG. 7).
  • The memory 12 stores instructions executable by the at least one processor 11; the instructions are executed by the at least one processor 11 so that the at least one processor 11 can execute the multi-modal data fusion method shown in FIGS. 2 to 5. The processor 11 and the memory 12 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 7.
  • The memory 12, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the multi-modal data fusion method in the embodiment of the present application (for example, the modules shown in FIG. 6).
  • The processor 11 executes various functional applications and data processing by running the non-volatile software programs, instructions, and modules stored in the memory 12, thereby realizing the multi-modal data fusion method of the foregoing method embodiment.
  • The memory 12 may include a program storage area and a data storage area.
  • The program storage area can store an operating system and an application program required by at least one function; the data storage area can store data created by the use of the multi-modal data fusion device, and the like.
  • the memory 12 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • the memory 12 may optionally include memories remotely provided with respect to the processor 11, and these remote memories may be connected to the multi-modal data fusion device via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • The one or more modules are stored in the memory 12 and, when executed by the one or more processors 11, perform the multi-modal data fusion method of any of the foregoing method embodiments, for example the method steps of FIGS. 2 to 5 described above, realizing the functions of the modules and units of FIG. 6.
  • An embodiment of the present application further provides a non-volatile computer-readable storage medium storing computer-executable instructions which, when executed by one or more processors, perform, for example, the method steps of FIGS. 2 to 5 described above, realizing the functions of the modules of FIG. 6.
  • Embodiments of the present application further provide a computer program product, including a computer program stored on a non-volatile computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, cause the computer to perform the multi-modal data fusion method of any of the foregoing method embodiments, for example the method steps of FIGS. 2 to 5 described above, realizing the functions of the modules of FIG. 6.
  • In summary, the embodiment of the present invention provides a multi-modal data fusion method applied to an intelligent robot provided with a depth camera and a lidar.
  • The method first collects a 3D depth image and a laser data set of the external environment through the depth camera and the lidar respectively, then reads the 3D depth data set in the 3D depth image and converts it into a 2D depth data set, and finally performs fusion processing on the 2D depth data set and the laser data set.
  • An intelligent robot using the method provided by the embodiment of the present invention can detect obstacles in front of the intelligent robot that lie in the blind spot of the lidar, thereby improving the safety performance of the intelligent robot.
  • The device embodiments described above are merely illustrative; units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units, which can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • Each implementation can be realized by means of software plus a general-purpose hardware platform, or, of course, by hardware.
  • A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by a computer program instructing the relevant hardware.
  • The program can be stored in a computer-readable storage medium and, when executed, may include the procedures of the above method embodiments.
  • the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.

Abstract

Disclosed is a multi-modal data fusion method, relating to the technical field of electronic information. The multi-modal data fusion method is applied to an intelligent robot, and the intelligent robot is provided with a depth camera and a laser radar. The method comprises: firstly, respectively collecting a 3D depth image and a laser data set of an external environment by means of a depth camera and a laser radar (110, 120); then, reading a 3D depth data set in the 3D depth image, and converting the 3D depth data set into a 2D depth data set (130); and finally, performing fusion processing on the 2D depth data set and the laser data set (140). An intelligent robot using the method can detect an obstacle that is in front of the intelligent robot and in a vision blind area of a laser radar, thereby improving the safety performance of the intelligent robot.

Description

Multi-modal data fusion method, device and intelligent robot
Technical Field
The embodiments of the present invention relate to the field of electronic information technology, and in particular to a multi-modal data fusion method and device, and an intelligent robot.
Background
With the development of artificial intelligence technology, robots are becoming more and more intelligent, and intelligent robots are being applied ever more widely across industries. Mobile intelligent robots in particular usually need a certain path-planning capability so that they can navigate and avoid obstacles.
In the course of implementing the embodiments of the present invention, the inventors found at least the following problem in the related art: intelligent robots currently use a single-beam lidar for navigation and obstacle avoidance, which can only scan for obstacles in the plane of the lidar; obstacles below or above that plane cannot be detected, leaving a considerable visual blind zone.
Summary of the Invention
In view of the above defects of the prior art, the purpose of the embodiments of the present invention is to provide a multi-modal data fusion method and device, and an intelligent robot.
The purpose of the embodiments of the present invention is achieved through the following technical solutions:
To solve the above technical problem, in a first aspect, an embodiment of the present invention provides a multi-modal data fusion method, applied to an intelligent robot provided with a depth camera and a lidar, the method including:
collecting a 3D depth image of the external environment through the depth camera;
collecting a laser data set of the external environment through the lidar;
reading the 3D depth data set in the 3D depth image, and converting the 3D depth data set into a 2D depth data set;
performing fusion processing on the 2D depth data set and the laser data set.
In some embodiments, the laser data set includes the one-dimensional coordinates of a linear arrangement of data points and the first distance of each data point, and the 3D depth data set includes the X-axis and Y-axis coordinates of pixels distributed along the X axis and the Y axis, and the second distance of each pixel;
the step of reading the 3D depth data set in the 3D depth image and converting the 3D depth data set into a 2D depth data set specifically includes:
selecting the smallest second distance among the second distances of the pixels sharing the same X-axis coordinate;
composing the 2D depth data set from the X-axis coordinates and the minimum second distance corresponding to each X-axis coordinate.
In some embodiments, the step of reading the 3D depth data set in the 3D depth image and converting the 3D depth data set into a 2D depth data set further includes:
acquiring the first resolution of the laser data set and the second resolution of the 2D depth data set;
determining whether the second resolution is greater than the first resolution;
if so, calculating the multiple of the second resolution to the first resolution;
grouping the pixels in the 2D depth data set according to the multiple;
in each group, selecting the pixel corresponding to the minimum of the minimum second distances and deleting the other pixels, to obtain a corrected 2D depth data set.
In some embodiments, the step of performing fusion processing on the 2D depth data set and the laser data set specifically includes:
acquiring the overlapping area between the acquisition area of the lidar and the acquisition area of the depth camera;
with reference to the overlapping area, replacing the data segment of the laser data set lying in the overlapping area with the corrected 2D depth data set.
In some embodiments, the method further includes:
if the second resolution is equal to the first resolution, replacing the data segment of the laser data set lying in the overlapping area with the 2D depth data set.
In some embodiments, the method further includes:
if the second resolution is less than the first resolution, acquiring, for each data point of the laser data set in the overlapping area, the corresponding pixel of the 3D depth image;
determining whether the first distance of each data point is greater than the minimum second distance of the corresponding pixel;
if so, replacing the first distance of that data point with the minimum second distance of the corresponding pixel.
To solve the above technical problem, in a second aspect, an embodiment of the present invention provides a multi-modal data fusion device, applied to an intelligent robot provided with a depth camera and a lidar, the device including:
a first acquisition module, configured to collect a 3D depth image of the external environment through the depth camera;
a second acquisition module, configured to collect a laser data set of the external environment through the lidar;
a conversion module, configured to read the 3D depth data set in the 3D depth image and convert the 3D depth data set into a 2D depth data set;
a fusion module, configured to perform fusion processing on the 2D depth data set and the laser data set.
In some embodiments, the laser data set includes the one-dimensional coordinates of a linear arrangement of data points and the first distance of each data point, and the 3D depth data set includes the X-axis and Y-axis coordinates of pixels distributed along the X axis and the Y axis, and the second distance of each pixel;
the conversion module is further configured to select the smallest second distance among the second distances of the pixels sharing the same X-axis coordinate;
and to compose the 2D depth data set from the X-axis coordinates and the minimum second distance corresponding to each X-axis coordinate.
In some embodiments, the conversion module is further configured to acquire the first resolution of the laser data set and the second resolution of the 2D depth data set;
to determine whether the second resolution is greater than the first resolution;
if so, to calculate the multiple of the second resolution to the first resolution;
to group the pixels in the 2D depth data set according to the multiple;
and, in each group, to select the pixel corresponding to the minimum of the minimum second distances and delete the other pixels, obtaining a corrected 2D depth data set.
To solve the above technical problem, in a third aspect, an embodiment of the present invention provides an intelligent robot, including:
at least one processor; and
a memory communicatively connected to the at least one processor, wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor so that the at least one processor can execute the method described in the first aspect above.
To solve the above technical problem, in a fourth aspect, embodiments of the present invention further provide a computer-readable storage medium storing computer-executable instructions for causing a computer to execute the method described in the first aspect above.
To solve the above technical problem, in a fifth aspect, embodiments of the present invention further provide a computer program product, including a computer program stored on a computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, cause the computer to execute the method described in the first aspect above.
Compared with the prior art, the beneficial effect of the present invention is as follows: unlike the prior art, the embodiment of the present invention provides a multi-modal data fusion method applied to an intelligent robot provided with a depth camera and a lidar. The method first collects a 3D depth image and a laser data set of the external environment through the depth camera and the lidar respectively, then reads the 3D depth data set in the 3D depth image and converts it into a 2D depth data set, and finally performs fusion processing on the 2D depth data set and the laser data set. An intelligent robot using the method provided in the embodiment of the present invention can detect obstacles that are in front of the intelligent robot but in the visual blind zone of the lidar, improving the safety performance of the intelligent robot.
Description of the Drawings
One or more embodiments are illustrated by the figures of the corresponding drawings; these illustrations do not constitute a limitation of the embodiments. Elements/modules and steps with the same reference numerals in the drawings denote similar elements/modules and steps, and unless otherwise stated the figures are not drawn to scale.
FIG. 1 is a schematic diagram of the structure of an intelligent robot to which the multi-modal data fusion method of an embodiment of the present invention is applied, and of the robot's collection areas;
FIG. 2 is a flowchart of a multi-modal data fusion method provided by an embodiment of the present invention;
FIG. 3 is an example diagram of the data in front of an intelligent robot collected by the depth camera and the lidar provided by an embodiment of the present invention;
FIG. 4 is a sub-flowchart of step 130 of the method shown in FIG. 2;
FIG. 5 is a sub-flowchart of steps 130 and 140 of the method shown in FIG. 2;
FIG. 6 is a schematic structural diagram of a multi-modal data fusion device provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of the hardware structure of an intelligent robot that executes the above multi-modal data fusion method, provided by an embodiment of the present invention.
Detailed Description
The present invention is described in detail below in conjunction with specific embodiments. The following examples will help those skilled in the art to further understand the present invention, but do not limit it in any form. It should be pointed out that a person of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present invention; these all belong to the protection scope of the present invention.
In order to make the purpose, technical solutions, and advantages of this application clearer, this application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, and are not used to limit it.
It should be noted that, where they do not conflict, the features of the embodiments of the present invention can be combined with each other, all within the protection scope of this application. In addition, although functional modules are divided in the device schematic and a logical order is shown in the flowchart, in some cases the steps shown or described may be executed with a different module division or in a different order from that shown. Moreover, the words "first", "second", and "third" used herein do not limit the data or the execution order, but only distinguish items that are essentially the same or similar in function and effect.
Unless otherwise defined, all technical and scientific terms used in this specification have the same meanings as commonly understood by those skilled in the technical field of the present invention. The terms used in the specification are only for the purpose of describing specific embodiments and are not intended to limit the present invention. The term "and/or" used in this specification includes any and all combinations of one or more of the associated listed items.
In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not conflict with each other.
Please refer to FIG. 1, which shows the structure of the intelligent robot to which the multi-modal data fusion method of the embodiment of the present invention is applied, together with its collection areas. The intelligent robot 10 on the left side of FIG. 1 is an intelligent robot capable of executing the multi-modal data fusion method of the embodiment of the present invention, and is provided with a depth camera 11 and a lidar 12.
The depth camera 11 is a binocular camera that can collect a depth image of the area in front of the intelligent robot 10; the depth image contains the distance from the intelligent robot 10 to the obstacles in front of it. It may be, for example, a common depth camera such as a Kinect v1, Fotonic, or ZED camera.
The lidar 12 is a device that obtains characteristic quantities such as the position and speed of an obstacle by emitting a laser beam at it; it can usually only detect obstacle distances within a plane at a certain height.
Normally, the field angle β of the lidar 12 is much larger than the field angle α of the depth camera 11, and β can typically reach or even exceed 180 degrees. Because the depth camera 11 captures the image in front of the robot with a wide-angle lens, the image is usually distorted at both edges; preferably, therefore, the depth camera 11 of the embodiment of the present invention is a binocular camera whose field angle α is less than ninety degrees.
Specifically, the embodiments of the present invention are further described below in conjunction with the accompanying drawings.
The embodiment of the present invention provides a multi-modal data fusion method that can be executed by the above intelligent robot 10. Please refer to FIG. 2, which shows the flowchart of a multi-modal data fusion method provided by the embodiment of the present invention; the method includes, but is not limited to, the following steps:
Step 110: Collect a 3D depth image of the external environment through the depth camera.
In the embodiment of the present invention, as shown in FIG. 1, the depth camera 11 collects a 3D depth image S2 of the area in front of the intelligent robot 10 within the field angle α; each pixel of the 3D depth image S2 contains the straight-line distance from the robot to the obstacle it observes.
Step 120: Collect a laser data set of the external environment through the lidar.
In the embodiment of the present invention, as shown in FIG. 1, the lidar 12 collects a laser data set S1 of the area in front of the intelligent robot 10 within the field angle β; at each data collection point it records the straight-line distance to the obstacle ahead. The laser data set S1 can be understood as a line carrying distance information, and the line contains multiple data collection points.
步骤130:读取所述3D深度图像中的3D深度数据集,将所述3D深度数据集转换为2D深度数据集。Step 130: Read the 3D depth data set in the 3D depth image, and convert the 3D depth data set into a 2D depth data set.
由于激光雷达采集的是某一高度平面上的数据信息,而激光雷达的视场角比深度摄像头的视场角大,因此,所述深度摄像头在高度方向的取景范围大于所述激光雷达在高度方向的探测范围,而所述激光雷达在水平方向上的探测范围也大于深度摄像头在水平方向上的取景范围。且有,深度摄像头和激光雷达所获得图像存在重合区域S。Since the lidar collects data information on a certain height plane, and the field of view of the lidar is larger than the field of view of the depth camera, the viewing range of the depth camera in the height direction is larger than that of the lidar in the height direction. The detection range of the direction, and the detection range of the lidar in the horizontal direction is also larger than the viewing range of the depth camera in the horizontal direction. Also, there is an overlap area S in the images obtained by the depth camera and the lidar.
在本发明实施例中,为了将所述3D深度数据集在所述激光数据集S1的探测方向上表示,以进一步与所述激光数据集进行融合处理得到更精确的障碍物探测信息,还需要将所述3D深度数据集转换为在所述激光数据集S1的探测方向上的2D深度数据集。In the embodiment of the present invention, in order to represent the 3D depth data set in the detection direction of the laser data set S1, to further perform fusion processing with the laser data set to obtain more accurate obstacle detection information, it is also necessary The 3D depth data set is converted into a 2D depth data set in the detection direction of the laser data set S1.
步骤140:对所述2D深度数据集和所述激光数据集进行融合处理。Step 140: Perform fusion processing on the 2D depth data set and the laser data set.
在本发明实施例中,为了获取到在智能机器人的前方的障碍物与该智能机器人的最近的距离的信息,进一步地,将所述3D深度图像和所述激光数据集进行融合处理,所述融合处理具体为,将所述重合区域S内,由所述3D深度数据集转换得到的2D深度数据集和所述激光数据集进行融合处理。In the embodiment of the present invention, in order to obtain the information of the closest distance between the obstacle in front of the intelligent robot and the intelligent robot, further, the 3D depth image and the laser data set are fused. The fusion processing specifically includes performing fusion processing on the 2D depth data set converted from the 3D depth data set and the laser data set in the overlapping area S.
本发明实施例中提供了一种多模态数据融合方法,应用于机器人,所述机器人设置有深度摄像头和激光雷达,该方法首先分别通过深度摄 像头和激光雷达采集外部环境的3D深度图像和激光数据集,然后,读取所述3D深度图像中的3D深度数据集,将所述3D深度数据集转换为2D深度数据集,最后,对所述2D深度数据集和所述激光数据集进行融合处理,采用本发明实施例提供的方法的智能机器人,能够检测到在智能机器人前方且在激光雷达的视觉盲区的障碍物,提高了智能机器人的安全性能。The embodiment of the present invention provides a multi-modal data fusion method, which is applied to a robot, and the robot is provided with a depth camera and a lidar. The method first collects 3D depth images and lasers of the external environment through the depth camera and the lidar, respectively. Data set, then read the 3D depth data set in the 3D depth image, convert the 3D depth data set into a 2D depth data set, and finally, fuse the 2D depth data set and the laser data set For processing, the intelligent robot using the method provided by the embodiment of the present invention can detect obstacles in front of the intelligent robot and in the blind spot of the lidar, thereby improving the safety performance of the intelligent robot.
在一些实施例中,请参见图3,其示出了本发明实施例提供的深度摄像头和激光雷达所采集的机器人前方的一种数据集的示例图,该数据集即为上述图1所示的激光数据集S1和由所述3D深度图像S2中的3D深度数据集S2’,所述激光数据集S1包括一线性排列的数据点的一维坐标以及各所述数据点的第一距离,所述3D深度数据集S2’包括沿X轴和Y轴分布的像素点的X轴坐标和Y轴坐标以及各所述像素点的第二距离。In some embodiments, please refer to FIG. 3, which shows an example diagram of a data set in front of the robot collected by a depth camera and a lidar provided by an embodiment of the present invention. The data set is shown in FIG. 1 above. The laser data set S1 and the 3D depth data set S2' in the 3D depth image S2, the laser data set S1 includes the one-dimensional coordinates of a linearly arranged data point and the first distance of each of the data points, The 3D depth data set S2' includes X-axis coordinates and Y-axis coordinates of pixel points distributed along the X-axis and Y-axis, and the second distance of each pixel point.
在图3所示实施例中,所述激光数据集S1包括沿X轴分布的9个数据点,以及各所述数据点的第一距离(a1、a2、……、a9),每一数据点皆携带有其一维坐标,所述3D深度数据集S2’包括沿X轴和Y轴分布的36个像素点以及各所述像素点的第二距离(b11、b12、……、b16、b21、……、b66),每一像素点皆携带有其二维坐标。可以理解的是,系统通过所述每一数据点的一维坐标和所述每一像素点的二维坐标,将所述激光数据集S1和所述3D深度数据集S2’进行叠加,从而得到如图3所示的所述激光数据集S1和所述3D深度数据集S2’的叠加图像,具体地,根据数据点的一维坐标和像素点在X轴的坐标进行校准以实现叠加。In the embodiment shown in FIG. 3, the laser data set S1 includes 9 data points distributed along the X axis, and the first distance (a1, a2,..., a9) of each of the data points, and each data point The points all carry their one-dimensional coordinates. The 3D depth data set S2' includes 36 pixels distributed along the X-axis and Y-axis and the second distance of each pixel (b11, b12, ..., b16, b21,..., b66), each pixel carries its two-dimensional coordinates. It is understandable that the system superimposes the laser data set S1 and the 3D depth data set S2' through the one-dimensional coordinates of each data point and the two-dimensional coordinates of each pixel point to obtain The superimposed images of the laser data set S1 and the 3D depth data set S2' shown in FIG. 3 are specifically calibrated according to the one-dimensional coordinates of the data points and the coordinates of the pixel points on the X axis to achieve superposition.
It should be noted that the number of data points in the laser data set S1 and the number of pixels in the 3D depth data set S2' shown in FIG. 3 are not limited to those described in the above embodiment; they are determined by the sampling frequency of the lidar and the resolution of the depth camera actually used.
Refer also to FIG. 4, which shows a sub-flowchart of step 130 of the method shown in FIG. 2. Step 130 includes:
Step 131: Select the minimum second distance among the second distances of the pixels sharing the same X-axis coordinate.
Step 132: Form the 2D depth data set from the X-axis coordinates and the minimum second distance corresponding to each X-axis coordinate.
In this embodiment of the present invention, before the 2D depth data set and the laser data set are fused, the 3D depth data set must first be converted into a 2D depth data set. Referring again to FIG. 3, the conversion proceeds as follows: for all pixels of the 3D depth data set S2' lying on the same vertical line, that is, sharing the same X-axis coordinate, take the minimum of their second distances and assign it to the pixel at the corresponding X-axis coordinate in the overlap area S. Once this has been done for every vertical line, the 2D depth data set is obtained.
For example, in FIG. 3, the minimum of the second distances of the first column of pixels, i.e., the minimum of the six values b11 to b61, is obtained and assigned to the pixel where b31 is located, and so on for the remaining columns. The resulting 2D depth data set contains only the single row of pixels on the X axis where b31 is located, and each pixel in that row is assigned the minimum second distance of all pixels in its column, so that every pixel of the final 2D depth data set carries the shortest distance to an obstacle among all the pixels on its vertical line.
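A minimal sketch of this conversion, assuming the 3D depth data set is held as a {(x, y): second_distance} mapping (an assumed representation, not one prescribed by the disclosure):

```python
from collections import defaultdict

def to_2d_depth(depth_set):
    """Steps 131-132: for every X-axis coordinate, keep only the minimum
    second distance over the column of pixels sharing that coordinate."""
    columns = defaultdict(list)
    for (x, _y), second_distance in depth_set.items():
        columns[x].append(second_distance)
    return {x: min(distances) for x, distances in columns.items()}

# First column of FIG. 3: min(b11, ..., b61) is assigned to the single
# remaining pixel of that column (invented values).
depth_2d = to_2d_depth({(1, y): d for y, d in
                        enumerate([1.9, 1.7, 1.5, 1.6, 1.8, 2.0], start=1)})
assert depth_2d == {1: 1.5}
```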
In some embodiments, refer to FIG. 5, which shows a sub-flowchart of steps 130 and 140 of the method shown in FIG. 2. Based on the methods shown in FIG. 2 and FIG. 3, the method further includes:
Step 133: Obtain the first resolution of the laser data set and the second resolution of the 2D depth data set.
Step 134: Determine whether the second resolution is greater than the first resolution. If so, go to step 135; if not, go to step 143.
Step 135: Calculate the multiple of the second resolution relative to the first resolution.
Step 136: Group the pixels of the 2D depth data set according to the multiple.
Step 137: In each group, select the pixel corresponding to the smallest of the minimum second distances and delete the other pixels, obtaining a corrected 2D depth data set.
In this embodiment of the present invention, it is further necessary to determine the relative magnitude and the multiple relationship between the first resolution of the laser data set collected by the lidar and the second resolution of the 2D depth data set derived from the 3D depth image collected by the depth camera, so as to obtain a corrected 2D depth data set for fusing the 2D depth data set with the laser data set. It can be understood that, since the fusion processing obtains the overlap area of the 2D depth data set and the laser data set and re-assigns each data point of the laser data set within that area, the first resolution may refer to the number of data points of the laser data set in the overlap area, and the second resolution may refer to the number of pixels of the 2D depth data set in the overlap area.
When the second resolution is greater than the first resolution, the multiple relationship is obtained by comparing the number of data points with the number of pixels in the overlap area. The pixels of the 2D depth data set are grouped according to this multiple, and the smallest of the minimum second distances within each group is taken; the set of these minima is the corrected 2D depth data set.
For example, referring again to FIG. 3, in the image shown there the number of pixels of the 2D depth data set S2' in the overlap area S is 6 (from the pixel of b31 to the pixel of b36), so the second resolution is 6, while the first resolution of the laser data set S1 in the overlap area S is 3 (from data point a4 to data point a6). The multiple of the second resolution relative to the first resolution is therefore calculated to be 2.
Further, the pixels are grouped according to this multiple, each group containing that many pixels of the 2D depth data set. In each group, the pixel with the smallest of the minimum second distances is kept and the other pixels are deleted, yielding the corrected 2D depth data set. That is, in the example of FIG. 3, every two pixels of the intermediate 2D depth data set form a group, and the minimum of each group is taken, giving a corrected 2D depth data set containing three data points with values c4, c5 and c6.
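The grouping and selection of steps 135 to 137 can be sketched as follows, under the same assumed representation (the function name and test values are illustrative):

```python
def correct_2d_depth(depth_2d, multiple):
    """Steps 135-137 for an integer multiple: walk the pixels in X order,
    form groups of `multiple` consecutive pixels and keep only the
    minimum of each group, leaving one value per laser data point."""
    xs = sorted(depth_2d)
    return [min(depth_2d[x] for x in xs[i:i + multiple])
            for i in range(0, len(xs), multiple)]

# FIG. 3: six pixels in the overlap area, multiple 2, giving the three
# values c4, c5, c6 of the corrected 2D depth data set (invented values).
corrected = correct_2d_depth({1: 1.5, 2: 1.4, 3: 1.6,
                              4: 1.3, 5: 1.8, 6: 1.7}, multiple=2)
assert corrected == [1.4, 1.3, 1.7]
```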
Further, continuing with FIG. 5, the method also includes:
Step 141: Obtain the overlap area between the acquisition area of the lidar and the acquisition area of the depth camera.
Step 142: Using the overlap area, replace the data segment of the laser data set lying in that overlap area with the corrected 2D depth data set.
Finally, the data segment of the laser data set lying in the overlap area is replaced with the corrected 2D depth data set, yielding the fused image S3 shown in FIG. 3. In this embodiment of the present invention, the purpose of correcting the 2D depth data set is to make the number of pixels/data points of the corrected 2D depth data set match that of the laser data set in the overlap area, so that the corrected 2D depth data set can replace the data segment of the laser data set in the overlap area to produce the fused image; the resolution of the corrected 2D depth data set is then consistent with that of the laser data segments outside the overlap area.
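Under these assumptions the replacement of step 142 is a plain splice, and the same operation serves the equal-resolution case of step 144 below; the index bounds here are illustrative and would in practice come from the calibration:

```python
def splice_fused(laser_distances, depth_values, overlap_start):
    """Step 142 (and step 144 when no correction is needed): overwrite
    the laser data segment lying in the overlap area with the 2D depth
    values; both sides now hold the same number of points there."""
    fused = list(laser_distances)
    fused[overlap_start:overlap_start + len(depth_values)] = depth_values
    return fused

# FIG. 3: a1..a9 with the a4..a6 segment replaced by c4..c6 yields the
# fused image S3 (invented values).
s3 = splice_fused([2.1, 2.0, 1.9, 1.8, 1.7, 1.8, 1.9, 2.0, 2.1],
                  [1.4, 1.3, 1.7], overlap_start=3)
assert s3 == [2.1, 2.0, 1.9, 1.4, 1.3, 1.7, 1.9, 2.0, 2.1]
```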
In some embodiments, the multiple of the second resolution relative to the first resolution may not be an integer. In that case the multiple is rounded down and increased by one, and for each data point of the laser data set the minimum is taken over that many pixels of the 2D depth data set located around the data point's position in the overlap area, producing the corrected 2D depth data set. For example, when the multiple is 2.4, rounding down and adding one gives 3: the minimum over the 1st, 2nd and 3rd pixels of the 2D depth data set becomes the first data point of the corrected set, the minimum over the 3rd, 4th and 5th pixels becomes the second data point, and so on, so that the number of data points in the corrected 2D depth data set matches the number of data points of the laser data set in the overlap area.
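A sketch of this non-integer case; the start-index formula below is one plausible reading of "around the data point's position" and follows the 2.4 example just given:

```python
import math

def correct_2d_depth_noninteger(values, multiple, n_laser_points):
    """Window width is the multiple rounded down plus one; the k-th
    corrected value is the minimum over a window anchored near
    k * multiple, so exactly n_laser_points values are produced."""
    width = math.floor(multiple) + 1
    return [min(values[math.floor(k * multiple):
                       math.floor(k * multiple) + width])
            for k in range(n_laser_points)]

# Multiple 2.4: width 3; windows cover pixels 1-3, 3-5, 5-7 (invented
# values), matching the example in the text.
corrected = correct_2d_depth_noninteger(
    [1.5, 1.4, 1.6, 1.3, 1.8, 1.7, 1.2], multiple=2.4, n_laser_points=3)
assert corrected == [1.4, 1.3, 1.2]
```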
In some embodiments, continuing with FIG. 5, the method further includes:
Step 143: Determine whether the second resolution is equal to the first resolution. If so, go to step 144; if not, go to step 145.
Step 144: Replace the data segment of the laser data set lying in the overlap area with the 2D depth data set.
In this embodiment of the present invention, the second resolution may also equal the first resolution, that is, the number of data points of the laser data set in the overlap area may equal the number of pixels of the 2D depth data set. In that case the data segment of the laser data set lying in the overlap area can be replaced directly with the 2D depth data set to obtain the fused image.
In some embodiments, continuing with FIG. 5, the method further includes:
Step 145: Obtain, for each data point of the laser data set in the overlap area, the corresponding pixel of the 3D depth image.
Step 146: Determine whether the first distance of each data point is greater than the minimum second distance of the corresponding pixel. If so, go to step 147; if not, retain the first distance carried by the data point.
Step 147: Replace the first distance of each data point whose first distance is greater than the minimum second distance with the minimum second distance of the corresponding pixel.
In this embodiment of the present invention, the number of data points of the laser data set in the overlap area may also exceed the number of pixels of the 2D depth data set in that area. In this case it is necessary to determine, for each data point, whether its first distance is greater than the minimum second distance of the pixel at the data point's position. For any data point where this is the case, its first distance is replaced by that pixel's minimum second distance; otherwise the first distance carried by the laser data point is retained, so that the distance value of every data point in the final fused image represents the shortest distance from the robot to the obstacle ahead.
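For this denser-laser case the comparison of steps 146 and 147 reduces to taking the smaller of the two distances at each data point, and the minimum over the fused image then gives the shortest obstacle distance used below; the pixel lookup function here is an assumed calibration-derived mapping:

```python
def fuse_dense_laser(laser_overlap, pixel_min_second_distance):
    """Steps 145-147: keep each data point's first distance unless the
    depth pixel covering that position reports something closer."""
    return [min(first_distance, pixel_min_second_distance(x))
            for x, first_distance in laser_overlap]

def shortest_obstacle_distance(fused_distances):
    """Minimum over the fused image: the robot's closest obstacle."""
    return min(fused_distances)

# Two laser data points share one depth pixel whose minimum second
# distance is 1.3; the first point keeps its own, closer, reading.
fused = fuse_dense_laser([(4, 1.1), (5, 1.8)], lambda _x: 1.3)
assert fused == [1.1, 1.3]
assert shortest_obstacle_distance(fused) == 1.1
```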
Further, once the distance from the intelligent robot to the obstacle ahead has been obtained for every data point of the fused image, the minimum of the distance values of all data points on the fused image yields the shortest distance from the robot to the obstacle ahead, as well as which part of the robot the obstacle is closest to. A forward path can then be planned for the robot that avoids collision with the obstacle, keeping both the robot and the obstacle safe.
An embodiment of the present invention further provides a multi-modal data fusion apparatus. Refer to FIG. 6, which shows a schematic structural diagram of a multi-modal data fusion apparatus provided by an embodiment of the present invention. The multi-modal data fusion apparatus 200 is applied to an intelligent robot provided with a depth camera and a lidar, and includes a first acquisition module 210, a second acquisition module 220, a conversion module 230 and a fusion module 240, wherein:
the first acquisition module 210 is configured to collect a 3D depth image of the external environment through the depth camera;
the second acquisition module 220 is configured to collect a laser data set of the external environment through the lidar;
the conversion module 230 is configured to read the 3D depth data set in the 3D depth image and convert the 3D depth data set into a 2D depth data set;
the fusion module 240 is configured to fuse the 2D depth data set with the laser data set.
In some embodiments, the laser data set includes the one-dimensional coordinates of a line of linearly arranged data points and the first distance of each data point, and the 3D depth data set includes the X-axis and Y-axis coordinates of pixels distributed along the X axis and the Y axis and the second distance of each pixel;
the conversion module 230 is further configured to select the minimum second distance among the second distances of the pixels sharing the same X-axis coordinate,
and to form the 2D depth data set from the X-axis coordinates and the minimum second distance corresponding to each X-axis coordinate.
In some embodiments, the conversion module 230 is further configured to obtain the first resolution of the laser data set and the second resolution of the 2D depth data set;
determine whether the second resolution is greater than the first resolution;
if so, calculate the multiple of the second resolution relative to the first resolution;
group the pixels of the 2D depth data set according to the multiple;
and, in each group, select the pixel corresponding to the smallest of the minimum second distances and delete the other pixels, obtaining a corrected 2D depth data set.
In some embodiments, the fusion module 240 is further configured to obtain the overlap area between the acquisition area of the lidar and the acquisition area of the depth camera,
and, using the overlap area, replace the data segment of the laser data set lying in that overlap area with the corrected 2D depth data set.
In some embodiments, the fusion module 240 is further configured to replace the data segment of the laser data set lying in the overlap area with the 2D depth data set if the second resolution equals the first resolution.
In some embodiments, the fusion module 240 is further configured to, if the second resolution is less than the first resolution, obtain, for each data point of the laser data set in the overlap area, the corresponding pixel of the 3D depth image;
determine whether the first distance of each data point is greater than the minimum second distance of the corresponding pixel;
and, if so, replace the first distance of each such data point with the minimum second distance of the corresponding pixel.
An embodiment of the present invention further provides an intelligent robot. Refer to FIG. 7, which shows the hardware structure of an intelligent robot capable of executing the multi-modal data fusion method described with reference to FIG. 2 to FIG. 5. The intelligent robot 10 may be the intelligent robot 10 shown in FIG. 1.
The intelligent robot 10 includes at least one processor 11 and a memory 12 communicatively connected to the at least one processor 11; one processor 11 is taken as an example in FIG. 7. The memory 12 stores instructions executable by the at least one processor 11, and the instructions are executed by the at least one processor 11 to enable the at least one processor 11 to perform the multi-modal data fusion method described above with reference to FIG. 2 to FIG. 5. The processor 11 and the memory 12 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 7.
As a non-volatile computer-readable storage medium, the memory 12 can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the multi-modal data fusion method in the embodiments of the present application, for example the modules shown in FIG. 6. By running the non-volatile software programs, instructions and modules stored in the memory 12, the processor 11 executes the various functional applications and data processing of the server, that is, implements the multi-modal data fusion method of the above method embodiments.
The memory 12 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through the use of the multi-modal data fusion apparatus, and the like. In addition, the memory 12 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some embodiments, the memory 12 may optionally include memories remotely located relative to the processor 11, and these remote memories may be connected to the multi-modal data fusion apparatus via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The one or more modules are stored in the memory 12 and, when executed by the one or more processors 11, perform the multi-modal data fusion method of any of the above method embodiments, for example executing the method steps of FIG. 2 to FIG. 5 described above and realizing the functions of the modules and units in FIG. 6.
The above product can execute the methods provided in the embodiments of the present application and has the functional modules and beneficial effects corresponding to the executed methods. For technical details not described in detail in this embodiment, refer to the methods provided in the embodiments of the present application.
An embodiment of the present application further provides a non-volatile computer-readable storage medium storing computer-executable instructions which, when executed by one or more processors, perform for example the method steps of FIG. 2 to FIG. 5 described above and realize the functions of the modules in FIG. 6.
An embodiment of the present application further provides a computer program product, including a computer program stored on a non-volatile computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to perform the multi-modal data fusion method of any of the above method embodiments, for example executing the method steps of FIG. 2 to FIG. 5 described above and realizing the functions of the modules in FIG. 6.
An embodiment of the present invention provides a multi-modal data fusion method applied to an intelligent robot provided with a depth camera and a lidar. The method first collects a 3D depth image and a laser data set of the external environment through the depth camera and the lidar respectively, then reads the 3D depth data set in the 3D depth image and converts it into a 2D depth data set, and finally fuses the 2D depth data set with the laser data set. An intelligent robot using the method provided by this embodiment can detect obstacles that are in front of the robot but in the visual blind spot of the lidar, improving the safety performance of the intelligent robot.
It should be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
From the description of the above implementations, those of ordinary skill in the art can clearly understand that each implementation can be realized by means of software plus a general-purpose hardware platform, or of course by hardware. Those of ordinary skill in the art can understand that all or part of the processes of the methods of the above embodiments can be completed by a computer program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM) or the like.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Under the concept of the present invention, the technical features of the above embodiments or of different embodiments may also be combined, the steps may be implemented in any order, and many other variations of the different aspects of the present invention as described above exist which, for brevity, are not provided in detail. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments, or make equivalent replacements of some of the technical features therein, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (11)

  1. A multi-modal data fusion method, applied to an intelligent robot provided with a depth camera and a lidar, the method comprising:
    collecting a 3D depth image of the external environment through the depth camera;
    collecting a laser data set of the external environment through the lidar;
    reading the 3D depth data set in the 3D depth image and converting the 3D depth data set into a 2D depth data set;
    performing fusion processing on the 2D depth data set and the laser data set.
  2. The method according to claim 1, wherein:
    the laser data set comprises the one-dimensional coordinates of a line of linearly arranged data points and the first distance of each data point, and the 3D depth data set comprises the X-axis and Y-axis coordinates of pixels distributed along the X axis and the Y axis and the second distance of each pixel;
    the step of reading the 3D depth data set in the 3D depth image and converting the 3D depth data set into a 2D depth data set specifically comprises:
    selecting the minimum second distance among the second distances of the pixels sharing the same X-axis coordinate;
    forming the 2D depth data set from the X-axis coordinates and the minimum second distance corresponding to each X-axis coordinate.
  3. The method according to claim 2, wherein the step of reading the 3D depth data set in the 3D depth image and converting the 3D depth data set into a 2D depth data set further comprises:
    obtaining the first resolution of the laser data set and the second resolution of the 2D depth data set;
    determining whether the second resolution is greater than the first resolution;
    if so, calculating the multiple of the second resolution relative to the first resolution;
    grouping the pixels of the 2D depth data set according to the multiple;
    in each group, selecting the pixel corresponding to the smallest of the minimum second distances and deleting the other pixels, to obtain a corrected 2D depth data set.
  4. The method according to claim 3, wherein the step of performing fusion processing on the 2D depth data set and the laser data set specifically comprises:
    obtaining the overlap area between the acquisition area of the lidar and the acquisition area of the depth camera;
    using the overlap area, replacing the data segment of the laser data set lying in that overlap area with the corrected 2D depth data set.
  5. The method according to claim 4, further comprising:
    if the second resolution equals the first resolution, replacing the data segment of the laser data set lying in the overlap area with the 2D depth data set.
  6. The method according to claim 4, further comprising:
    if the second resolution is less than the first resolution, obtaining, for each data point of the laser data set in the overlap area, the corresponding pixel of the 3D depth image;
    determining whether the first distance of each data point is greater than the minimum second distance of the corresponding pixel;
    if so, replacing the first distance of each data point whose first distance is greater than the minimum second distance with the minimum second distance of the corresponding pixel.
  7. A multi-modal data fusion apparatus, applied to an intelligent robot provided with a depth camera and a lidar, the apparatus comprising:
    a first acquisition module, configured to collect a 3D depth image of the external environment through the depth camera;
    a second acquisition module, configured to collect a laser data set of the external environment through the lidar;
    a conversion module, configured to read the 3D depth data set in the 3D depth image and convert the 3D depth data set into a 2D depth data set;
    a fusion module, configured to perform fusion processing on the 2D depth data set and the laser data set.
  8. The apparatus according to claim 7, wherein:
    the laser data set comprises the one-dimensional coordinates of a line of linearly arranged data points and the first distance of each data point, and the 3D depth data set comprises the X-axis and Y-axis coordinates of pixels distributed along the X axis and the Y axis and the second distance of each pixel;
    the conversion module is further configured to select the minimum second distance among the second distances of the pixels sharing the same X-axis coordinate,
    and to form the 2D depth data set from the X-axis coordinates and the minimum second distance corresponding to each X-axis coordinate.
  9. The apparatus according to claim 8, wherein:
    the conversion module is further configured to obtain the first resolution of the laser data set and the second resolution of the 2D depth data set;
    determine whether the second resolution is greater than the first resolution;
    if so, calculate the multiple of the second resolution relative to the first resolution;
    group the pixels of the 2D depth data set according to the multiple;
    and, in each group, select the pixel corresponding to the smallest of the minimum second distances and delete the other pixels, to obtain a corrected 2D depth data set.
  10. An intelligent robot, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 6.
  11. A computer program product, comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 6.
PCT/CN2019/118102 2019-11-13 2019-11-13 Multi-modal data fusion method and apparatus, and intelligent robot WO2021092805A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/118102 WO2021092805A1 (en) 2019-11-13 2019-11-13 Multi-modal data fusion method and apparatus, and intelligent robot


Publications (1)

Publication Number Publication Date
WO2021092805A1 true WO2021092805A1 (en) 2021-05-20

Family

ID=75911325


Country Status (1)

Country Link
WO (1) WO2021092805A1 (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106526605A (en) * 2016-10-28 2017-03-22 北京康力优蓝机器人科技有限公司 Data fusion method and data fusion system for laser radar and depth camera
CN108828606A (en) * 2018-03-22 2018-11-16 中国科学院西安光学精密机械研究所 One kind being based on laser radar and binocular Visible Light Camera union measuring method
WO2019040800A1 (en) * 2017-08-23 2019-02-28 TuSimple 3d submap reconstruction system and method for centimeter precision localization using camera-based submap and lidar-based global map
CN109655825A (en) * 2018-03-29 2019-04-19 上海智瞳通科技有限公司 Data processing method, device and the multiple sensor integrated method of Multi-sensor Fusion
CN109947097A (en) * 2019-03-06 2019-06-28 东南大学 A kind of the robot localization method and navigation application of view-based access control model and laser fusion
CN110428372A (en) * 2019-07-08 2019-11-08 希格斯动力科技(珠海)有限公司 Depth data and 2D laser data fusion method and device, storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116360466A (en) * 2023-05-31 2023-06-30 天津博诺智创机器人技术有限公司 Robot operation obstacle avoidance system based on depth camera
CN116360466B (en) * 2023-05-31 2023-09-15 天津博诺智创机器人技术有限公司 Robot operation obstacle avoidance system based on depth camera


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 19952294
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established
    Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.09.2022)
122 Ep: pct application non-entry in european phase
    Ref document number: 19952294
    Country of ref document: EP
    Kind code of ref document: A1