CN115908578A - Method, device and system for fusing multi-sensor data - Google Patents

Method, device and system for fusing multi-sensor data

Info

Publication number: CN115908578A
Authority: CN (China)
Prior art keywords: sensor, dimensional, coordinate system, point cloud, data
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202211246066.XA
Other languages: Chinese (zh)
Inventor: 韩夏冰
Current assignee: Netease Hangzhou Network Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Netease Hangzhou Network Co Ltd
Application filed by Netease Hangzhou Network Co Ltd
Priority to CN202211246066.XA
Publication of CN115908578A

Abstract

The application discloses a method, a device and a system for multi-sensor data fusion. The method comprises the following steps: acquiring three-dimensional point cloud data of a target scene acquired by a first sensor; acquiring two-dimensional image data of the target scene acquired by a second sensor, wherein the two-dimensional image data and the three-dimensional point cloud data are data of the target scene at a target moment; and performing data fusion on the three-dimensional point cloud data and the two-dimensional image data according to a mapping relation to obtain a three-dimensional reconstruction result of the target scene, wherein the mapping relation is the mapping relation from the first sensor coordinate system to the second sensor coordinate system, and the mapping relation is determined according to coordinates of a plurality of corner points of a plurality of calibration plates in the second sensor coordinate system and coordinates of a plurality of corner points of the plurality of calibration plates in the first sensor coordinate system. The method is easy to implement, and the precision of the data fusion result can be improved.

Description

Method, device and system for fusing multi-sensor data
Technical Field
The present application relates to the field of data fusion, and in particular, to a method, an apparatus, and a system for multi-sensor data fusion.
Background
Lidar and cameras are the most commonly used sensors for environmental perception. A laser radar can quickly and accurately acquire three-dimensional point cloud data of a target scene; each point in the point cloud carries the three-dimensional coordinate information of its location in the target scene but lacks color information. A camera, in contrast, is able to acquire the color information of the target scene. Performing data fusion on the three-dimensional point cloud data of the target scene acquired by the laser radar and the two-dimensional image data of the target scene acquired by the camera yields a three-dimensional reconstruction result of the target scene. However, in the conventional technology, fusing the three-dimensional point cloud data acquired by the laser radar with the two-dimensional image data acquired by the camera relies on complicated equipment, is difficult to implement, and produces data fusion results of limited precision.
Therefore, a method for fusing multi-sensor data is needed, which is easy to implement and can improve the precision of the data fusion result.
Disclosure of Invention
The application provides a method, a device, and a system for multi-sensor data fusion. The method is easy to implement and can improve the precision of the data fusion result.
A first aspect of an embodiment of the present application provides a method for multi-sensor data fusion, where the method includes: acquiring three-dimensional point cloud data of a target scene acquired by a first sensor; acquiring two-dimensional image data of the target scene acquired by a second sensor, wherein the two-dimensional image data and the three-dimensional point cloud data are data of the target scene at a target moment; and performing data fusion on the three-dimensional point cloud data and the two-dimensional image data according to a mapping relation to obtain a three-dimensional reconstruction result of the target scene, wherein the mapping relation is the mapping relation from the first sensor coordinate system to the second sensor coordinate system, and the mapping relation is determined according to coordinates of a plurality of corner points of a plurality of calibration plates in the second sensor coordinate system and the coordinates of a plurality of corner points of the plurality of calibration plates in the first sensor coordinate system.
A second aspect of the embodiments of the present application provides a device for multi-sensor data fusion, where the device includes: the receiving and sending unit is used for acquiring three-dimensional point cloud data of a target scene acquired by the first sensor; the transceiving unit is further configured to acquire two-dimensional image data of the target scene acquired by a second sensor, where the two-dimensional image data and the three-dimensional point cloud data are data of the target scene at a target time; the fusion unit is used for: and performing data fusion on the three-dimensional point cloud data and the two-dimensional image data according to a mapping relation to obtain a three-dimensional reconstruction result of the target scene, wherein the mapping relation is the mapping relation from the first sensor coordinate system to the second sensor coordinate system, and the mapping relation is determined according to coordinates of a plurality of corner points of a plurality of calibration plates in the second sensor coordinate system and the coordinates of a plurality of corner points of the plurality of calibration plates in the first sensor coordinate system.
The third aspect of the embodiments of the present application further provides a device for multi-sensor data fusion, including: a processor; and the memory is used for storing a data processing program, and after the multi-sensor data fusion device is powered on and runs the program through the processor, the multi-sensor data fusion method is executed.
A fourth aspect of the embodiments of the present application further provides a system for multi-sensor data fusion, including: the system comprises a computing device, an acquisition entity and a client, wherein the computing device is used for executing the multi-sensor data fusion method to obtain a three-dimensional reconstruction result of the target scene; the first sensor arranged on the acquisition entity is used for acquiring three-dimensional point cloud data of the target scene, and the second sensor arranged on the acquisition entity is used for acquiring two-dimensional image data of the target scene; the client is used for acquiring a three-dimensional reconstruction result of the target scene from the computing device; responding to the selection of the user on the display interface of the client, and displaying the three-dimensional reconstruction result of the target scene corresponding to the selection of the user on the display interface of the client. It is understood that the selection of the user on the display interface of the client is used for performing a modification operation on the three-dimensional reconstruction result of the target scene acquired by the client from the computing device, and the modification operation may be at least one of the following operations: zoom in, zoom out, or rotate. For example, when the selection of the user on the display interface of the client is used to perform an enlarging operation on the three-dimensional reconstruction result of the target scene acquired by the client from the computing device, the display interface of the client displays the enlarged three-dimensional reconstruction result of the target scene.
The fifth aspect of the embodiments of the present application further provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the computer-executable instructions are executed by a processor, the computer-readable storage medium is configured to implement the method for multi-sensor data fusion according to any one of the above technical solutions. These computer-readable storage media include, but are not limited to, one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), flash memory, electrically EPROM (EEPROM), and hard drive (hard drive).
A sixth aspect of the embodiments of the present application further provides a chip system, where the chip system includes a processor and a data interface, where the processor reads an instruction stored in a memory through the data interface, so as to execute the method for multi-sensor data fusion according to any one of the above technical solutions. In a specific implementation process, the chip system may be implemented in the form of a Central Processing Unit (CPU), a Micro Controller Unit (MCU), a microprocessor unit (MPU), a Digital Signal Processor (DSP), a system on chip (SoC), an application-specific integrated circuit (ASIC), a Field Programmable Gate Array (FPGA), or a Programmable Logic Device (PLD).
A seventh aspect of the embodiments of the present application further provides a computer program product, including a computer program, where the computer program is used, when being executed by a processor, to implement the method for multi-sensor data fusion according to any one of the above-mentioned technical solutions.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments disclosed herein, nor do they limit the scope of the present disclosure. Other features of the present application will become apparent from the following description. The implementation manners provided by the above aspects may be further combined to provide additional implementation manners.
The technical scheme of the multi-sensor data fusion method provided by the embodiment of the application comprises the following steps: acquiring three-dimensional point cloud data of a target scene acquired by a first sensor; acquiring two-dimensional image data of the target scene acquired by a second sensor, wherein the two-dimensional image data and the three-dimensional point cloud data are data of the target scene at a target moment; and performing data fusion on the three-dimensional point cloud data and the two-dimensional image data according to a mapping relation to obtain a three-dimensional reconstruction result of the target scene, wherein the mapping relation is the mapping relation from the first sensor coordinate system to the second sensor coordinate system, and the mapping relation is determined according to coordinates of a plurality of corner points of a plurality of calibration plates in the second sensor coordinate system and the coordinates of the same corner points in the first sensor coordinate system. In this method, the mapping relation used for data fusion is determined directly from the coordinates of the corner points of the calibration plates in the second sensor coordinate system and their coordinates in the first sensor coordinate system, which avoids the problems of the conventional technology, where calculating the mapping relation relies on complex equipment, is difficult to implement, and yields low calculation precision. In conclusion, the method is easy to implement and can improve the precision of the data fusion result.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required to be used in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the description below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings may be obtained according to these drawings without inventive labor.
Fig. 1 is a schematic diagram of an application scenario applicable to the multi-sensor data fusion method provided in the embodiment of the present application.
Fig. 2 is a schematic diagram of a display interface of the client 101 according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of a method for multi-sensor data fusion provided in an embodiment of the present application.
Fig. 4 is a schematic diagram of another multi-sensor data fusion method provided in the embodiments of the present application.
Fig. 5A is a schematic diagram of a distance between a laser radar and a camera mounted on the top of a cab of an excavator according to an embodiment of the present application.
Fig. 5B is a schematic diagram of a black and white checkerboard calibration board according to an embodiment of the present application.
Fig. 5C is a schematic diagram of a two-dimensional code calibration plate according to an embodiment of the present application.
Fig. 6 is a schematic diagram of a two-dimensional code and a calibration board provided in an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a multi-sensor data fusion apparatus according to an embodiment of the present application.
Fig. 8 is a schematic structural diagram of a device for multi-sensor data fusion according to an embodiment of the present application.
Fig. 9 is a schematic structural diagram of a system for multi-sensor data fusion according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present application, the present application will be described clearly and completely below with reference to the accompanying drawings of the embodiments. The present application can be embodied in many forms other than those described herein; all other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided herein without inventive effort fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," "third," and the like in the claims, the description, and the drawings of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. The data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than described or illustrated herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In order to better understand the method for fusing multi-sensor data provided in the embodiments of the present application, first, technical terms referred to in the embodiments of the present application will be described.
1, Laser radar (lidar)
The laser radar is a radar system that detects characteristic quantities of a target, such as its position and velocity, by emitting a laser beam. Its working principle is to emit a detection signal (a laser beam) toward the target, compare the received signal (the target echo) reflected from the target with the emitted signal, and, after appropriate processing, obtain information about the target, such as its distance, azimuth, height, speed, attitude, and even shape, thereby detecting, tracking, and identifying targets such as aircraft and missiles. The laser converts an electrical pulse into an optical pulse and emits it; the optical receiver restores the optical pulse reflected from the target into an electrical pulse and sends it to the display.
In practical application, the laser radar can be used for sensing and detecting a target scene to obtain three-dimensional point cloud data of the target scene, wherein the three-dimensional point cloud data does not include color information.
2, Camera (camera)
A camera here refers to an ordinary camera. The camera photographs an object to obtain a two-dimensional color image, which contains color information. For example, the color information may be RGB color information based on the three primary colors of light, where R denotes Red, G denotes Green, and B denotes Blue.
3, World coordinate system (world coordinate system)
A user-defined three-dimensional coordinate system introduced to describe the position of an object in the real world. Its coordinates are (Xw, Yw, Zw), in metres (m).
4, Camera coordinate system (camera coordinate system)
A coordinate system established on the camera, defined to describe object positions from the camera's point of view; it serves as the intermediate link between the world coordinate system and the image/pixel coordinate systems. Its coordinates are (Xc, Yc, Zc), in metres (m).
5, Image coordinate system (image coordinate system)
Introduced to describe the perspective projection of an object from the camera coordinate system onto the image plane during imaging, and to facilitate the further conversion to coordinates in the pixel coordinate system. Its coordinates are (x, y), in metres (m).
6, Pixel coordinate system (pixel coordinate system)
Introduced to describe the coordinates of an image point after the object is imaged onto the digital image (i.e., the photograph); this is the coordinate system in which the information actually read from the camera is expressed. Its coordinates are (u, v), in pixels.
7, Topic communication mode (topic)
The topic communication mode is one of the communication modes of the Robot Operating System (ROS), a distributed communication framework. For real-time, periodic messages, transmission via a topic is the best option. A topic is a point-to-point, one-way communication mode, where a "point" refers to a node; that is, information is transferred between nodes by means of topics. A topic goes through the following initialization steps: first, the publisher node and the subscriber node both register with the node manager; then the publisher publishes the topic, and the subscriber subscribes to it under the direction of the master, thereby establishing communication between the publisher and the subscriber. The whole process is unidirectional. A minimal code sketch is given below.
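The following minimal ROS 1 (rospy) sketch illustrates the topic mode with a publisher node and a subscriber node; the node names, the "/lidar_points" topic, and the PointCloud2 message type are illustrative assumptions, not values taken from the patent.

```python
# Minimal ROS 1 (rospy) topic sketch: one publisher node and one subscriber node.
import rospy
from sensor_msgs.msg import PointCloud2

def run_publisher():
    rospy.init_node("lidar_publisher")                 # register with the ROS master
    pub = rospy.Publisher("/lidar_points", PointCloud2, queue_size=10)
    rate = rospy.Rate(10)                              # periodic, real-time style messages
    while not rospy.is_shutdown():
        msg = PointCloud2()                            # fill with real point cloud data in practice
        pub.publish(msg)                               # one-way: publisher -> topic -> subscriber
        rate.sleep()

def on_cloud(msg):
    rospy.loginfo("received a point cloud with %d data bytes", len(msg.data))

def run_subscriber():
    rospy.init_node("fusion_subscriber")               # also registers with the master
    rospy.Subscriber("/lidar_points", PointCloud2, on_cloud)
    rospy.spin()

if __name__ == "__main__":
    run_publisher()                                    # or run_subscriber() in a second process
```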
8, Pixel point
An image is divided into many small cells; each cell is called a pixel point, and the grid formed by the arranged pixel points is called a raster. A computer represents the whole image by recording the position, color, brightness, and other information of each pixel point. That is, the pixel value of a pixel point may include a color value. In some implementations, the pixel value of a pixel point may be a red, green, and blue (RGB) color value, and the pixel value may be a long integer representing a color. For example, a pixel value may be expressed as 256 Red + 100 Green + 76 Blue, where Blue represents the blue component, Green represents the green component, and Red represents the red component. For each color component, the smaller the value, the lower the brightness, and the larger the value, the higher the brightness. For a grayscale image, the pixel value may be a grayscale value.
9, Point cloud data (point cloud data)
Point cloud data is a data set in which each point contains the three-dimensional coordinate information of a location in space.
10, Internal parameters of the camera
The internal parameters of the camera are also called the camera intrinsics. The internal parameters of the camera include: focal length (focal length), principal point (principal point), and distortion coefficient.
The focal length is the distance from the projection center (optical center) to the physical imaging plane and may be written as f = (fx, fy). The principal point is the intersection point of the principal optical axis with the physical imaging plane and may be written as (cx, cy). The distortion coefficient describes the distortion caused by the irregular shape of the lens and/or by the lens not being perfectly parallel to the imaging plane, and may be denoted as γ.
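For reference, the focal length and principal point can be arranged in the standard pinhole intrinsic matrix K that maps camera coordinates to pixel coordinates (a conventional formulation, not quoted from the patent; lens distortion is corrected separately before this projection):

```latex
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix},
\qquad
Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
  = K \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}
```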
Hereinafter, an application scenario of the multi-sensor data fusion method and the multi-sensor data fusion method applied to the embodiment of the present application will be described in detail with reference to the drawings. It is to be understood that the embodiments and features of the embodiments described below may be combined with each other without conflict between the embodiments provided in the present application. In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
First, an application environment applicable to the method for multi-sensor data fusion provided by the embodiments of the present application is described with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an application environment 100 suitable for a method of multi-sensor data fusion provided by an embodiment of the present application. By way of example, fig. 1 illustrates an application environment 100 that includes the following: user a, client 101, network 102, computing device 103, collection cart 105, and target scene 104. Wherein, a laser radar 106 and a camera 107 are fixed on the top of the collection vehicle 105.
The acquisition vehicle 105 is also referred to as an acquisition entity, i.e., any entity that integrates or carries equipment with point cloud data acquisition capability and two-dimensional image acquisition capability. For example, the collection vehicle 105 shown in fig. 1 may be an excavator, with a laser radar 106 and a camera 107 respectively disposed on the top of its cab. The laser radar 106 is configured to acquire three-dimensional point cloud data of the target scene 104 at a certain acquisition frequency. The camera 107 is used to acquire two-dimensional image data of the target scene 104 at a certain frequency. Optionally, devices other than the laser radar 106 and the camera 107 may be disposed on the collection vehicle 105; for example, such a device may be, but is not limited to, a Global Positioning System (GPS) signal receiving device, which receives GPS satellite signals and determines the spatial position of the collection vehicle 105. The collection vehicle 105 may transmit the collected three-dimensional point cloud data and two-dimensional image data to the computing device 103 in any data transmission form, such as over a wired network or a wireless network. Such transmission need not be real-time. Regardless of when or in what form the point cloud data and the two-dimensional image data are transmitted to the computing device 103, the subsequent processing of these data by the computing device 103 is not affected.
And the computing device 103 is used for fusing the three-dimensional point cloud data and the two-dimensional image data acquired from the acquisition vehicle 105, rendering the fused data, and obtaining a three-dimensional reconstruction result of the target scene 104. Illustratively, the computing device 103 shown in fig. 1 is disposed on the collection vehicle 105, and the collection vehicle 105 and the computing device 103 may perform data transmission via a network cable.
The type of computing device 103 is not particularly limited. For example, computing device 103 may be, but is not limited to, any of the following: a mini host, a server (e.g., a cloud server), or a processor. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), an Artificial Intelligence (AI) chip, a system on chip (SoC) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor or any conventional processor or the like. Optionally, the processor includes two or more different types of processors. For example, the processor includes a Central Processing Unit (CPU) and at least one of a general purpose processor, a DSP, an ASIC, an FPGA, an AI chip, an SoC or other programmable logic device, discrete gate or transistor logic, and discrete hardware components.
The client 101 is installed and running with an application. The client 101 receives the rendering result obtained from the computing device 103 via the network 102, and displays the rendering result obtained from the computing device 103 using the display interface of the locally installed application 3. Illustratively, fig. 2 shows a display interface of the client 101, which includes 3 applications. In response to the user clicking on the application 3, the display interface of the client 101 jumps to the display interface of the application 3. Wherein the display interface of the application 3 includes a plurality of buttons for displaying the rendering result. In response to the user clicking on the reduced rendering result, the display interface of the application 3 displays the reduced rendering result. The type of the client 101 is not particularly limited. For example, the client 101 may be, but is not limited to, any of the following devices: a smart phone, a tablet, a Personal Digital Assistant (PDA), a wearable device, a smart screen device, a self-service terminal device, a workstation computer, or a personal computer.
The user a can accurately determine the field environment of the target scene 104 according to the rendering result displayed on the interface of the client 101, and control the collection vehicle 105 to perform work (for example, digging a field, etc.) through a client instruction. In the case shown in fig. 2, the user may click an icon of the application 3 presented in a User Interface (UI) provided by the client 101, so that the display interface of the application 3 displays the rendering result for the user to view.
The target scene 104 is not particularly limited, and the specific type of the target scene 104 may be determined according to actual requirements. For example, the target scene 104 may be an outdoor construction site, a highway, a tunnel, an overpass, or inside a building (e.g., an underground garage), among others.
Optionally, the wireless network or wired network uses standard communication techniques and/or protocols. The network is typically the internet, but can be any network including, but not limited to, a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wired or wireless network, a private network, or any combination of virtual private networks. In some embodiments, data exchanged over a network is represented using techniques and/or formats including hypertext markup language (HTML), extensible markup language (XML), and the like. All or some of the links can also be encrypted using conventional encryption techniques such as Secure Socket Layer (SSL), transport Layer Security (TLS), virtual Private Network (VPN), internet protocol security (IPsec), and so on. In other embodiments, custom and/or dedicated data communication techniques can also be used in place of or in addition to the data communication techniques described above.
In the application scenario 100 illustrated in fig. 1 described above, the collection vehicle 105 is illustrated as an excavator, 106 is considered as a lidar, and 107 is considered as a camera. It is to be understood that this is for illustrative purposes only and is not intended to be limiting in any way. For example, the collection vehicle 105 may also be other devices (e.g., a smart vehicle, etc.) than an excavator. For example, 106 may also be a device other than a lidar capable of acquiring three-dimensional point cloud data of the target scene 104. For example, 107 may also be a device other than a camera capable of acquiring two-dimensional image data of the target scene 104.
It should be understood that the application environment 100 shown in fig. 1 is only an illustration and does not constitute any limitation to the application environment to which the method for multi-sensor data fusion provided by the embodiment of the present application is applicable. For example, a greater number of acquisition vehicles 105 and a greater number of clients 101 may also be included in the application environment 100. For another example, the computing device 103 shown in the application environment 100 may have a function of fusing data and a function of rendering a fused result, which are executed by two devices respectively.
In the application scenario 100 shown in fig. 1, the target scene 104 may be a harsh environment (e.g., a highly dusty environment) or a scene with complex terrain, for example, a mine or the open field. Working in such an environment is often accompanied by strong vibration, noise, and flying dust and rock, which poses personal safety hazards to workers and threatens their lives. With the development of 5G technology, wireless remote control is changing revolutionarily: more and more technical fields use multi-sensor (for example, laser radar and camera) fusion to reconstruct the working environment in three dimensions and transmit the reconstruction to a worker's client via 5G, so that the worker sees the working environment as if viewing it on site, thereby realizing visual monitoring of the excavator and ensuring the worker's personal safety.
Lidar and cameras are the most commonly used sensors for environmental perception. The laser radar can quickly and accurately acquire three-dimensional point cloud data of a target scene; each point in the point cloud contains the three-dimensional coordinate information of its location in the target scene but lacks color information. The camera can acquire rich information (e.g., color information) of the target scene, but that rich information lacks the depth information of geographic locations. Performing data fusion on the three-dimensional point cloud data of the target scene acquired by the laser radar and the two-dimensional image data of the target scene acquired by the camera yields a three-dimensional reconstruction result of the target scene. In the conventional technology, however, the data fusion of three-dimensional point cloud data acquired by a laser radar and two-dimensional image data acquired by a camera depends on complex equipment, is not easy to implement, and suffers from low precision of the data fusion result.
In order to solve the existing problems, the application provides a method, a device and a system for multi-sensor data fusion. Next, a method for multi-sensor data fusion provided by the embodiment of the present application is described in detail with reference to fig. 3 to 6.
Fig. 3 is a method for multi-sensor data fusion according to an embodiment of the present disclosure. It is understood that the method shown in fig. 3 can be applied, but not limited to, in the application scenario 100 shown in fig. 1, and at this time, the computing device 103 in fig. 1 can be used as an execution subject to perform the method for multi-sensor data fusion described in fig. 3. As shown in fig. 3, the method for multi-sensor data fusion provided by the embodiment of the present application includes S310 to S330. Next, details of S310 to S330 will be described.
S310, three-dimensional point cloud data of the target scene collected by the first sensor are obtained.
The three-dimensional point cloud data of the target scene is a data set including a plurality of points, where each point indicates its own three-dimensional coordinate information in the target scene but does not include its color information in the target scene. That is, each point in the three-dimensional point cloud data includes depth information of a geographic location. It can be understood that the three-dimensional structure of the target scene can be determined from the three-dimensional coordinate information carried by the point cloud data. Illustratively, the three-dimensional point cloud data of the target scene may be represented by the following data set:
{P1, P2, ..., Pi}    (1.1)
wherein i is a positive integer of 2 or more. Pi can be expressed by the following equation:
Pi = (x, y, z)    (1.2)
where (x, y, z) represents the three-dimensional coordinates of Pi at the target scene. Optionally, each point in the three-dimensional point cloud data of the target scene may further include intensity information and scanning information (e.g., scanning direction and/or scanning angle), and the like.
The first sensor is a sensor capable of acquiring three-dimensional point cloud data. For example, after the user clicks a data acquisition button of a first sensor, the first sensor located in the target scene may start to acquire three-dimensional point cloud data of the target scene; after the user clicks a data acquisition stop button of the first sensor, the first sensor positioned in the target scene stops acquiring the three-dimensional point cloud data of the target scene. The type of the first sensor is not particularly limited. For example, the first sensor may be, but is not limited to, a 3D lidar.
When the computing device 103 is used to execute 310, a method for acquiring the three-dimensional point cloud data of the target scene acquired by the first sensor by the computing device 103 is not particularly limited. For example, after the first sensor acquires the three-dimensional point cloud data of the target scene, the first sensor actively transmits the three-dimensional point cloud data to the computing device 103. As another example, in response to receiving an instruction sent by the computing device 103, the first sensor transmits the three-dimensional point cloud data to the computing device 103, wherein the instruction instructs the acquired three-dimensional point cloud data of the target scene to be sent to the computing device 103. The computing device 103 and the first sensor may be two devices directly or indirectly connected through a network, which is not particularly limited.
The target scene is not particularly limited, and can be selected according to actual requirements. For example, the target scene may be an indoor scene, or an outdoor scene.
And S320, acquiring two-dimensional image data of the target scene acquired by the second sensor, wherein the two-dimensional image data and the three-dimensional point cloud data are data of the target scene at the target moment.
The two-dimensional image data of the target scene is two-dimensional image data including a plurality of pixel points, each pixel point indicating a position in the two-dimensional space of the target scene and carrying color information. In some implementations, the two-dimensional image data is two-dimensional color image data, and the color information may be red, green, and blue (RGB) color values, e.g., 256 Red + 100 Green + 76 Blue, where Blue represents the blue component, Green represents the green component, and Red represents the red component. For each color component, the smaller the value, the lower the brightness, and the larger the value, the higher the brightness. In other implementations, the two-dimensional image data is two-dimensional grayscale image data, and the color information may be a grayscale value.
The second sensor is a sensor capable of acquiring two-dimensional image data. For example, after the user clicks a data acquisition button of the second sensor, the second sensor located in the target scene may start to acquire two-dimensional image data of the target scene; after the user clicks a data acquisition stop button of the second sensor, the second sensor located in the target scene stops acquiring two-dimensional image data of the target scene. The type of the second sensor is not particularly limited. For example, the second sensor may be, but is not limited to, a monocular camera or a binocular camera.
The two-dimensional image data and the three-dimensional point cloud data are data of a target scene at a target time. That is to say, the two-dimensional image data is image data obtained by shooting the target scene by the second sensor at the target time, and the three-dimensional point cloud data is point cloud data obtained by perceiving the target scene by the first sensor at the target time.
Optionally, in another implementation manner, in step S310, three-dimensional point cloud data of the target scene acquired by the first sensor at multiple moments may be acquired; in step S320, two-dimensional image data of the target scene acquired by the second sensor at multiple moments may be acquired, where the multiple moments include the target moment. A time synchronization step may then be performed in order to determine the three-dimensional point cloud data of the target scene at the target moment acquired by the first sensor and the two-dimensional image data of the target scene at the target moment acquired by the second sensor. For example, time synchronization may be performed using nearest-neighbour timestamp matching: when the laser radar and the camera acquire data, the acquisition start time of each frame is recorded, and the frames whose acquisition times are closest are matched by nearest-neighbour search. Generally, the acquisition frequency of the laser radar is lower than that of the camera, so the surplus camera frames left unmatched by the nearest-neighbour method are discarded, and the remaining two-dimensional image data of the camera are matched with the three-dimensional point cloud data of the laser radar. Each matched group of data can then be regarded as recorded at the same time, i.e., the laser radar and the camera are considered to observe the scene at the same moment. A sketch of this matching step is given below.
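The following is a minimal sketch of such nearest-neighbour timestamp matching, assuming both sensors attach a capture timestamp to every frame; variable names are illustrative, not taken from the patent.

```python
# Nearest-neighbour timestamp matching (assumed implementation).
# Each lidar frame (lower acquisition rate) is paired with the camera frame
# whose capture time is closest; leftover camera frames are discarded.
import bisect

def match_by_timestamp(lidar_stamps, camera_stamps):
    """lidar_stamps, camera_stamps: sorted lists of capture times in seconds.
    Returns a list of (lidar_index, camera_index) pairs."""
    pairs = []
    for i, t in enumerate(lidar_stamps):
        j = bisect.bisect_left(camera_stamps, t)
        # candidates: the camera frame just before t and the one just after
        candidates = [k for k in (j - 1, j) if 0 <= k < len(camera_stamps)]
        best = min(candidates, key=lambda k: abs(camera_stamps[k] - t))
        pairs.append((i, best))
    return pairs
```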
When the computing device 103 is configured to execute the above S310, a method of acquiring the three-dimensional point cloud data of the target scene acquired by the first sensor by the computing device 103 is not particularly limited. For example, after the first sensor acquires the three-dimensional point cloud data of the target scene, the first sensor actively transmits the three-dimensional point cloud data to the computing device 103. As another example, in response to receiving an instruction sent by the computing device 103, the first sensor transmits the three-dimensional point cloud data to the computing device 103, wherein the instruction instructs the acquired three-dimensional point cloud data of the target scene to be sent to the computing device 103. The computing device 103 and the first sensor may be two devices directly or indirectly connected through a network, which is not particularly limited.
In some implementation manners, the first sensor is a laser radar, the second sensor is a camera, the target scene is an excavator working scene, the laser radar and the camera are arranged on the top of the excavator cab, and the distance between the laser radar and the camera is smaller than a preset threshold. Keeping the distance between the laser radar and the camera below the preset threshold ensures that their fields of view overlap. Illustratively, fig. 5A shows a schematic of the distance between the camera and the laser radar mounted on top of the excavator cab. Optionally, in another implementation manner, the first sensor is a laser radar, the second sensor is a camera, the target scene is a highway, the laser radar and the camera are arranged on the top of an intelligent vehicle, and the distance between the laser radar and the camera is smaller than a preset threshold.
And S330, performing data fusion on the three-dimensional point cloud data and the two-dimensional image data according to a mapping relation to obtain a three-dimensional reconstruction result of the target scene, wherein the mapping relation is the mapping relation from a first sensor coordinate system to a second sensor coordinate system, and the mapping relation is determined according to coordinates of a plurality of corner points of a plurality of calibration plates under the second sensor coordinate system and coordinates of a plurality of corner points of the plurality of calibration plates under the first sensor coordinate system.
In S330, the mapping relationship is determined according to coordinates of the plurality of corner points of the plurality of calibration plates in the second sensor coordinate system and coordinates of the plurality of corner points of the plurality of calibration plates in the first sensor coordinate system. It is understood that the plurality of corner points of each calibration board in the embodiment of the present application may be 4 corner points of each calibration board. For example, fig. 5C shows a schematic structural diagram of a calibration board described in S330 above. Referring to fig. 5C, the calibration board has 4 corner points, which are respectively denoted as corner point 1, corner point 2, corner point 3, and corner point 4. In the embodiment of the present application, the mapping relationship is determined according to coordinates of a plurality of corner points of each of the plurality of calibration plates in the second sensor coordinate system and coordinates of a plurality of corner points of each of the plurality of calibration plates in the first sensor coordinate system. In this implementation manner, a plurality of different two-dimensional codes respectively cover a plurality of calibration boards, the plurality of two-dimensional codes correspond to the plurality of calibration boards one to one, and one two-dimensional code covers the surface of the corresponding calibration board. Wherein, different two-dimensional codes can be understood as different geometric patterns and colors contained in different two-dimensional codes. The number of the plurality of calibration plates is not particularly limited, and the number of the calibration plates can be specifically selected according to actual requirements. For example, the plurality of calibration plates may include 2 calibration plates, or 3 calibration plates. In order to improve the calculation accuracy and the reliability of the calculation result, the total number of the plurality of corner points of the selected plurality of calibration plates can be set to be greater than or equal to 8 in practical application.
In some implementations, the mapping relationship is determined by the computing device 103 when the computing device 103 is configured to perform S330 described above. In this implementation, before the computing device 103 performs S330 described above, the computing device 103 is further configured to perform the following steps: and determining a mapping relation according to the coordinates of the plurality of corner points of the plurality of calibration plates in the second sensor coordinate system and the coordinates of the plurality of corner points of the plurality of calibration plates in the first sensor coordinate system. Optionally, in other implementations, when the computing device 103 is configured to perform S330, the mapping relationship is determined by a device other than the computing device 103. In this manner, before the computing device 103 performs S330 described above, the computing device 103 is further configured to perform the following steps: and acquiring the determined mapping relation from other equipment.
Optionally, before performing the step S330, a step of determining a mapping relationship may be further performed, where the step includes: and calculating the coordinates of the multiple corner points of the multiple calibration plates under the first sensor coordinate system and the coordinates of the multiple corner points of the multiple calibration plates under the second sensor coordinate system by using an optimization algorithm to obtain a mapping relation, wherein the optimization algorithm is a Singular Value Decomposition (SVD) method or a least square method. In the embodiment of the present application, the method for determining the coordinates of the plurality of corner points of each calibration board in the first sensor coordinate system is the same, and the method for determining the coordinates of the plurality of corner points of each calibration board in the second sensor coordinate system is also the same. In some implementations, the first sensor may be a lidar and the second sensor may be a camera. For example, in the first implementation manner in S405 below, the details of "calculating coordinates of 8 corner points of the target calibration board (i.e., 4 corner points of the target calibration board 1 and 4 corner points of the target calibration board 2) in the lidar coordinate system and coordinates of 8 corner points of the target calibration board in the camera coordinate system by using the SVD method to obtain the mapping relationship" are described in detail. The contents not described in detail herein can be referred to the related contents below. For example, in the following implementation manner two in S405, "calculating coordinates of 8 corner points of the target calibration board (i.e., 4 corner points of the target calibration board 1 and 4 corner points of the target calibration board 2) in the lidar coordinate system and coordinates of the 8 corner points in the camera coordinate system by using a least square method to obtain a mapping relationship" is described in detail. The contents not described in detail herein can be referred to the related contents below. It is understood that the target calibration board 1 and the target calibration board 2 in S405 below may be a plurality of calibration boards in the above implementation, the lidar in S405 below is the first sensor in S330 above, and the camera in S405 below is the second sensor in S330 above.
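For illustration, a standard SVD-based (Kabsch) solution of the rigid lidar-to-camera transform from matched corner points could look like the sketch below; this is the textbook formulation under the stated assumptions, not necessarily the exact derivation used in S405.

```python
# SVD (Kabsch) estimate of the rigid lidar-to-camera transform from N matched
# corner points (N >= 3; the text recommends 8 or more). Assumed illustration,
# not quoted from the patent.
import numpy as np

def estimate_rigid_transform(pts_lidar, pts_camera):
    """pts_lidar, pts_camera: (N, 3) arrays of corresponding corner coordinates.
    Returns R (3x3) and t (3,) such that pts_camera[i] ~ R @ pts_lidar[i] + t."""
    mu_l = pts_lidar.mean(axis=0)
    mu_c = pts_camera.mean(axis=0)
    H = (pts_lidar - mu_l).T @ (pts_camera - mu_c)     # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                           # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = mu_c - R @ mu_l
    return R, t
```

Using the corners of several calibration plates (eight or more points in total) over-determines the six transform parameters, which makes the estimate more robust to noise in the corner extraction; a least-squares formulation would minimize the same residual.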
Optionally, in some implementations, a plurality of different two-dimensional codes are associated with a plurality of calibration boards, and each two-dimensional code covers an associated calibration board plane, in this implementation, before performing the above S330, the following operations may also be performed: determining coordinates of a plurality of corner points of the calibration plate associated with each two-dimensional code under a world coordinate system according to plane constraints, coordinates of a central point of each two-dimensional code under the world coordinate system and distances from the central point of each two-dimensional code to four sides of the associated calibration plate, wherein the plane constraints are constraints that each two-dimensional code covers the plane of the associated calibration plate; and determining the coordinates of the multiple corner points of the calibration plate associated with each two-dimension code under the coordinate system of the second sensor according to the coordinates of the multiple corner points of the calibration plate associated with each two-dimension code under the world coordinate system and the position and orientation information of the calibration plate associated with each two-dimension code under the coordinate system of the second sensor. It can be understood that the pose information of the calibration board associated with each two-dimensional code under the second sensor is the same as the pose information of each two-dimensional code under the second sensor. That is to say, the pose information of each two-dimensional code under the second sensor is obtained, that is, the pose information of the calibration board associated with each two-dimensional code under the second sensor can be obtained. Optionally, before performing the above steps, the following steps may be further performed: and determining the pose information of each two-dimensional code by using a pose tool. The pose tool is not particularly limited and can be selected according to actual requirements. For example, when the two-dimensional code is an ArUco two-dimensional code, the pose information of the ArUco two-dimensional code can be obtained by using an ArUco pose determination tool. In some implementations, the determining, according to the coordinates of the multiple corner points of the calibration board associated with each two-dimensional code in the world coordinate system and the pose information of the calibration board associated with each two-dimensional code in the second sensor coordinate system, the coordinates of the multiple corner points of the calibration board associated with each two-dimensional code in the second sensor coordinate system includes: and determining the coordinates of the 4 angular points of the calibration plate associated with each two-dimensional code in the second sensor coordinate system by the product of the pose information of the calibration plate associated with each two-dimensional code under the second sensor and the coordinates of the 4 angular points of the calibration plate associated with each two-dimensional code in the world coordinate system. Specifically, the coordinates of 1 corner point of one calibration board in the second coordinate system are equal to the product of the pose information of the one calibration board in the second sensor and the coordinates of the 1 corner point in the world coordinate system. 
In the implementation manner, each two-dimensional code covers the associated calibration plate plane, so that the pose information of the calibration plate associated with each two-dimensional code under the second sensor is the same as the pose information of each two-dimensional code under the second sensor. The method for solving the pose information of each two-dimensional code under the second sensor is not particularly limited. Coordinates of a plurality of corner points of the calibration board associated with each two-dimensional code in a world coordinate system can be understood as positions of the plurality of corner points of the calibration board associated with each two-dimensional code in the real world. The two-dimensional code covers the plane of the associated calibration board, and whether the center point of the two-dimensional code is overlapped with the center point of the calibration board is not particularly limited. That is, the center point of the two-dimensional code may coincide with the center point of the associated calibration plate, or the center point of the two-dimensional code may not coincide with the center point of the associated calibration plate, that is, the horizontal distance between the center point of the two-dimensional code and the center point of the associated calibration plate is not zero. Exemplarily, fig. 5C shows a schematic structural diagram of the calibration board of S330. For example, fig. 6 shows a schematic diagram of a two-dimensional code and an associated calibration plate in an embodiment of the present application, and a center point of the two-dimensional code in fig. 6 is not coincident with a center point of the associated calibration plate. The distances from the center point of the two-dimensional code to the four edges of the calibration plate can be measured in advance. Exemplarily, determining the coordinate representation of the 8 corner points of the target calibration board (i.e. 4 corner points of the target calibration board 1 and 4 corner points of the target calibration board 2) in the camera coordinate system is described in detail in S403 below. It is to be understood that the target calibration board 1 and the target calibration board 2 in the following S403 may be a plurality of calibration boards in the above implementation, the ArUco two-dimensional code in the following S403 is the two-dimensional code in the above S330, and the camera in the following S403 is the second sensor in the above S330. In the process of determining the coordinates of the 4 angular points of the calibration plate in the second sensor coordinate system, the coordinates of the 4 angular points of each calibration plate in the camera coordinate system are calculated through the distances from the center point of the two-dimensional code to the four sides of each calibration plate, which are measured in advance.
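A possible concrete realization of this step, sketched below, uses OpenCV's ArUco module to estimate the marker pose and then maps the board corners, expressed in the marker plane from the pre-measured centre-to-edge distances, into the camera coordinate system. The marker size, the distances, the dictionary choice, and the axis sign convention are assumptions for illustration only, and the aruco API differs slightly across OpenCV versions.

```python
# Assumed realization: estimate the ArUco marker (and hence board) pose in the
# camera frame, build the board corners in the marker plane from the measured
# centre-to-edge distances, and transform them into camera coordinates.
import cv2
import numpy as np

MARKER_LEN = 0.20                                          # marker side length in metres (assumed)
D_LEFT, D_RIGHT, D_TOP, D_BOTTOM = 0.35, 0.25, 0.30, 0.30  # centre-to-edge distances (assumed)

def board_corners_in_camera(image, camera_matrix, dist_coeffs):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_6X6_250)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, MARKER_LEN, camera_matrix, dist_coeffs)
    R, _ = cv2.Rodrigues(rvecs[0])                         # marker/board pose in the camera frame
    t = tvecs[0].reshape(3)
    # Board corners in the marker plane (z = 0); the signs depend on the
    # chosen marker axis convention.
    board_pts = np.array([[-D_LEFT,   D_TOP,    0.0],
                          [ D_RIGHT,  D_TOP,    0.0],
                          [ D_RIGHT, -D_BOTTOM, 0.0],
                          [-D_LEFT,  -D_BOTTOM, 0.0]])
    return (R @ board_pts.T).T + t                         # (4, 3) corners in camera coordinates
```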
Optionally, in some implementations, before performing the above S330, the following steps may also be performed: performing the following calculation on each calibration plate in the plurality of calibration plates to obtain coordinates of a plurality of corner points of each calibration plate in the first sensor coordinate system: determining point cloud data of four edges of each calibration plate according to the point cloud data of each calibration plate acquired by the first sensor; performing linear fitting on the point cloud data of the four edges of each calibration plate to obtain a spatial representation of a straight line where the four edges of each calibration plate are located; and calculating the intersection point of the straight line of any two adjacent straight lines in the straight lines of the four sides of each calibration plate in the three-dimensional space to obtain a plurality of intersection points, wherein the plurality of intersection points are the coordinates of the plurality of corner points of each calibration plate in the first sensor coordinate system. Optionally, before the above steps are performed, the point cloud data of each calibration plate acquired by the first sensor may also be acquired. It is understood that the point cloud data of the calibration plate acquired by the first sensor does not include other point cloud data than the calibration plate. In some implementations, the determining point cloud data of four sides of the calibration plate according to the point cloud data of the calibration plate collected by the first sensor includes: displaying the point cloud data of the calibration plate collected by the first sensor to a user through a UI (user interface); and responding to the point cloud data of each edge included by the calibration board manually selected in the display interface by the user, and obtaining the point cloud data of each edge included by the calibration board.
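One way this corner extraction could be coded is sketched below (an assumption, not the patent's reference implementation): each edge's points are fitted with a 3D line via PCA, and the "intersection" of two adjacent, nearly coplanar lines is taken as the midpoint of their closest point pair, since noisy 3D lines rarely intersect exactly.

```python
# Assumed sketch of the lidar-side corner extraction. Adjacent board edges are
# assumed non-parallel, so the closest-point computation is well defined.
import numpy as np

def fit_line(points):
    """points: (N, 3) array of one edge's points.
    Returns a point on the fitted line and its unit direction."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[0]                               # dominant direction from PCA

def line_intersection(p1, d1, p2, d2):
    """Closest-point 'intersection' of the lines p1 + s*d1 and p2 + t*d2."""
    w0 = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return ((p1 + s * d1) + (p2 + t * d2)) / 2.0

def board_corners_in_lidar(edge_point_sets):
    """edge_point_sets: four (N, 3) arrays, ordered so consecutive edges are adjacent."""
    lines = [fit_line(pts) for pts in edge_point_sets]
    return [line_intersection(*lines[i], *lines[(i + 1) % 4]) for i in range(4)]
```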
Performing data fusion on the three-dimensional point cloud data and the two-dimensional image data according to the mapping relation to obtain a three-dimensional reconstruction result of the target scene includes the following steps: performing data fusion on the three-dimensional point cloud data and the two-dimensional image data according to the internal parameters of the second sensor and the mapping relation to obtain a fusion result; and rendering the fusion result to obtain the three-dimensional reconstruction result of the target scene. In some implementations, the mapping relation is represented by a rotation matrix and a translation vector, and performing data fusion on the three-dimensional point cloud data and the two-dimensional image data according to the internal parameters of the second sensor and the mapping relation to obtain a fusion result includes: selecting, according to the internal parameters of the second sensor, the rotation matrix and the translation vector, any point included in the three-dimensional point cloud data and the pixel point included in the two-dimensional image data that has the mapping relation with it; and performing data fusion on that point of the three-dimensional point cloud data and that pixel point of the two-dimensional image data to obtain the fusion result. Optionally, in some implementations, when the second sensor is a camera, the internal parameters of the second sensor include: focal length, principal point and distortion coefficient.
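As an illustrative note only (assuming the second sensor follows a standard pinhole model; this is a sketch, not a limitation of the embodiment), the correspondence used in the fusion step can be written as

s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = A\,(R\,P + t), \qquad s > 0,

where P is a point of the three-dimensional point cloud data in the first sensor coordinate system, R and t are the rotation matrix and translation vector of the mapping relation, A is the internal parameter matrix of the second sensor, s is a depth scale factor, and (u, v) is the pixel of the two-dimensional image data that has the mapping relation with P; the color at (u, v) is then attached to P to form the fusion result.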
Optionally, after performing the above S330, the following steps may be further performed: and sending the three-dimensional reconstruction result of the target scene to the client for displaying the three-dimensional reconstruction result of the target scene at the client. In some implementations, when the computing device 103 is configured to perform the operation of sending the three-dimensional reconstruction result of the target scene to the client, the computing device 103 may send the three-dimensional reconstruction result of the target scene to the client by using a topic method. In this implementation, the computing device 103 and the client should also establish a topic communication mode. For example, the display interface of the rendering result of the client may be as shown in fig. 2 above. The user can also click a rendering result button of the display interface of the application program 3 to obtain rendering results of three-dimensional reconstruction of the excavator working scene in different directions and viewing angles, different sizes, or different positions.
In the embodiment of the application, data fusion is performed on the three-dimensional point cloud data and the two-dimensional image data according to a mapping relation, wherein the mapping relation is determined directly according to the coordinates of a plurality of corner points of a plurality of calibration plates in the second sensor coordinate system and the coordinates of the plurality of corner points of the plurality of calibration plates in the first sensor coordinate system. In the process of determining the coordinates of the plurality of corner points of each of the plurality of calibration plates in the second sensor coordinate system, the coordinates of the plurality of corner points of each calibration plate in the second sensor coordinate system are calculated from the pre-measured distances from the center point of the two-dimensional code associated with each calibration plate to the four edges of that calibration plate. In summary, the multi-sensor data fusion method provided by the embodiment of the application is easy to implement, and can improve the precision of the data fusion result and the accuracy of environmental perception.
Next, fig. 4 to 6 describe another multi-sensor data fusion method provided in the embodiment of the present application. It should be understood that the examples of fig. 4-6 are merely intended to assist those skilled in the art in understanding the embodiments of the present application and are not intended to limit the embodiments of the application to the specific values or specific contexts illustrated. It will be apparent to those skilled in the art that various equivalent modifications or variations are possible in light of the examples of fig. 4-6 given below, and such modifications and variations also fall within the scope of the embodiments of the present application. It is understood that the multi-sensor data fusion method shown in fig. 4 is a specific example of the multi-sensor data fusion method shown in fig. 2 described above. Specifically, the method shown in fig. 4 is described by taking the target scene in the method shown in fig. 2 as the working scene of the excavator, the first sensor is a laser radar, the second sensor is a camera, the plurality of calibration plates include a target calibration plate 1 and a target calibration plate 2, and a plurality of angular points of each of the plurality of calibration plates are taken as 4 angular points of the target calibration plate.
Fig. 4 is a schematic diagram of another multi-sensor data fusion method provided in the embodiments of the present application. As shown in fig. 4, the method for multi-sensor data fusion provided by the embodiment of the present application includes S401 to S408. Next, S401 to S408 will be described in detail.
S401, mounting and fixing a laser radar and a camera on the top of a cab of the excavator.
To help the operator know the conditions of the excavator working scene (e.g., the scene in front of the excavator) as much as possible, the lidar and the camera can be mounted on the top of the excavator cab; after calibration is completed, the lidar and the camera are fixed relative to each other and cannot move. For example, taking the application scenario 100 shown in fig. 1 as an example, the excavator in S401 may be the collection vehicle 105 in fig. 1, the lidar in S401 may be the lidar 106 in fig. 1, and the camera in S401 may be the camera 107 in fig. 1. At the top of the cab of the excavator, the lidar and the camera should be set as close to each other as possible, so as to increase the overlapping area of the fields of view of the lidar and the camera. Illustratively, fig. 5A shows a schematic of the distance between a camera and a lidar mounted on top of the excavator cab.
The laser radar is used for acquiring the working scene of the excavator and acquiring corresponding point cloud data, wherein the point cloud data comprises a plurality of points, and each point indicates the three-dimensional coordinate information of each point in the working scene of the excavator. The excavator working scene can also comprise a scene around the excavator bucket.
The camera is used for shooting the working scene of the excavator to obtain a two-dimensional image of the target scene. The two-dimensional image comprises a plurality of pixel points, each pixel point indicates a position in a two-dimensional space where the excavator works, and each pixel point comprises color information of the position. In some implementations, the color information may be optical tristimulus RGB color information, where R represents Red (Red), G represents Green (Green), and B represents Blue (Blue).
After the above S401 is executed, that is, after the positions of the laser radar and the camera on the top of the cab of the excavator are fixed, the relative positional relationship between the laser radar and the camera arranged on the top of the cab of the excavator needs to be calibrated. However, since the camera lens introduces distortion, the camera first needs to be subjected to distortion-removal processing, i.e. calibration of the internal parameters of the camera (also called the camera intrinsics). It can be understood that the internal parameters of the camera are fixed, and once calibration is completed they can be used from then on; the values of the internal parameters are influenced by the manufacturing process of the camera, the lens, and the like.
S402, the host calibrates the internal parameters of the camera to obtain the internal parameters of the camera.
Wherein the internal parameters of the camera include the following parameters: the focal length of the camera (f_x, f_y), the principal point coordinates (c_x, c_y), and a distortion parameter γ. Ideally, γ = 0.
The calibrating of the internal parameters of the camera to obtain the internal parameters of the camera includes: shooting the black-and-white checkerboard calibration plate from different directions and angles with the camera to obtain a plurality of images, wherein the directions and angles corresponding to any two of the plurality of images are not completely the same; and computing on the obtained plurality of images to obtain the internal parameters of the camera. The size of the black-and-white checkerboard is known. The black-and-white checkerboard calibration plate is obtained by pasting a black-and-white checkerboard pattern onto the surface of one calibration plate, with the checkerboard pattern completely covering the surface of that calibration plate. For example, fig. 5B shows a schematic diagram of a black-and-white checkerboard, and fig. 5B also shows 4 corner points of the black-and-white checkerboard.
Next, an example of "calculating a plurality of acquired images to acquire an internal parameter of a camera" will be described by way of example. The host obtaining internal parameters of the camera in this example may include S402-1 and S402-2:
s402-1, determining the product of the internal parameter matrix of the camera and the external parameter matrix of the camera.
The product of the internal reference matrix of the camera and the external reference matrix of the camera is also called a homography matrix. Fixing the world coordinate system on the black-and-white checkerboard calibration plate (i.e. the Z = 0 plane of the world coordinate system coincides with the plane of the checkerboard calibration plate), the Z coordinate of any point on the checkerboard calibration plate is 0, and the imaging model can be represented by the following formula:
Z \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = A \, (R_1 \;\; R_2 \;\; T) \begin{pmatrix} U \\ V \\ 1 \end{pmatrix}    (2.1)
in the above formula, Z represents a scale factor; A represents the internal reference matrix; (R1 R2 T) represents the external parameter matrix of the camera; (u, v, 1) represents the pixel coordinates of the black-and-white checkerboard corner points on the image; and (U, V, 1) represents the spatial coordinates of the black-and-white checkerboard corner points in the world coordinate system. It can be understood that A is a fixed value for different pictures taken by the same camera; for the same picture, A and (R1 R2 T) are constant; and for a single point on the same picture, A, (R1 R2 T) and Z are all fixed.
When A(R1 R2 T) in the above formula (2.1) is denoted as matrix H, i.e. H is the product of the internal reference matrix and the external reference matrix, and the three columns of matrix H are denoted (H1, H2, H3), (u, v, 1) in the above formula (2.1) can be expressed by the following formula:
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \frac{1}{Z} \, (H_1 \;\; H_2 \;\; H_3) \begin{pmatrix} U \\ V \\ 1 \end{pmatrix}    (2.2)
after eliminating the scale factor Z in the above equation (2.2), u and v in the above equation (2.2) can be expressed by the following equations:
u = \frac{H_{11}U + H_{12}V + H_{13}}{H_{31}U + H_{32}V + H_{33}}, \qquad v = \frac{H_{21}U + H_{22}V + H_{23}}{H_{31}U + H_{32}V + H_{33}}    (2.3)
in the above equation (2.3), the scale factor Z has been eliminated. That is, the above formula (2.3) holds for all corner points on the same picture. In the above formula (2.3), (u, v) represents the coordinates of the black and white chessboard calibration corner points under the pixel coordinate system; and (U, V) represents the coordinates of the black and white chessboard pattern calibration corner points under the world coordinate system.
The pixel coordinates (u, v) of the corner points of the black-and-white checkerboard calibration plate can be obtained through an image recognition algorithm; and because the world coordinate system of the black-and-white checkerboard calibration plate is predefined and the size of each grid on the calibration plate is known, the coordinates (U, V) of the corner points in the world coordinate system are also known. Based on formula (2.3), the obtained pixel coordinates (u, v) and the known world coordinates (U, V) constrain the H matrix. Since multiple sets of checkerboard calibration-plate images are collected, there are multiple sets of (u, v) and (U, V) data, so that the H matrix can be solved.
S402-2, solving A from the H matrix obtained in S402-1.
After the above S402-1 is performed, the H matrix can be solved. It is known from the above that H = A(R1 R2 T). Next, from the solved H matrix and H = A(R1 R2 T), the camera's internal reference matrix A can be solved.
Using R1 and R2 as two columns of the rotation matrix R, R1 and R2 have a relationship of unit orthogonality, that is:
R_1^T R_2 = 0, \qquad R_1^T R_1 = R_2^T R_2 = 1    (2.4)
From the relationship between H, R1 and R2, it can be seen that:
R_1 = A^{-1} H_1, \qquad R_2 = A^{-1} H_2    (2.5)
substituting the above equation (2.5) into the above equation (2.4), further, the above equation (2.4) can be expressed as:
H_1^T A^{-T} A^{-1} H_2 = 0, \qquad H_1^T A^{-T} A^{-1} H_1 = H_2^T A^{-T} A^{-1} H_2    (2.6)
Denoting A^{-T}A^{-1} in the above formula (2.6) as B, B is a symmetric matrix. Based on this, the matrix B can be solved first, and then the internal reference matrix A of the camera can be solved from the matrix B.
Meanwhile, for simplicity, the internal reference matrix a of the camera is recorded as:
A = \begin{pmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{pmatrix}    (2.7)
wherein α is the focal length in the x-axis direction expressed in pixels; γ is the distortion parameter; β is the focal length in the y-axis direction expressed in pixels; and (u_0, v_0) are the coordinates of the principal point (i.e., the image origin).
The inverse A^{-1} of the internal reference matrix of the camera can be expressed by the following formula:
A^{-1} = \begin{pmatrix} \frac{1}{\alpha} & -\frac{\gamma}{\alpha\beta} & \frac{\gamma v_0 - \beta u_0}{\alpha\beta} \\ 0 & \frac{1}{\beta} & -\frac{v_0}{\beta} \\ 0 & 0 & 1 \end{pmatrix}    (2.8)
Then, representing the matrix B in terms of the matrices A^{-T} and A^{-1}, B can be expressed by the following formula:
B = A^{-T}A^{-1} = \begin{pmatrix} \frac{1}{\alpha^2} & -\frac{\gamma}{\alpha^2\beta} & \frac{\gamma v_0 - \beta u_0}{\alpha^2\beta} \\ -\frac{\gamma}{\alpha^2\beta} & \frac{\gamma^2}{\alpha^2\beta^2}+\frac{1}{\beta^2} & -\frac{\gamma(\gamma v_0 - \beta u_0)}{\alpha^2\beta^2}-\frac{v_0}{\beta^2} \\ \frac{\gamma v_0 - \beta u_0}{\alpha^2\beta} & -\frac{\gamma(\gamma v_0 - \beta u_0)}{\alpha^2\beta^2}-\frac{v_0}{\beta^2} & \frac{(\gamma v_0 - \beta u_0)^2}{\alpha^2\beta^2}+\frac{v_0^2}{\beta^2}+1 \end{pmatrix}    (2.9)
Further, using B = A^{-T}A^{-1}, the constraint previously obtained from the unit orthogonality of R1 and R2 becomes:
H_1^T B H_2 = 0, \qquad H_1^T B H_1 = H_2^T B H_2    (2.10)
Therefore, in order to solve the matrix B, H_i^T B H_j must be calculated. H_i^T B H_j can be expressed by the following formula:
H_i^T B H_j = v_{ij}^T \, b, \qquad b = (B_{11},\, B_{12},\, B_{22},\, B_{13},\, B_{23},\, B_{33})^T    (2.11)
the constraint equation obtained by the unit orthogonality of R1 and R2 can be:
\begin{pmatrix} v_{12}^T \\ (v_{11} - v_{22})^T \end{pmatrix} b = 0    (2.12)
wherein the matrix v can be represented by the following formula:
v_{ij} = \left( H_{1i}H_{1j},\;\; H_{1i}H_{2j} + H_{2i}H_{1j},\;\; H_{2i}H_{2j},\;\; H_{3i}H_{1j} + H_{1i}H_{3j},\;\; H_{3i}H_{2j} + H_{2i}H_{3j},\;\; H_{3i}H_{3j} \right)^T    (2.13)
where H_{ki} denotes the k-th component of the i-th column H_i of H.
since matrix H is known, and matrix v is in turn composed entirely of the elements of matrix H, matrix v is known.
Then, the vector b is solved to obtain the matrix B. Each calibration-plate image provides one vb = 0 constraint relationship, which contains two constraint equations. The vector b contains 6 unknown elements, so the two constraint equations provided by a single picture are not sufficient to solve for b. Therefore, the vector b can be solved by taking 3 calibration-plate photos and obtaining 3 vb = 0 constraint relationships, i.e. 6 equations. When the number of calibration-plate images is greater than 3 (for example, but not limited to, 15 or 20 images), least-squares fitting may be used to obtain the optimal vector b, from which the matrix B is obtained.
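For illustration only, a brief Python sketch of this step is given below (the use of NumPy, the function names and the 0-based column indexing are assumptions, not part of the embodiment). It stacks, for each image, the two rows of formula (2.12) and takes the right singular vector associated with the smallest singular value as the least-squares solution of vb = 0:

```python
import numpy as np

def v_ij(H, i, j):
    # v_ij built from the columns of the 3x3 homography H per formula (2.13);
    # H[k, i] is the k-th component of column i (0-based indices here).
    return np.array([
        H[0, i] * H[0, j],
        H[0, i] * H[1, j] + H[1, i] * H[0, j],
        H[1, i] * H[1, j],
        H[2, i] * H[0, j] + H[0, i] * H[2, j],
        H[2, i] * H[1, j] + H[1, i] * H[2, j],
        H[2, i] * H[2, j],
    ])

def solve_B(homographies):
    # Each image contributes the two rows of formula (2.12).
    rows = []
    for H in homographies:
        rows.append(v_ij(H, 0, 1))
        rows.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))
    V = np.vstack(rows)              # shape (2n, 6), n = number of images
    # Least-squares solution of V b = 0: the right singular vector of the
    # smallest singular value (b is determined only up to scale).
    b = np.linalg.svd(V)[2][-1]      # (B11, B12, B22, B13, B23, B33)
    return np.array([[b[0], b[1], b[3]],
                     [b[1], b[2], b[4]],
                     [b[3], b[4], b[5]]])
```

The intrinsic parameters then follow from the matrix B via formula (2.14).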
According to the matrix B given by equation (2.9) above, the internal parameters of the camera (i.e., α, β, γ, u_0 and v_0) can be calculated by the following formulas:
v_0 = \frac{B_{12}B_{13} - B_{11}B_{23}}{B_{11}B_{22} - B_{12}^2}, \qquad
\lambda = B_{33} - \frac{B_{13}^2 + v_0 (B_{12}B_{13} - B_{11}B_{23})}{B_{11}},
\alpha = \sqrt{\lambda / B_{11}}, \qquad
\beta = \sqrt{\lambda B_{11} / (B_{11}B_{22} - B_{12}^2)}, \qquad
\gamma = -B_{12}\,\alpha^2 \beta / \lambda, \qquad
u_0 = \gamma v_0 / \beta - B_{13}\,\alpha^2 / \lambda    (2.14)
where λ is the scale factor with which the matrix B is recovered from the vector b.
At this point, all internal parameters of the camera (i.e., α, β, γ, u_0 and v_0) have been solved.
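For reference, the whole of S402 can also be reproduced with an off-the-shelf calibration routine. The following Python sketch is an illustration only: the use of OpenCV, the 9 × 6 board with 25 mm squares and the file names are assumptions and not part of the embodiment. OpenCV's calibrateCamera is based on the same Zhang-style procedure (a closed-form initialization followed by non-linear refinement) and returns the intrinsic matrix together with the distortion coefficients:

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)        # assumed number of inner corners per row/column
square = 0.025          # assumed square size in metres
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points, image_size = [], [], None
for path in glob.glob("checkerboard_*.png"):            # assumed file names
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if not found:
        continue
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    obj_points.append(objp)
    img_points.append(corners)
    image_size = gray.shape[::-1]                       # (width, height)

# Zhang-style calibration: returns the intrinsic matrix A and distortion terms.
rms, A, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)
print("reprojection RMS:", rms)
print("intrinsic matrix A:\n", A)
print("distortion coefficients:", dist.ravel())
```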
S403, the host computer determines the coordinate representation of the 4 corner points of the target calibration plate 1 in the camera coordinate system and the coordinate representation of the 4 corner points of the target calibration plate 2 in the camera coordinate system.
The target calibration plate 1 may be a calibration plate with ArUco two-dimensional code 1 pasted on its plane, and the target calibration plate 2 may be a calibration plate with ArUco two-dimensional code 2 pasted on its plane, where ArUco two-dimensional code 1 is different from ArUco two-dimensional code 2. It can be understood that the thickness of the target calibration plates is negligible in the embodiment of the present application.
In the embodiment of the present application, the method for determining the coordinate representation of the 4 corner points of target calibration plate 1 in the camera coordinate system is the same as the method for determining the coordinate representation of the 4 corner points of target calibration plate 2 in the camera coordinate system. For convenience of description, the method is described for a single target calibration plate: when the target calibration plate in the following text is target calibration plate 1, the ArUco two-dimensional code in the following text is ArUco two-dimensional code 1; when the target calibration plate in the following text is target calibration plate 2, the ArUco two-dimensional code in the following text is ArUco two-dimensional code 2. The host determines the coordinate representation of the 4 corner points of the target calibration plate in the camera coordinate system as follows: the host uses the ArUco pose tool to obtain the pose information of the ArUco two-dimensional code in the camera coordinate system, where the pose information of the ArUco two-dimensional code in the camera coordinate system is the same as the pose information of the target calibration plate associated with the ArUco two-dimensional code in the camera coordinate system; the host determines, according to a plane constraint, the coordinates of the 4 corner points of the calibration plate associated with the ArUco two-dimensional code in the world coordinate system from the coordinates of the center point of the ArUco two-dimensional code in the world coordinate system and the distances from the center point of the ArUco two-dimensional code to the four edges of the associated target calibration plate, the plane constraint being the constraint that the ArUco two-dimensional code covers the plane of the associated target calibration plate; and the host determines the coordinates of the 4 corner points of the target calibration plate in the camera coordinate system according to the pose information of the target calibration plate in the camera coordinate system and the coordinates of the 4 corner points of the target calibration plate in the world coordinate system. The distances from the center point of the ArUco two-dimensional code to the four edges of the target calibration plate can be measured in advance, for example manually. Exemplarily, the coordinate representation of the above 4 corner points of the target calibration plate in camera coordinates is described below with the schematic diagram of a two-dimensional code and a calibration plate shown in fig. 6. It is understood that the two-dimensional code shown in fig. 6 is the ArUco two-dimensional code described above, and the calibration plate shown in fig. 6 is the target calibration plate described in the above implementation.
The host computer can solve the pose information of the ArUco two-dimensional code in the camera coordinate system by using the ArUco pose-determination tool; that is, the pose of the target calibration plate associated with the ArUco two-dimensional code in the camera coordinate system is known. Referring to fig. 6, the length of the calibration plate is s2, the width of the calibration plate is s1, the side length of the two-dimensional code is e, the distance from the two-dimensional code to one edge of the calibration plate is b1, and the distance from the two-dimensional code to an adjacent edge is b2, all in centimeters (cm). Taking the y-axis perpendicular to the calibration plate and pointing outward (so that the y coordinate of every point on the calibration plate is 0), with the x-axis parallel to the side of length s2 and the z-axis parallel to the side of length s1, the coordinates of the 4 corner points of the calibration plate (i.e. the upper-left, lower-left, upper-right and lower-right corner points) in the world coordinate system can each be written in terms of s1, s2, e, b1 and b2; that is, the spatial coordinates of the upper-left corner point, the upper-right corner point, the lower-left corner point and the lower-right corner point of the calibration plate are all determined by these measured dimensions.
Assuming that marker_pose is the pose information of the calibration plate in the camera coordinate system and points_board_temp is the coordinate of one corner point of the calibration plate in the world coordinate system, the coordinate representation points_board of that corner point in the camera coordinate system can be obtained by the following formula:
points_board=marker_pose×points_board_temp (2.15)
In the process of determining the coordinate representation of the 4 corner points of each target calibration plate in the camera coordinate system, the coordinates of the 4 corner points of the calibration plate in the camera coordinate system are calculated from the distances from the center point of the two-dimensional code to the four edges of each target calibration plate, and the method is easy to implement.
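A minimal Python sketch of S403 is given below for illustration. The following are assumptions rather than part of the embodiment: OpenCV's ArUco module stands in for the ArUco pose tool (exact function names vary with the OpenCV version), the marker frame is taken with its z-axis normal to the board, and d_left, d_right, d_up and d_down denote the pre-measured distances from the center point of the two-dimensional code to the four edges of the board:

```python
import cv2
import numpy as np

def board_corners_in_camera(image, A, dist, marker_length,
                            d_left, d_right, d_up, d_down):
    # Detect the ArUco two-dimensional code and estimate its pose (classic
    # cv2.aruco API from opencv-contrib; an ArucoDetector-based variant
    # exists in newer versions).
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    corners, ids, _ = cv2.aruco.detectMarkers(image, dictionary)
    if ids is None:
        raise ValueError("no ArUco two-dimensional code detected")
    pose = cv2.aruco.estimatePoseSingleMarkers(corners, marker_length, A, dist)
    rvec, tvec = pose[0][0], pose[1][0]
    R_m, _ = cv2.Rodrigues(rvec)            # rotation part of marker_pose
    t_m = tvec.reshape(3, 1)                # translation part of marker_pose

    # Board corners in the marker frame: origin at the code center, x to the
    # right, y up, z normal to the board; the plane constraint puts all four
    # corners at z = 0 (offsets are the assumed pre-measured distances).
    corners_marker = np.array([
        [-d_left,  +d_up,   0.0],           # upper-left corner point
        [+d_right, +d_up,   0.0],           # upper-right corner point
        [-d_left,  -d_down, 0.0],           # lower-left corner point
        [+d_right, -d_down, 0.0],           # lower-right corner point
    ]).T

    # points_board = marker_pose x points_board_temp, as in formula (2.15).
    return (R_m @ corners_marker + t_m).T   # 4 corners in the camera frame
```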
S404, the host computer determines the coordinate representation of the 4 angular points of the target calibration plate 1 in the laser radar coordinate system and determines the coordinate representation of the 4 angular points of the target calibration plate 2 in the laser radar coordinate system.
The lidar coordinate system is used to describe the relative position of an object and the lidar. The lidar coordinate system is denoted [X_L, Y_L, Z_L], where the origin is the geometric center of the lidar, the X_L axis points horizontally forward, the Y_L axis points horizontally to the left, and the Z_L axis points vertically upward, conforming to the right-hand coordinate system rule.
In the embodiment of the present application, the method for determining the coordinate representation of 4 corner points of the target calibration plate 1 in the lidar coordinate system is the same as the method for determining the coordinate representation of 4 corner points of the target calibration plate 2 in the lidar coordinate system. For convenience of description, the following description will be given by taking as an example a method for determining coordinate representation of 4 corner points of a target calibration board in a lidar coordinate system, where the target calibration board in the following description may be the target calibration board 1 in the foregoing description, and may also be the target calibration board 2 in the foregoing description. The host computer determines coordinate representation of 4 corner points of the target calibration plate under a laser radar coordinate system, and the coordinate representation comprises the following steps: the method comprises the steps that a host computer obtains point cloud data of a target calibration plate collected by a laser radar; the host computer selects the point cloud data of the four edges included by the target calibration plate through manual interaction, and performs linear fitting on the point cloud data of the four edges to obtain the spatial representation of the straight line where the four edges are located; and respectively calculating the intersection points of the straight lines of any two adjacent straight lines of the four straight lines to obtain a plurality of intersection points, wherein the intersection points are the coordinate representation of 4 corner points included in the target calibration plate under a laser radar coordinate system. Through the implementation mode, the coordinate representation of the 4 corner points included by the target calibration plate 1 in the laser radar coordinate system and the coordinate representation of the 4 corner points included by the target calibration plate 2 in the laser radar coordinate system can be obtained. It is understood that the point cloud data of the calibration plate collected by the laser radar does not include other point cloud data than the calibration plate. For example, the host computer selects point cloud data of each edge included in the target calibration plate through interaction with human, including: the host computer displays the point cloud data of the calibration plate acquired by the laser radar to a user through a display interface; and responding to the point cloud data of each edge included by the target calibration board manually selected in the display interface by the user, and obtaining the point cloud data of each edge included by the target calibration board by the host computer.
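For illustration only, a NumPy-based sketch of this edge-fitting and intersection procedure is given below (function and variable names are assumptions). Each manually selected edge point set is fitted with a 3D line (centroid plus dominant PCA direction), and each pair of adjacent lines is intersected by taking the midpoint of their common perpendicular as the corner estimate:

```python
import numpy as np

def fit_line(points):
    # Least-squares 3D line: centroid plus dominant direction (via SVD/PCA).
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    direction = np.linalg.svd(pts - centroid)[2][0]
    return centroid, direction

def intersect_lines(c1, d1, c2, d2):
    # Midpoint of the common perpendicular of two nearly intersecting lines.
    w0 = c1 - c2
    a, b, cc = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * cc - b * b
    s = (b * e - cc * d) / denom
    t = (a * e - b * d) / denom
    return ((c1 + s * d1) + (c2 + t * d2)) / 2.0

def board_corners_in_lidar(edge_point_sets):
    # edge_point_sets: four arrays of manually selected edge points, ordered
    # so that consecutive edges of the calibration plate are adjacent.
    lines = [fit_line(p) for p in edge_point_sets]
    corners = []
    for i in range(4):
        c1, d1 = lines[i]
        c2, d2 = lines[(i + 1) % 4]
        corners.append(intersect_lines(c1, d1, c2, d2))
    return np.array(corners)      # 4 corner coordinates in the lidar frame
```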
S405, the host calculates a rotation matrix R and a translation vector t according to the coordinate representation of the 4 corner points of target calibration plate 1 in the camera coordinate system, the coordinate representation of the 4 corner points of target calibration plate 2 in the camera coordinate system, the coordinate representation of the 4 corner points of target calibration plate 1 in the laser radar coordinate system, and the coordinate representation of the 4 corner points of target calibration plate 2 in the laser radar coordinate system.
Performing S405 above, it can be understood that the host computer calculates external parameters (i.e., the rotation matrix R and the translation vector t) between the lidar and the camera directly by acquiring coordinates of the calibration plate in the lidar coordinate system and coordinates of the calibration plate in the camera coordinate system. In this implementation, the accumulated error caused by performing the conversion between different coordinate systems for multiple times can be avoided, so that the obtained R and t are more accurate.
The conversion from the lidar coordinate system to the camera coordinate system is a rigid transformation process that may be represented by a rotation matrix R and a translation vector t. Wherein, the rotation matrix R is a matrix with the size of 3 multiplied by 3 and represents the rotation of the space coordinate; the translation vector t is a matrix of size 3 × 1, representing a spatial coordinate translation.
The embodiment of the present application provides two ways to implement the step described in S405 above. In the following, two implementations, namely the first implementation and the second implementation, are described separately. First, a flow of calculating the rotation matrix R and the translational vector t in the first implementation will be described. For convenience of description, hereinafter, "4 corner points of the target calibration plate 1 and 4 corner points of the target calibration plate 2" will be abbreviated as "8 corner points of the target calibration plate".
Implementation mode one
In a first implementation manner, an embodiment of the present application provides a method for calculating the rotation matrix R and the translation vector t by an SVD decomposition method according to the coordinate representation of the 8 corner points of the target calibration plates in the camera coordinate system and the coordinate representation of the 8 corner points of the target calibration plates in the laser radar coordinate system. Next, the flow of solving R and t by the SVD decomposition method is described, which includes step 1.1 to step 1.7.
Step 1.1, respectively calculating a mass center by a host according to coordinate representation of 8 angular points of a target calibration plate under a camera coordinate system and coordinate representation of 8 angular points of the target calibration plate under a laser radar coordinate system:
p = \frac{1}{8} \sum_{i=1}^{8} p_i, \qquad p' = \frac{1}{8} \sum_{i=1}^{8} p'_i    (2.16)
In the above formula, p_i represents the coordinate representation of corner point i of the 8 corner points of the target calibration plates in the camera coordinate system, and p is the centroid of the p_i; p'_i represents the coordinate representation of corner point i of the 8 corner points of the target calibration plates in the laser radar coordinate system, and p' is the centroid of the p'_i; i = 1, 2, ..., 8.
Step 1.2, the host calculates the displacement vectors q_i of the coordinates of the 8 corner points of the target calibration plates in the camera coordinate system relative to their centroid, and the displacement vectors q'_i of the coordinates of the 8 corner points of the target calibration plates in the laser radar coordinate system relative to their centroid.
q_i = p_i - p, \qquad q'_i = p'_i - p'    (2.17)
wherein q_i represents the displacement vector of p_i relative to p, and q'_i represents the displacement vector of p'_i relative to p'.
And step 1.3, the host calculates the H matrix by using the displacement vectors relative to the centroids.
H = \sum_{i=1}^{8} q_i \, q_i'^{\,T}    (2.18)
And 1.4, carrying out SVD on the H matrix by the host.
The host performs SVD on the H matrix, and the H matrix can be represented by the following formula:
H = U \Lambda V^T    (2.19)
wherein U and V are 3 × 3 orthogonal matrices, and Λ is a 3 × 3 non-negative diagonal matrix.
And 1.5, calculating a rotation matrix R by the host machine based on the matrixes U and V.
Wherein the relationship between the matrices U, V and R can be represented by the following equation:
R = V U^T    (2.20)
the reason for the rotation matrix R being equal to the product of U and V is described below:
for the rotation matrix R, the objective function of least squares can be represented by the following formula:
\Sigma^2 = \sum_{i=1}^{8} \left\| q'_i - R\, q_i \right\|^2 = \sum_{i=1}^{8} \left( q_i'^{\,T} q'_i + q_i^T q_i - 2\, q_i'^{\,T} R\, q_i \right)    (2.21)
After removing the first two constant terms in the above equation (2.21), minimizing Σ² is equivalent to maximizing a function F, which can be expressed by the following formula:
F = \sum_{i=1}^{8} q_i'^{\,T} R\, q_i = \mathrm{Trace}\!\left( \sum_{i=1}^{8} R\, q_i\, q_i'^{\,T} \right) = \mathrm{Trace}(RH)    (2.22)
let X = VU T Then XH = V Λ U T A symmetrical positive definite array is formed.
Theorem: for any positive definite matrix AA^T and any orthonormal matrix B, the following relationship holds:
\mathrm{Trace}(AA^T) \ge \mathrm{Trace}(BAA^T)    (2.23)
the theorem proves that: let a i Is the ith column of matrix a, then:
\mathrm{Trace}(BAA^T) = \mathrm{Trace}(A^T B A) = \sum_i a_i^T B\, a_i    (2.24)
based on Schwarz inequality (Schwarz inequality) inequality<x,y>| 2 ≤<x,x>·<y,y>Wherein, in the step (A),<,>representing the inner product, one can obtain:
a_i^T B\, a_i = (B^T a_i)^T a_i \le \sqrt{(a_i^T B B^T a_i)(a_i^T a_i)} = a_i^T a_i    (2.25)
further, it is possible to obtain:
\mathrm{Trace}(BAA^T) = \sum_i a_i^T B\, a_i \le \sum_i a_i^T a_i = \mathrm{Trace}(AA^T)    (2.26)
According to the conclusion of the theorem, for the positive definite matrix XH and an arbitrary 3 × 3 orthogonal matrix B, there is:
\mathrm{Trace}(XH) \ge \mathrm{Trace}(BXH)    (2.27)
That is, the matrix X maximizes the function F, and if det(X) = +1, the matrix X is a rotation matrix. That is, the optimal solution of the matrix R is R = VU^T.
And step 1.6, determining whether the R obtained by the calculation in the step 1.5 is effective or not according to det (R).
Where det (R) represents the value of the determinant of the rotation matrix R.
Wherein, determining whether the R obtained by the calculation in the step 1.5 is valid according to det (R) comprises: when det (R) is equal to 1, the R obtained by the calculation of the step 1.5 is valid; when det (R) is equal to-1, the R obtained by the above calculation of step 1.5 is not valid.
In case the execution of step 1.6 determines that R calculated in step 1.5 is valid, step 1.7 is continued after step 1.6.
And step 1.7, calculating the translation vector t according to the R obtained by the calculation in step 1.5 and the p and p' obtained by the calculation in step 1.1.
Wherein, the translation vector t can be expressed by the following formula:
t=p’-Rp (2.27)
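A compact Python sketch of steps 1.1 to 1.7 is given below for illustration (NumPy-based; the names are assumptions). Where the text above simply treats det(R) = -1 as invalid, the sketch applies the common practical remedy of flipping the sign of the last singular vector; this is an implementation choice, not part of the embodiment:

```python
import numpy as np

def estimate_rigid_transform(pts_cam, pts_lidar):
    """R, t such that pts_lidar ~= R @ pts_cam + t.

    pts_cam, pts_lidar: (N, 3) arrays of corresponding corner coordinates in
    the camera and lidar coordinate systems (N = 8 for the two plates here).
    """
    p = pts_cam.mean(axis=0)                  # step 1.1: centroids
    p_prime = pts_lidar.mean(axis=0)
    q = pts_cam - p                           # step 1.2: displacement vectors
    q_prime = pts_lidar - p_prime
    H = q.T @ q_prime                         # step 1.3: H = sum_i q_i q'_i^T
    U, _, Vt = np.linalg.svd(H)               # step 1.4: H = U Lambda V^T
    R = Vt.T @ U.T                            # step 1.5: R = V U^T
    if np.linalg.det(R) < 0:                  # step 1.6: reflection check
        Vt[-1, :] *= -1                       # practical fix (assumption)
        R = Vt.T @ U.T
    t = p_prime - R @ p                       # step 1.7: t = p' - R p
    return R, t
```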
next, a flow of calculating the rotation matrix R and the translational vector t in the second implementation will be described.
Implementation mode two
In a second implementation manner, the embodiment of the present application provides a method in which a least squares method is used to calculate the rotation matrix R and the translation vector t according to the coordinate representation of the 8 corner points of the target calibration plates in the camera coordinate system and the coordinate representation of the 8 corner points of the target calibration plates in the laser radar coordinate system.
Next, a flow of solving R and t by the least square method is described, which includes step 2.1 to step 2.4.
In the camera coordinate system, corner point i of the 8 corner points of the target calibration plates can be denoted p_i, p_i ∈ C. In the lidar coordinate system, corner point i of the 8 corner points of the calibration plates can be denoted p'_i, p'_i ∈ D, where i = 1, 2, ..., 8.
wherein the relationship between p_i and p'_i can be expressed by the following formula:
p'_i = R · p_i + t    (3.1)
in the above formula, R represents a rotation matrix and t represents a translation vector. Further, the above formula (3.1) can also be abstractly expressed as:
D=R·C+t (3.2)
based on the above equation (3.1) and equation (3.2), R and t can be solved by the least square method.
Step 2.1, for each p_i ∈ C, traverse D to find the target laser point p'_i with the minimum distance to p_i; m pairs (p_i, p'_i) can thereby be obtained, where m is a positive integer.
Step 2.2, substitute the m pairs (p_i, p'_i) into the following formula and calculate using the least squares method:
(R, t) = \arg\min_{R,\,t} E, \qquad E = \frac{1}{m} \sum_{i=1}^{m} \left\| p'_i - (R\, p_i + t) \right\|^2    (3.3)
and 2.3, determining whether the current iteration meets an iteration stop condition.
Wherein determining whether the current iteration satisfies an iteration stop condition comprises: if the current iteration meets the iteration stop condition, taking R and t obtained by current iteration calculation as final iteration results; alternatively, if it is determined that the current iteration does not satisfy the iteration stop condition, step 2.4 will be performed after step 2.3 described above is performed.
The iteration stop condition may be at least one of: the error Δ E of the iteration is smaller than a preset threshold, or the total number of iterations reaches a preset number of iterations, where the error Δ E of the iteration may be a difference value of E obtained by two adjacent iterations. The preset threshold and the preset iteration number may be set according to actual scene requirements, which is not specifically limited.
Step 2.4, update the p_i and p'_i used by the current iteration by using the R and t obtained by the current iteration calculation, and perform the above steps 2.1 to 2.3 again with the updated p_i and p'_i.
Wherein the updated p_i and p'_i can be calculated by using the above formula (3.1) and the R and t obtained by the current iteration calculation.
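For illustration, an ICP-style reading of steps 2.1 to 2.4 is sketched below. It is an assumption, not part of the embodiment, that SciPy's cKDTree performs the nearest-neighbour search of step 2.1 and that the estimate_rigid_transform sketch given after step 1.7 performs the per-iteration least-squares solve of step 2.2:

```python
import numpy as np
from scipy.spatial import cKDTree

def refine_R_t(C, D, R0=np.eye(3), t0=np.zeros(3), max_iters=50, tol=1e-6):
    """Iterative least-squares estimation of R and t.

    C: (n, 3) points in the camera coordinate system.
    D: (m, 3) points in the lidar coordinate system.
    """
    R, t = R0, t0
    tree = cKDTree(D)
    prev_err = np.inf
    for _ in range(max_iters):
        moved = C @ R.T + t                   # apply current R, t (formula 3.1)
        _, idx = tree.query(moved)            # step 2.1: closest points in D
        matched = D[idx]
        R, t = estimate_rigid_transform(C, matched)   # step 2.2
        err = np.mean(np.sum((C @ R.T + t - matched) ** 2, axis=1))
        if abs(prev_err - err) < tol:         # step 2.3: stop condition on the
            break                             # change of E between iterations
        prev_err = err                        # step 2.4: iterate with updates
    return R, t
```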
S406, the host machine performs data fusion on the three-dimensional point cloud data of the working scene of the excavator and the two-dimensional image data of the working scene of the excavator to obtain fused three-dimensional point cloud data with color information.
The three-dimensional point cloud data of the excavator working scene is a data set obtained by sensing the excavator working scene by a laser radar, and each point in the three-dimensional point cloud data comprises three-dimensional coordinate information of the excavator working scene. The two-dimensional image data of the excavator working scene is an image obtained by shooting the excavator working scene by a camera, and each pixel point in the two-dimensional image comprises two-dimensional coordinate information of the excavator working scene and color information corresponding to the two-dimensional coordinate information. The three-dimensional point cloud data and the two-dimensional image data are corresponding data of the excavator working scene at the same moment. Optionally, the three-dimensional point cloud data includes point cloud data of a bucket of the excavator, and the two-dimensional image data includes image data of the bucket of the excavator, so that an operator operating the excavator can conveniently rotate to switch a viewing angle, and the operator can view the construction site environment of the excavator in a 360-degree panoramic manner.
The host performs data fusion on the three-dimensional point cloud data and the two-dimensional image data to obtain a fused three-dimensional image with color information, which includes the following steps: the host selects, according to the internal parameter matrix A, the rotation matrix R and the translation vector t of the camera, any point in the three-dimensional point cloud data and the pixel point included in the two-dimensional image data that has the mapping relation with it; and the host performs data fusion on that point of the three-dimensional point cloud data and that pixel point of the two-dimensional image data to obtain a fused three-dimensional image with color information.
The above data fusion process is described below by way of example. For example, a module is developed on a host based on a distributed communication model Robot Operating System (ROS), and the module is configured to receive a three-dimensional point cloud data set and two-dimensional image data, analyze the three-dimensional point cloud data to obtain a plurality of three-dimensional points, traverse each of the plurality of three-dimensional points, and project each three-dimensional point P (x, y, z) onto a two-dimensional image to obtain color information of P ', P' as (R, g, b) by using a rotation matrix R, a translation vector t and an internal reference matrix a of a camera obtained by previous calibration, thereby obtaining a fused spatial point P (x, y, z, R, g, b).
Repeating the step S406 to obtain the color information of each three-dimensional point in the three-dimensional point cloud data, and rendering the points to obtain the fused three-dimensional point cloud data with the color information.
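For illustration, the projection and coloring described above can be sketched as follows (NumPy-based; the names are assumptions, the image is assumed to be an already undistorted RGB array, and R, t are taken to map lidar coordinates into camera coordinates):

```python
import numpy as np

def colorize_point_cloud(points, image, A, R, t):
    """Attach (r, g, b) from the image to each lidar point P(x, y, z)."""
    cam = points @ R.T + t                    # lidar -> camera coordinates
    in_front = cam[:, 2] > 0                  # keep points in front of the camera
    uvw = cam @ A.T                           # pinhole projection with intrinsics A
    z = np.where(np.abs(uvw[:, 2:3]) < 1e-9, 1e-9, uvw[:, 2:3])
    uv = np.rint(uvw[:, :2] / z).astype(int)

    h, w = image.shape[:2]
    inside = in_front & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
                      & (uv[:, 1] >= 0) & (uv[:, 1] < h)

    fused = np.zeros((points.shape[0], 6))
    fused[:, :3] = points
    fused[inside, 3:] = image[uv[inside, 1], uv[inside, 0]]   # (r, g, b) lookup
    return fused[inside]                      # fused points P(x, y, z, r, g, b)
```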
Optionally, before executing the step S406, the host may further execute the following steps: acquiring three-dimensional point cloud data from a laser radar; two-dimensional image data is acquired from a camera.
S407, the host renders the fused three-dimensional point cloud data with the color information to obtain a rendering result.
The rendering result can be understood as a result obtained by performing three-dimensional reconstruction on the excavator working scene, and the result obtained by the three-dimensional reconstruction includes both the color information of the excavator working scene and the depth information of the geographic position of the excavator working scene.
S408, the host sends the rendering result to the client in a topic communication mode so that the client displays the rendering result to the user.
Optionally, before executing the above S408, a network link for the topic communication mode may also be established between the host and the client.
For example, the display interface of the rendering result of the client may be as shown in fig. 2 above. The user can also click a rendering result button of the display interface of the application program 3 to obtain rendering results of three-dimensional reconstruction of the excavator working scene in different directions and viewing angles, different sizes, or different positions.
It should be understood that the above-described method is only illustrative and does not constitute any limitation on the multi-sensor data fusion method provided by the embodiments of the present application. In this implementation manner, the mapping relationship is determined by using the coordinate representation of the 8 corner points of the target calibration plates (i.e. the 4 corner points of target calibration plate 1 and the 4 corner points of target calibration plate 2) in the camera coordinate system and the coordinate representation of the 8 corner points of the target calibration plates in the laser radar coordinate system. Optionally, in another implementation manner, the mapping relationship may also be determined by using the coordinate representation of a greater number of corner points of a greater number of calibration plates in the camera coordinate system and the coordinate representation of those corner points in the laser radar coordinate system.
In the embodiment of the application, the laser radar and the camera are arranged at the top of the cab of the excavator and are used for acquiring three-dimensional point cloud data and two-dimensional image data of the working scene of the excavator. The acquired three-dimensional point cloud data and two-dimensional image data can be transmitted to the host; the host can fuse the three-dimensional point cloud data and the two-dimensional image data according to the external parameters calibrated in advance (namely R and t) and the internal parameter matrix A of the camera to obtain fused data, and the fused data are three-dimensional point cloud data with color information. The host renders the fused data and transmits the rendered result to the client in the topic communication mode; an operator can accurately judge the working-site environment of the excavator through the display picture of the client and control the operation of the excavator through commands issued from the client. When determining the external parameters calibrated in advance, the host directly calculates the external parameters between the laser radar and the camera (namely, the rotation matrix R and the translation vector t) by acquiring the coordinates of the calibration plates in the laser radar coordinate system and the coordinates of the calibration plates in the camera coordinate system. In this implementation, accumulated errors due to performing multiple conversions between different coordinate systems can be avoided, so that the obtained R and t are more accurate. Furthermore, the precision of the obtained data fusion result can be improved, that is, the accuracy of environment perception is improved. When determining the coordinate representation of the 4 corner points of each target calibration plate (namely, target calibration plate 1 or target calibration plate 2) in the camera coordinate system, the host calculates the coordinates of the corner points of each target calibration plate in the camera coordinate system according to the pre-measured distances from the center point of the two-dimensional code to the four edges of each target calibration plate. In summary, the multi-sensor data fusion method provided by the embodiment of the application is easy to implement, and can improve the precision of the data fusion result and the accuracy of environment perception.
Application scenarios and methods for multi-sensor data fusion applicable to the multi-sensor data fusion method provided by the present application are described in detail above with reference to fig. 1 to 6. The multi-sensor data fusion apparatus, device and system provided by the present application will be described with reference to fig. 7 and 9. It should be understood that the above multi-sensor data fusion method corresponds to the following multi-sensor data fusion apparatus, device and system. What is not described in detail below can be referred to the relevant description in the above-described method embodiments.
Fig. 7 is a schematic structural diagram of a multi-sensor data fusion apparatus according to an embodiment of the present application. As shown in fig. 7, the apparatus includes a transceiving unit 701 and a fusion unit 702,
the receiving and sending unit 701 is used for acquiring three-dimensional point cloud data of a target scene acquired by a first sensor; the transceiver unit 701 is further configured to acquire two-dimensional image data of the target scene acquired by a second sensor, where the two-dimensional image data and the three-dimensional point cloud data are data of the target scene at a target time; the fusion unit 702 is configured to: and performing data fusion on the three-dimensional point cloud data and the two-dimensional image data according to a mapping relation to obtain a three-dimensional reconstruction result of the target scene, wherein the mapping relation is the mapping relation from the first sensor coordinate system to the second sensor coordinate system, and the mapping relation is determined according to coordinates of a plurality of corner points of a plurality of calibration plates in the second sensor coordinate system and coordinates of a plurality of corner points of the plurality of calibration plates in the first sensor coordinate system.
In some implementations, the apparatus further includes a computing unit 703, the computing unit 703 being configured to: and calculating the coordinates of the plurality of corner points of the plurality of calibration plates under the first sensor coordinate system and the coordinates of the plurality of corner points of the plurality of calibration plates under the second sensor coordinate system by utilizing an optimization algorithm to obtain the mapping relation, wherein the optimization algorithm is a Singular Value Decomposition (SVD) method or a least square method.
Optionally, in another implementation manner, a plurality of different two-dimensional codes are associated with the plurality of calibration boards, each two-dimensional code covers an associated calibration board plane, and the calculation unit 703 is further configured to: determining coordinates of a plurality of corner points of the calibration plate associated with each two-dimensional code under a world coordinate system according to plane constraints, coordinates of the center point of each two-dimensional code under the world coordinate system and distances from the center point of each two-dimensional code to four sides of the associated calibration plate, wherein the plane constraints are constraints that each two-dimensional code covers the plane of the associated calibration plate; and determining the coordinates of the multiple corner points of the calibration plate associated with each two-dimension code in the second sensor coordinate system according to the coordinates of the multiple corner points of the calibration plate associated with each two-dimension code in the world coordinate system and the position and orientation information of the calibration plate associated with each two-dimension code in the second sensor coordinate system.
Optionally, in other implementations, the computing unit 703 is further configured to: performing the following calculation on each calibration plate in the plurality of calibration plates to obtain coordinates of a plurality of corner points of each calibration plate in the first sensor coordinate system: determining point cloud data of four edges of each calibration plate according to the point cloud data of each calibration plate acquired by the first sensor; performing linear fitting on the point cloud data of the four edges of each calibration plate to obtain a spatial representation of a straight line where the four edges of each calibration plate are located; and calculating the intersection point of the straight line of any two adjacent straight lines in the straight lines of the four sides of each calibration plate in the three-dimensional space to obtain a plurality of intersection points, wherein the plurality of intersection points are coordinates of a plurality of corner points of each calibration plate in the first sensor coordinate system.
Optionally, in another implementation manner, the apparatus further includes a rendering unit 704, and the fusion unit 702 is further configured to: performing data fusion on the three-dimensional point cloud data and the two-dimensional image data according to the internal parameters of the second sensor and the mapping relation to obtain a fusion result; the rendering unit 704 is configured to: rendering the fusion result to obtain a three-dimensional reconstruction result of the target scene.
Optionally, in another implementation manner, the mapping relationship is represented by a rotation matrix and a translation vector, and the calculating unit 703 is further configured to: selecting any one point included in the three-dimensional point cloud data with the mapping relation and a pixel point included in the two-dimensional image data according to the internal parameters of the second sensor, the rotation matrix and the translation vector; the fusion unit 702 is further configured to: and performing data fusion on any point included in the three-dimensional point cloud data with the mapping relation and a pixel point included in the two-dimensional image data to obtain a fusion result.
Optionally, in another implementation manner, the transceiver 701 is further configured to: and sending the three-dimensional reconstruction result of the target scene to a client for displaying the three-dimensional reconstruction result of the target scene at the client.
Optionally, in another implementation manner, the first sensor is a laser radar, the second sensor is a camera, the target scene is an excavator working scene, the laser radar and the camera are arranged at the top of an excavator cockpit, and a distance between the laser radar and the camera is smaller than a preset threshold.
Fig. 8 is a schematic structural diagram of a device for multi-sensor data fusion according to an embodiment of the present application. As shown in fig. 8, includes a memory 801, a processor 802, a communication interface 803, and a communication bus 804. The memory 801, the processor 802, and the communication interface 803 are communicatively connected to each other via a communication bus 804.
The memory 801 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 801 may store programs, and when the programs stored in the memory 801 are executed by the processor 802, the processor 802 and the communication interface 803 are used to perform the steps of the method of multi-sensor data fusion of the embodiments of the present application.
The processor 802 may be a general Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement functions required to be executed by units in the multi-sensor data fusion apparatus according to the embodiment of the present application, or to execute steps of the multi-sensor data fusion method according to the embodiment of the present application.
The processor 802 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the multi-sensor data fusion method provided herein may be implemented by integrated logic circuits of hardware or instructions in the form of software in the processor 802. The processor 802 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 801, and the processor 802 reads information in the memory 801, and in combination with hardware thereof, performs functions required to be performed by units included in the multi-sensor data fusion apparatus according to the embodiment of the present application, or performs the multi-sensor data fusion method according to the embodiment of the present application.
The communication interface 803 enables communication between the device shown in fig. 8 and other devices or communication networks using transceiving means such as, but not limited to, transceivers. For example, three-dimensional point cloud data of the target scene acquired by the first sensor may be acquired through the communication interface 803.
The communication bus 804 may include a path that transfers information between the various components of the device shown in fig. 8 (e.g., the memory 801, the processor 802, the communication interface 803).
Fig. 9 is a schematic structural diagram of a multi-sensor data fusion system according to an embodiment of the present application. As shown in fig. 9, the system includes a client 901, a computing device 902, and an acquisition entity 903.
In some implementations, the client 901 may be the client 101 shown in fig. 1, the computing device 902 may be the computing device 103 shown in fig. 1, the collection entity 903 may be the collection cart 105 shown in fig. 1, the first sensor may be the lidar 106 shown in fig. 1, and the second sensor may be the camera 107 shown in fig. 1.
Wherein the computing device 902 is configured to perform the multi-sensor data fusion method described above to obtain a three-dimensional reconstruction of the target scene; the first sensor arranged on the acquisition entity 903 is used for acquiring three-dimensional point cloud data of the target scene, and the second sensor arranged on the acquisition entity 903 is used for acquiring two-dimensional image data of the target scene; the client 901 is configured to obtain a three-dimensional reconstruction result of the target scene from the computing device 902; in response to a selection of a user on the display interface of the client 901, the display interface of the client 901 displays a three-dimensional reconstruction result of a target scene corresponding to the selection of the user. The embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium includes computer instructions, and the computer instructions, when executed by a processor, are used to implement any one of the technical solutions of the method for multi-sensor data fusion in the embodiment of the present application.
An embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program is configured to implement the method for multi-sensor data fusion according to any one of the above technical solutions.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored on a computer-readable medium and include several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device) to execute the method according to the embodiments of the present disclosure.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
1. Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, Phase-change Random Access Memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage media, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
2. As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Although the present application has been described with reference to the preferred embodiments, they are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the appended claims.

Claims (12)

1. A method of multi-sensor data fusion, comprising:
acquiring three-dimensional point cloud data of a target scene acquired by a first sensor;
acquiring two-dimensional image data of the target scene acquired by a second sensor, wherein the two-dimensional image data and the three-dimensional point cloud data are data of the target scene at a target moment;
and performing data fusion on the three-dimensional point cloud data and the two-dimensional image data according to a mapping relation to obtain a three-dimensional reconstruction result of the target scene, wherein the mapping relation is the mapping relation from the first sensor coordinate system to the second sensor coordinate system, and the mapping relation is determined according to coordinates of a plurality of corner points of a plurality of calibration plates in the second sensor coordinate system and coordinates of a plurality of corner points of the plurality of calibration plates in the first sensor coordinate system.
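For concreteness, the mapping relation recited in claim 1 is typically expressed as a rigid transform, i.e. a rotation matrix R and translation vector t (made explicit in claim 6 below), composed with the intrinsic matrix K of the second sensor. The following is a sketch of that standard pinhole formulation, written here for illustration rather than quoted from this application; s denotes the depth of the point along the second sensor's optical axis.

```latex
% Mapping a point p_l in the first-sensor coordinate system to a pixel (u, v):
\begin{aligned}
  p_c &= R\,p_l + t \\[2pt]
  s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} &= K\,p_c, \qquad s > 0
\end{aligned}
```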
2. The method of claim 1, further comprising:
and calculating the coordinates of the plurality of corner points of the plurality of calibration plates under the first sensor coordinate system and the coordinates of the plurality of corner points of the plurality of calibration plates under the second sensor coordinate system by utilizing an optimization algorithm to obtain the mapping relation, wherein the optimization algorithm is a Singular Value Decomposition (SVD) method or a least square method.
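As an illustration of the SVD option in claim 2, the sketch below estimates the rotation and translation aligning the calibration-plate corners given in the first-sensor coordinate system with the same corners given in the second-sensor coordinate system (a Kabsch-style closed form). The function name and NumPy usage are illustrative assumptions, not taken from this application.

```python
import numpy as np

def estimate_rigid_transform(pts_first, pts_second):
    """Estimate R, t such that pts_second ≈ R @ pts_first + t.

    pts_first, pts_second: (N, 3) arrays of corresponding calibration-plate
    corner coordinates in the first- and second-sensor coordinate systems.
    """
    pts_first = np.asarray(pts_first, dtype=float)
    pts_second = np.asarray(pts_second, dtype=float)

    # Center both point sets on their centroids.
    c_first = pts_first.mean(axis=0)
    c_second = pts_second.mean(axis=0)
    A = pts_first - c_first
    B = pts_second - c_second

    # Cross-covariance matrix and its singular value decomposition.
    U, _, Vt = np.linalg.svd(A.T @ B)

    # Rotation, with a reflection correction so that det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = c_second - R @ c_first
    return R, t
```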
3. The method of claim 1 or 2, wherein a plurality of different two-dimensional codes are associated with the plurality of calibration plates, each two-dimensional code overlying an associated calibration plate plane, the method further comprising:
determining coordinates of a plurality of corner points of the calibration plate associated with each two-dimensional code under a world coordinate system according to plane constraints, the coordinates of the central point of each two-dimensional code under the world coordinate system and the distances from the central point of each two-dimensional code to four edges of the associated calibration plate, wherein the plane constraints are the constraints that each two-dimensional code covers the plane of the associated calibration plate;
and determining the coordinates of the plurality of corner points of the calibration plate associated with each two-dimension code in the second sensor coordinate system according to the coordinates of the plurality of corner points of the calibration plate associated with each two-dimension code in the world coordinate system and the position and attitude information of the calibration plate associated with each two-dimension code in the second sensor coordinate system.
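A minimal sketch of the corner computation in claim 3, assuming the pose of each two-dimensional code is available as its center point plus two in-plane unit axis vectors (which, by the plane constraint, also span the plane of the associated calibration plate), together with a world-to-second-sensor pose. All parameter names are hypothetical.

```python
import numpy as np

def plate_corners_in_second_sensor(center_w, x_axis_w, y_axis_w,
                                   d_left, d_right, d_top, d_bottom,
                                   R_ws, t_ws):
    """Corners of the calibration plate associated with one two-dimensional code.

    center_w           : (3,) center of the two-dimensional code in the world frame.
    x_axis_w, y_axis_w : (3,) unit vectors spanning the plate plane (plane constraint).
    d_*                : distances from the code center to the four plate edges.
    R_ws, t_ws         : pose mapping world coordinates into the second-sensor frame.
    """
    # Offsets inside the plate plane, built from the distances to the four edges.
    corners_w = np.stack([
        center_w - d_left * x_axis_w + d_top * y_axis_w,      # top-left
        center_w + d_right * x_axis_w + d_top * y_axis_w,     # top-right
        center_w + d_right * x_axis_w - d_bottom * y_axis_w,  # bottom-right
        center_w - d_left * x_axis_w - d_bottom * y_axis_w,   # bottom-left
    ])
    # Map the world-frame corners into the second-sensor coordinate system.
    return (R_ws @ corners_w.T).T + t_ws
```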
4. The method according to any one of claims 1 to 3, further comprising:
performing the following calculation on each calibration plate in the plurality of calibration plates to obtain coordinates of a plurality of corner points of each calibration plate in the first sensor coordinate system:
determining point cloud data of four edges of each calibration plate according to the point cloud data of each calibration plate acquired by the first sensor;
performing linear fitting on the point cloud data of the four edges of each calibration plate to obtain a spatial representation of a straight line where the four edges of each calibration plate are located;
and calculating the intersection point of the straight line of any two adjacent straight lines in the straight lines of the four sides of each calibration plate in the three-dimensional space to obtain a plurality of intersection points, wherein the plurality of intersection points are coordinates of a plurality of corner points of each calibration plate in the first sensor coordinate system.
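The edge-based corner extraction of claim 4 can be sketched as a least-squares 3D line fit per edge followed by a pseudo-intersection of adjacent edge lines; fitted 3D lines rarely meet exactly, so the midpoint of the closest points between the two lines is a common choice. Function names are illustrative.

```python
import numpy as np

def fit_line_3d(points):
    """Least-squares 3D line fit: returns a point on the line and a unit direction."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # Principal direction of the centered edge points via SVD.
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[0]

def corner_from_adjacent_edges(p1, d1, p2, d2):
    """Pseudo-intersection of two adjacent (non-parallel) edge lines in 3D space.

    Each line is given as (point, unit direction). Returns the midpoint of the
    shortest segment between the two lines, taken as the plate corner.
    """
    w0 = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b          # non-zero because adjacent edges are not parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return 0.5 * ((p1 + s * d1) + (p2 + t * d2))
```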
5. The method according to any one of claims 1 to 4, wherein the data fusion of the three-dimensional point cloud data and the two-dimensional image data according to the mapping relationship to obtain a three-dimensional reconstruction result of the target scene comprises:
performing data fusion on the three-dimensional point cloud data and the two-dimensional image data according to the internal parameters of the second sensor and the mapping relation to obtain a fusion result;
rendering the fusion result to obtain a three-dimensional reconstruction result of the target scene.
6. The method of claim 5, wherein the mapping relation is represented by a rotation matrix and a translation vector,
and performing data fusion on the three-dimensional point cloud data and the two-dimensional image data according to the internal parameters of the second sensor and the mapping relation to obtain a fusion result, wherein the fusion result comprises the following steps:
selecting, according to the internal parameters of the second sensor, the rotation matrix and the translation vector, any point included in the three-dimensional point cloud data and a pixel point included in the two-dimensional image data that have a mapping relation with each other;
and performing data fusion on any point included in the three-dimensional point cloud data with the mapping relation and a pixel point included in the two-dimensional image data to obtain a fusion result.
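A sketch of the point-to-pixel fusion in claim 6, assuming an undistorted pinhole model for the second sensor: each three-dimensional point is projected with the intrinsic matrix, rotation matrix and translation vector, and the pixel value at the projected location is attached to the point. All names are illustrative.

```python
import numpy as np

def colorize_point_cloud(points_first, image, K, R, t):
    """Attach image color to each point via the pinhole projection K [R | t].

    points_first : (N, 3) points in the first-sensor coordinate system.
    image        : (H, W, 3) two-dimensional image data from the second sensor.
    K            : (3, 3) intrinsic matrix of the second sensor.
    R, t         : rotation matrix and translation vector from the first-sensor
                   coordinate system to the second-sensor coordinate system.
    """
    points_first = np.asarray(points_first, dtype=float)
    pts_cam = (R @ points_first.T).T + t            # first-sensor -> second-sensor frame
    z = pts_cam[:, 2]
    uvw = (K @ pts_cam.T).T
    safe_z = np.where(z > 1e-9, z, 1.0)             # avoid dividing by zero depth;
    u = np.round(uvw[:, 0] / safe_z).astype(int)    # such points are discarded below
    v = np.round(uvw[:, 1] / safe_z).astype(int)

    h, w = image.shape[:2]
    valid = (z > 1e-9) & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    colors = np.zeros((points_first.shape[0], 3), dtype=float)
    colors[valid] = image[v[valid], u[valid]]       # image indexed by (row = v, col = u)
    return np.hstack([points_first, colors]), valid
```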
7. The method according to any one of claims 1 to 6, further comprising:
and sending the three-dimensional reconstruction result of the target scene to a client for displaying the three-dimensional reconstruction result of the target scene at the client.
8. The method of any one of claims 1 to 7, wherein the first sensor is a lidar and the second sensor is a camera,
the target scene is an excavator working scene, the laser radar and the camera are arranged at the top of an excavator cab, and the distance between the laser radar and the camera is smaller than a preset threshold value.
9. An apparatus for multi-sensor data fusion, comprising:
the acquisition unit is used for acquiring three-dimensional point cloud data of a target scene acquired by the first sensor;
the acquisition unit is further configured to acquire two-dimensional image data of the target scene acquired by a second sensor, where the two-dimensional image data and the three-dimensional point cloud data are data of the target scene at a target time;
the fusion unit is used for: and performing data fusion on the three-dimensional point cloud data and the two-dimensional image data according to a mapping relation to obtain a three-dimensional reconstruction result of the target scene, wherein the mapping relation is the mapping relation from the first sensor coordinate system to the second sensor coordinate system, and the mapping relation is determined according to coordinates of a plurality of corner points of a plurality of calibration plates in the second sensor coordinate system and coordinates of a plurality of corner points of the plurality of calibration plates in the first sensor coordinate system.
10. A multi-sensor data fusion device, comprising: a memory and a processor, the memory and the processor coupled;
the memory is to store one or more computer instructions;
the processor is configured to execute the one or more computer instructions to implement the method of any of claims 1 to 8.
11. A system for multi-sensor data fusion, comprising: a computing device, a collection entity, and a client,
the computing device is configured to perform the method of any one of claims 1 to 8 to obtain a three-dimensional reconstruction of the target scene;
the first sensor arranged on the acquisition entity is used for acquiring three-dimensional point cloud data of the target scene, and the second sensor arranged on the acquisition entity is used for acquiring two-dimensional image data of the target scene;
the client is used for acquiring a three-dimensional reconstruction result of the target scene from the computing equipment;
responding to the selection of the user on the display interface of the client, and displaying the three-dimensional reconstruction result of the target scene corresponding to the selection of the user on the display interface of the client.
12. A computer readable storage medium having stored thereon one or more computer instructions for execution by a processor to perform the method of any one of claims 1 to 8.
CN202211246066.XA 2022-10-12 2022-10-12 Method, device and system for fusing multi-sensor data Pending CN115908578A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211246066.XA CN115908578A (en) 2022-10-12 2022-10-12 Method, device and system for fusing multi-sensor data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211246066.XA CN115908578A (en) 2022-10-12 2022-10-12 Method, device and system for fusing multi-sensor data

Publications (1)

Publication Number Publication Date
CN115908578A true CN115908578A (en) 2023-04-04

Family

ID=86475206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211246066.XA Pending CN115908578A (en) 2022-10-12 2022-10-12 Method, device and system for fusing multi-sensor data

Country Status (1)

Country Link
CN (1) CN115908578A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination