WO2022135594A1 - Method and apparatus for detecting target object, fusion processing unit, and medium - Google Patents

Method and apparatus for detecting target object, fusion processing unit, and medium

Info

Publication number
WO2022135594A1
Authority
WO
WIPO (PCT)
Prior art keywords
target object
motion state
data
information
target
Application number
PCT/CN2021/141370
Other languages
French (fr)
Chinese (zh)
Inventor
吴臻志
杨哲宇
马欣
祝夭龙
Original Assignee
北京灵汐科技有限公司
Priority claimed from CN202011562118.5A external-priority patent/CN112816995B/en
Priority claimed from CN202011560817.6A external-priority patent/CN112666550B/en
Application filed by 北京灵汐科技有限公司 filed Critical 北京灵汐科技有限公司
Publication of WO2022135594A1 publication Critical patent/WO2022135594A1/en

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/86Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/02Systems using the reflection of electromagnetic waves other than radio waves
    • G01S17/04Systems determining the presence of a target

Definitions

  • the present disclosure relates to the technical field of target detection, and in particular, to a target object detection method, a target object detection device, a fusion processing unit, and a computer-readable medium.
  • Object detection is an important technology for video image analysis and understanding, and an important preprocessing step for some computational vision tasks, such as object recognition and moving object tracking.
  • Inter-frame difference method and optical flow method are common methods for detecting moving objects in some related technologies.
  • The inter-frame difference method obtains the outline of a moving target by computing the difference between two adjacent image frames: the two frames are subtracted to obtain the absolute value of their brightness difference, and the motion characteristics of the video or image sequence are analyzed by judging whether the computed absolute value exceeds a threshold.
  • The optical flow method uses optical flow to describe the motion of an observed target, surface, or edge caused by motion relative to the observer.
  • In some related technologies, moving objects are detected based on temporal perception, but the spatial perception ability is weak, and multi-dimensional visual perception cannot be performed on moving and/or stationary objects at the same time.
  • In some related application scenarios, the detection effect of moving object detection algorithms is not ideal.
  • the present disclosure provides a method for detecting a target object, a device for detecting a target object, a fusion processing unit, and a computer-readable medium.
  • an embodiment of the present disclosure provides a method for detecting a target object, where the method includes:
  • acquiring event data, where the event data represents light intensity change information in a target plane, and the light intensity change information is used to determine at least one target object in the target plane;
  • acquiring radar detection data for the target object, where the radar detection data is information describing the motion state of the target object; and
  • fusing the event data and the radar detection data to generate multi-dimensional motion state information of the target object.
  • an embodiment of the present disclosure provides a detection device for a target object, the detection device includes:
  • a first sensor for detecting light intensity change information in the target plane to generate event data, the light intensity change information being used to determine at least one target object in the target plane;
  • a radar for acquiring radar detection data for the target object, where the radar detection data is information describing the motion state of the target object;
  • the fusion processing unit is configured to perform fusion processing on the event data and the radar detection data to generate multi-dimensional motion state information of the target object.
  • an embodiment of the present disclosure provides a fusion processing unit, which is applied to a target object detection device, and the fusion processing unit includes:
  • one or more processors; and
  • a storage device on which one or more programs are stored which, when executed by the one or more processors, cause the one or more processors to implement the method for detecting a target object described in the first aspect of the embodiments of the present disclosure.
  • an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, implements the method for detecting a target object described in the first aspect of the embodiment of the present disclosure.
  • In the embodiments of the present disclosure, the event data representing the light intensity change information in the target plane and the radar detection data for the target object are fused, realizing the collection of multi-dimensional motion state information of the target object. The motion of at least one stationary or moving target object can thus be clearly judged, giving the detection device biological-like vision and realizing biological-like visual perception of stationary or moving target objects.
  • Moreover, the collected event data are produced by a motion-sensitive sensor responding dynamically to scene changes in real time, which effectively reduces the influence of redundant data on the detection effect, realizes a real-time dynamic response in target object detection, and effectively improves the efficiency of moving object detection.
  • FIG. 1 is a schematic flowchart of a detection method in an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of some steps in a detection method according to an embodiment of the present disclosure
  • FIG. 3 is a schematic flowchart of some steps in a detection method according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of an implementation manner of determining an offset in an embodiment of the present disclosure
  • FIG. 5 is a schematic flowchart of some steps in a detection method according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic flowchart of some steps in a detection method according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic flowchart of some steps in a detection method according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic flowchart of some steps in a detection method according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic flowchart of some steps in a detection method according to an embodiment of the present disclosure.
  • FIG. 10 is a block diagram of the composition of a detection device in an embodiment of the present disclosure.
  • FIG. 11 is a block diagram of another detection device in an embodiment of the present disclosure.
  • FIG. 12 is a block diagram of another detection device in an embodiment of the present disclosure.
  • FIG. 13 is a block diagram of a fusion processing unit in an embodiment of the present disclosure.
  • an embodiment of the present disclosure provides a method for detecting a target object.
  • the detecting method includes steps S100 to S300.
  • step S100 event data is acquired, where the event data represents light intensity change information in the target plane, and the light intensity change information is used to determine at least one target object in the target plane.
  • step S200 radar detection data for the target object is acquired, where the radar detection data is information describing the motion state of the target object.
  • step S300 the event data and the radar detection data are fused to generate multi-dimensional motion state information of the target object.
  • a device for detecting a target object which is used to detect the motion state of the target object.
  • the detection device of the target object includes a first sensor and a radar.
  • the first sensor in the detection device is sensitive to motion and can respond dynamically to scene changes in real time.
  • The event data are generated by the first sensor collecting the light intensity change information in the target plane, where the first sensor is a sensor imitating the working mechanism of biological vision and retains only dynamic information when generating event data.
  • The target plane may represent the image captured by the first sensor for the target scene; the first sensor generates event data by sensing the light intensity change information of each pixel in the captured image of the target scene, and the event data may consist of the information of the pixels in the target plane whose light intensity changes.
  • For example, the first sensor is a Dynamic Vision Sensor (DVS).
  • step S200 the motion state of the target object is detected by the radar to generate radar detection data.
  • There is no special restriction on the execution order of step S100 and step S200: they may be executed at the same time to obtain event data and radar detection data corresponding to the same time point, or executed separately to obtain event data corresponding to multiple time points and radar detection data corresponding to multiple time points.
  • step S300 the fusion processing of event data and radar detection data is performed by aligning and calibrating event data and radar detection data corresponding to the same time point to generate multi-dimensional motion state information of the target object.
  • The multi-dimensional motion state information may include the three-dimensional velocity and three-dimensional position coordinates of the target object at various time points, may include the velocity components of the target object at each time point in multiple predetermined directions including the first direction, and may further include the movement trajectory of the target object generated from the three-dimensional velocities and three-dimensional position coordinates of multiple time points, and the like.
  • the multi-dimensional motion state information may include at least one of the distance (distance between the target object and the detection device), orientation, height, speed, attitude, shape and other information of the target object.
  • In this way, the event data representing the light intensity change information in the target plane and the radar detection data for the target object are fused, realizing the collection of multi-dimensional motion state information of the target object. The motion of at least one stationary or moving target object can thus be clearly judged, giving the detection device biological-like vision and realizing biological-like visual perception of stationary or moving target objects.
  • Moreover, the collected event data are produced by a motion-sensitive sensor responding dynamically to scene changes in real time, which effectively reduces the influence of redundant data on the detection effect, realizes a real-time dynamic response in target object detection, and effectively improves the efficiency of moving object detection.
  • the above-mentioned radar for collecting radar detection data may be a pulse Doppler radar.
  • the pulse Doppler radar detects the target object by transmitting a pulse signal to the target object and receiving the pulse signal reflected by the target object.
  • The motion state of the target object relative to the radar can thereby be obtained, yielding the radar detection data.
  • the radar detection data may include first motion state component information of the target object in a first direction, and the first direction is perpendicular to the target plane.
  • the first motion state component information may include the velocity component of the target object in the first direction, the distance from the target object to the detection device, and the like.
  • the first direction is a direction parallel to the radial direction of the radar
  • the target plane is a plane perpendicular to the radial direction of the radar.
  • The motion state component information of the target object on the target plane can be determined from the event data, and the multi-dimensional motion state information of the target object can then be determined from the motion state component information on the target plane together with the motion state component information in the first direction.
  • step S300 may further include steps S310A to S320A.
  • step S310A the second motion state component information of the target object on the target plane is determined according to the event data.
  • step S320A multi-dimensional motion state information of the target object is generated according to the second motion state component information and the first motion state component information.
  • The target object may be detected by the radar to generate the first motion state component information directly; in other embodiments, the radar may instead produce an initial detection signal, for example the Doppler frequency shift corresponding to the target object measured by the pulse Doppler radar, and the first motion state component information is then determined from the initial detection signal.
  • This embodiment of the present disclosure makes no special limitation on this.
  • the detection method may further include the step of determining the first motion state component information according to the initial detection signal.
  • The first sensor in the detection device may be a Dynamic Vision Sensor (DVS).
  • The DVS is a sensor that imitates the working mechanism of biological vision: it detects changes in light and outputs the address and information of the pixels whose light intensity changes, eliminating redundant data and responding dynamically to scene changes in real time.
  • The event data collected by the DVS are two-dimensional data of the target plane, from which the target object in the target plane and the motion state component information of the target object in the target plane can be determined.
  • The DVS does not need to read all the pixels in the picture; it only needs to obtain the address and information of the pixels whose light intensity changes. Specifically, when the DVS detects that the light intensity change of a certain pixel is greater than or equal to a preset threshold, it sends out the event signal of that pixel: if the light intensity change is positive, that is, the pixel jumps from low brightness to high brightness, an event signal represented by "+1" is sent and marked as a positive event; if the light intensity change is negative, that is, the pixel jumps from high brightness to low brightness, an event signal represented by "-1" is sent and marked as a negative event.
  • If the change of light intensity is less than the preset threshold, no event signal is sent, and the pixel is marked as having no event. The DVS constitutes the event data by marking the event of each pixel whose light intensity changes; in the event data, both positive events and negative events can represent the light intensity change information of a pixel.
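  • As a concrete illustration of the event-generation rule above, the following minimal Python sketch emits "+1"/"-1" events by comparing per-pixel intensities against a preset threshold; the function name, frame-based representation, and threshold handling are illustrative assumptions rather than the circuit-level behavior of an actual DVS, which compares against a per-pixel reference level asynchronously:

```python
import numpy as np

def dvs_events(prev_frame, cur_frame, threshold, t):
    """Emit (x, y, polarity, t) events for pixels whose light intensity
    change reaches the preset threshold: +1 for a low-to-high jump
    (positive event), -1 for a high-to-low jump (negative event).
    Pixels whose change is below the threshold produce no event."""
    diff = cur_frame.astype(np.float64) - prev_frame.astype(np.float64)
    ys, xs = np.nonzero(np.abs(diff) >= threshold)
    return [(x, y, 1 if diff[y, x] > 0 else -1, t) for x, y in zip(xs, ys)]
```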
  • The above-mentioned event data may include the coordinates, in the target plane, of the pixels whose light intensity changes and the corresponding light intensity change information, and may further include time information indicating the time at which the light intensity changes; the second motion state component information includes the position coordinates of the target object in the target plane. Referring to FIG. 3, step S310A may further include steps S311A and S312A.
  • step S311A an event frame is generated according to the coordinates and light intensity change information of each pixel in the event data within a preset sampling period.
  • step S312A the position coordinates of the target object in the target plane are determined according to the event frame.
  • The event data sampled within the preset sampling period are framed to generate an event frame, and the event frame can represent the data generated for each pixel within the preset sampling period.
  • The output data of the DVS are event data consisting of a plurality of 4-tuples, each corresponding to a pixel in the target plane whose light intensity changes; a 4-tuple includes the coordinates of the pixel in the target plane (abscissa x, ordinate y), the light intensity change information, and the time information.
  • The 4-tuples corresponding to the same time point are framed together, thereby generating a corresponding event frame.
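  • A minimal sketch of this framing step, assuming the 4-tuple layout (x, y, polarity, t) described above; the sampling-window convention and array sizes are illustrative:

```python
import numpy as np

def frame_events(events, width, height, t_start, t_end):
    """Accumulate the 4-tuple events whose timestamps fall within one
    preset sampling period [t_start, t_end) into a single event frame."""
    frame = np.zeros((height, width), dtype=np.int32)
    for x, y, polarity, t in events:
        if t_start <= t < t_end:
            frame[y, x] += polarity  # positive and negative events both mark change
    return frame
```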
  • A target detection algorithm may be used to determine the position coordinates of the target object in the target plane from the event frame. When a target object in the picture moves relatively, the light intensity of the corresponding pixels changes to varying degrees: when the target object appears, the light intensity of the pixels in the area where it appears increases significantly, and when the target object disappears, the brightness of the pixels in the area it leaves decreases significantly.
  • From the position coordinates of the pixels whose light intensity changes, it can be determined which pixels in the picture may contain the target object, the contour area of the target object can be determined, and the location area and the location coordinates of the target object can then be obtained.
  • The coordinates of any point on the target object in the target plane may be used as the position coordinates of the target object in the target plane; for example, the coordinates of the center point of the target object may be used. This embodiment of the present disclosure makes no special limitation on this.
  • The position coordinates of the target object in the target plane may be represented by (x, y), where x corresponds to one of the second direction and the third direction, and y corresponds to the other. The second direction and the third direction are both parallel to the target plane, and the second direction is perpendicular to the third direction.
  • The second motion state component information further includes the second velocity component of the target object in the second direction and the third velocity component in the third direction; referring to FIG. 3, step S310A may further include steps S313A to S315A.
  • step S313A the second offset of the target object in the second direction and the third offset of the target object in the third direction are respectively determined according to the position coordinates of the target object in the target plane.
  • step S314A a second velocity component of the target object in the second direction is determined according to the second offset.
  • step S315A the third velocity component of the target object in the third direction is determined according to the third offset.
  • the second direction and the third direction in the target plane are not particularly limited.
  • For example, the first direction, the second direction, and the third direction constitute a three-dimensional rectangular coordinate system, in which the first direction is the ordinate direction, the second direction is the abscissa direction, and the third direction is the vertical coordinate direction.
  • FIG. 4 shows an optional implementation of determining the third offset in the third direction from the coordinates, at adjacent time points, of the pixels whose light intensity changes in the target plane. As shown in FIG. 4, the absolute offset of the target object in the third direction can be obtained; similarly, the absolute offset in the second direction can be obtained.
  • In steps S314A and S315A, when determining the second velocity component and the third velocity component, the time difference between the adjacent time points is also taken into account.
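  • A minimal sketch of steps S313A to S315A under this scheme: the offsets are differences of the target's position coordinates in the target plane at adjacent time points, and each velocity component is the corresponding offset divided by the time difference (function and variable names are illustrative):

```python
def plane_velocity(pos_prev, pos_cur, t_prev, t_cur):
    """Second and third velocity components from the second and third
    offsets of the target object between adjacent time points."""
    dt = t_cur - t_prev
    second_offset = pos_cur[0] - pos_prev[0]  # offset in the second direction
    third_offset = pos_cur[1] - pos_prev[1]   # offset in the third direction
    return second_offset / dt, third_offset / dt
```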
  • The first motion state component information includes a first velocity component of the target object in the first direction and a distance parameter, where the distance parameter represents the distance between the target object and the detection device; the multi-dimensional motion state information includes a three-dimensional velocity and three-dimensional coordinates. Referring to FIG. 3, step S320A may further include steps S321A and S322A.
  • In step S321A, the position coordinates of the target object in the target plane, the first velocity component and distance parameter of the target object in the first direction, and the second and third velocity components, all corresponding to the same time point, are determined.
  • In step S322A, the three-dimensional velocity and the three-dimensional coordinates of the target object are determined from the position coordinates of the target object in the target plane, the first, second, and third velocity components, and the distance parameter corresponding to the same time point.
  • the three-dimensional coordinates represent the position coordinates of the target object in the three-dimensional space
  • the three-dimensional velocity represents the relative movement speed of the target object in the three-dimensional space.
  • The three-dimensional coordinates can be represented by (x, y, z), where z corresponds to the first direction, x corresponds to one of the second direction and the third direction, and y corresponds to the other of the second direction and the third direction.
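  • A minimal sketch of assembling the multi-dimensional state in steps S321A and S322A, assuming (as a simplification not stated in the disclosure) that the plane offsets are small relative to the range, so the radar's distance parameter can serve directly as the first-direction coordinate z:

```python
def fuse_state(x, y, vx, vy, vz, distance):
    """Combine, for one time point, the position coordinates (x, y) in
    the target plane with the radar's first velocity component vz and
    distance parameter into three-dimensional coordinates and velocity."""
    z = distance  # assumption: distance taken as the first-direction coordinate
    return (x, y, z), (vx, vy, vz)
```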
  • step S100 event data is acquired through the DVS.
  • step S100 includes: in step S110, acquiring event data in response to changes in the light intensity of pixels in the target plane, where the event data include the coordinates and light intensity change information of the pixels in the target plane whose light intensity changes.
  • the event data may further include time information, and the time information may represent the time when the light intensity of the pixel changes.
  • step S200 may further include: in step S210 , acquiring the first velocity component and distance parameter of the target object in the first direction as the first motion state component information.
  • step S300 the event data and the first motion state component information are fused to generate multi-dimensional motion state information of the target object.
  • In this embodiment, the event data representing the light intensity change information in the target plane and the first motion state component information of the target object in the first direction detected by the pulse Doppler radar are fused, realizing the collection of the multi-dimensional motion state information of the target object. The event data are generated by a motion-sensitive sensor responding dynamically to scene changes in real time, which effectively reduces the influence of redundant data on the detection effect, realizes a real-time dynamic response in target object detection, and effectively improves the efficiency of moving object detection.
  • the method for detecting a target object may further include: in step S400 , outputting multi-dimensional motion state information.
  • The multi-dimensional motion state information may include the three-dimensional velocity and three-dimensional position coordinates of the target object at various time points, may include the velocity components of the target object in the first, second, and third directions at each time point, and may also include the movement trajectory of the target object generated from the three-dimensional velocities and three-dimensional position coordinates of multiple time points.
  • the multi-dimensional motion state information may include at least one of the distance (distance between the target object and the detection device), orientation, height, speed, attitude, shape and other information of the target object.
  • This embodiment of the present disclosure does not specifically limit how to output the multi-dimensional motion state information.
  • the multi-dimensional motion state information of the target object can be displayed on the display screen.
  • The above-mentioned radar for collecting radar detection data may be a lidar (laser radar), and the collected radar detection data are laser point cloud data, i.e., a set of vectors in a three-dimensional coordinate system.
  • the vectors in the set can be represented in the form of X, Y, and Z three-dimensional coordinates.
  • Lidar is a radar system that emits laser beams to detect the position, velocity and other characteristic quantities of target objects.
  • Lidar generates laser point cloud data by scanning, which can characterize the motion state of at least one target object such as distance, azimuth, height, speed, attitude, and shape.
  • The lidar can detect moving objects as well as stationary objects; therefore, in the embodiments of the present disclosure, the target object may be a moving object or a stationary object. This embodiment of the present disclosure makes no special limitation on this.
  • the multi-dimensional motion state information may include at least one of the distance (distance between the target object and the detection device), azimuth, altitude, speed, attitude, and shape of the target object.
  • the multi-dimensional motion state information of the target object is generated by acquiring laser point cloud data and event data representing light intensity change information in the target plane, and fusing the laser point cloud data and the event data. , so that at least one stationary or moving target object can be clearly judged on the motion, and the biological-like visual perception of the stationary or moving target object is realized.
  • This embodiment of the present disclosure does not specifically limit how step S300 performs the fusion processing of the event data and the laser point cloud data.
  • a neural network (such as a convolutional neural network) is used to perform fusion processing on event data and laser point cloud data.
  • the input of the neural network is a three-dimensional image and an event frame, wherein the three-dimensional image is generated according to laser point cloud data, and the event frame is generated by framing according to the acquired event data.
  • step S300 may further include: steps S310B to S330B.
  • step S310B a three-dimensional image is generated according to the laser point cloud data.
  • The point cloud data include the distance information between each point corresponding to a reflected signal among all the emitted laser signals and the emission source (i.e., the lidar).
  • The corresponding three-dimensional image can be obtained by transforming the spatial position information from spherical coordinates to XYZ Cartesian coordinates; the generated three-dimensional image may be a three-dimensional point cloud image.
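  • A minimal sketch of this coordinate transformation, assuming each lidar return is given as (range, azimuth, elevation) in radians; the axis conventions are illustrative:

```python
import numpy as np

def spherical_to_xyz(r, azimuth, elevation):
    """Transform lidar returns from spherical coordinates to XYZ
    Cartesian coordinates to build the three-dimensional point cloud image."""
    x = r * np.cos(elevation) * np.cos(azimuth)
    y = r * np.cos(elevation) * np.sin(azimuth)
    z = r * np.sin(elevation)
    return np.stack([x, y, z], axis=-1)
```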
  • step S320B an event frame is generated according to the event data within the preset sampling period.
  • For the description of step S320B, reference may be made to the above description of step S311A; details are not repeated here.
  • step S330B the three-dimensional image and the event frame are input into the neural network for processing to generate multi-dimensional motion state information of the target object.
  • In step S330B, the three-dimensional image is projected onto the same two-dimensional plane as the event frame, and the two images are spliced together in the channel dimension and then input to a neural network (e.g., a convolutional neural network) for feature extraction, so as to obtain the multi-dimensional motion state information of the target object.
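  • A minimal sketch of the channel-dimension splicing described above, assuming the projected point-cloud image and the event frame already share the same two-dimensional plane; the resulting array would then be fed to a convolutional network for feature extraction:

```python
import numpy as np

def splice_channels(projected_cloud, event_frame):
    """Splice a projected point-cloud image of shape (H, W, C) and an
    event frame of shape (H, W) in the channel dimension -> (H, W, C + 1)."""
    assert projected_cloud.shape[:2] == event_frame.shape
    return np.concatenate([projected_cloud, event_frame[..., None]], axis=-1)
```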
  • In another implementation, the inputs to the neural network are two-dimensional images and event frames.
  • The two-dimensional images are the top view and front view obtained by projecting the three-dimensional laser point cloud data along the top-view and front-view directions respectively, so as to obtain a two-dimensional representation of the three-dimensional laser point cloud data.
  • The event frames are generated by framing the event data within the preset sampling period and are fed into the neural network frame by frame.
  • step S300 may further include steps S310C to S330C.
  • step S310C the laser point cloud data is processed to generate a front view and a top view of the laser point cloud data.
  • the point cloud data is represented by three-dimensional coordinates.
  • The three-dimensional coordinates of the point cloud data can be transformed into two-dimensional coordinates to obtain the corresponding projected views.
  • The data are projected and mapped along the front-view and top-view directions, and the corresponding front view and top view can be obtained respectively.
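  • A minimal sketch of the two projections, rendering the point cloud as occupancy images along the top-view and front-view directions; the coordinate ranges and resolution are illustrative assumptions:

```python
import numpy as np

def project_views(points, x_range=(0, 40), y_range=(-20, 20),
                  z_range=(-2, 2), res=0.1):
    """Project an (N, 3) point cloud into a top view (seen along z)
    and a front view (seen along x) as binary occupancy images."""
    def occupancy(a, b, a_range, b_range):
        h = int((a_range[1] - a_range[0]) / res)
        w = int((b_range[1] - b_range[0]) / res)
        img = np.zeros((h, w), dtype=np.float32)
        i = ((a - a_range[0]) / res).astype(int)
        j = ((b - b_range[0]) / res).astype(int)
        keep = (i >= 0) & (i < h) & (j >= 0) & (j < w)
        img[i[keep], j[keep]] = 1.0
        return img

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    top_view = occupancy(x, y, x_range, y_range)    # ground-plane footprint
    front_view = occupancy(z, y, z_range, y_range)  # height vs. lateral position
    return top_view, front_view
```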
  • step S320C an event frame is generated according to the event data in the preset sampling period.
  • For the description of step S320C, reference may be made to the above description of step S311A; details are not repeated here.
  • step S330C the front view, the top view, and the event frame are input into the neural network for processing to generate multi-dimensional motion state information of the target object.
  • the front view, the top view and the event frame may be spliced together in the channel dimension and then input to the neural network for feature extraction, so as to obtain the multi-dimensional motion state information of the target object.
  • step S300 may further include steps S310D-S320D.
  • step S310D at least one target area is determined according to the event data, first coordinate information of the at least one target area is obtained, and each target area corresponds to a target object.
  • step S320D the second coordinate information of the target area in the laser point cloud data is determined according to the first coordinate information, and multi-dimensional motion state information of the target object is generated.
  • a target detection algorithm may be used to detect a target object in at least one target area in the event frame generated according to the event data, so as to determine the first coordinate information of each target area in the event frame.
  • In step S320D, a coordinate transformation algorithm in three-dimensional space can transform the laser point cloud data and the images in the event frame, taken from different angles, to the same angle, after which the points in the images can be correlated, thereby determining the multi-dimensional motion state information of the target object.
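  • A minimal sketch of step S320D under one plausible reading: once the laser points have been projected into the event-frame plane, the points whose projections fall inside a target area (the first coordinate information) are selected, and their original three-dimensional coordinates give the second coordinate information; the projection itself and the box format are assumptions:

```python
import numpy as np

def points_in_target_area(points_2d, points_3d, box):
    """Select laser points whose event-plane projections (points_2d,
    shape (N, 2)) fall inside a target area box = (x_min, y_min,
    x_max, y_max), returning their 3-D coordinates from points_3d."""
    x_min, y_min, x_max, y_max = box
    inside = ((points_2d[:, 0] >= x_min) & (points_2d[:, 0] <= x_max) &
              (points_2d[:, 1] >= y_min) & (points_2d[:, 1] <= y_max))
    return points_3d[inside]
```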
  • In some embodiments, the motion state information of the target object on two different planes can also be obtained through two DVSs installed at different positions, and the multi-dimensional motion state information of the target object can then be uniquely determined through calculation.
  • The target object can also be detected by one or more image sensors, and the signals generated by the one or more image sensors can be fused with the event data and the laser point cloud data to generate the multi-dimensional motion state information of the target object.
  • For example, the image sensor is a Complementary Metal Oxide Semiconductor (CMOS) sensor.
  • the target detection method further includes: in step S500 , acquiring at least one channel of RGB image signals.
  • step S300 may further include: in step S310E, fusing at least one channel of RGB image signal with laser point cloud data and event data to generate multi-dimensional motion information of the target object.
  • the neural network may include a plurality of processing branches, each processing branch correspondingly processes a channel of RGB image signals, and the RGB image signals of different channels are image signals collected by different RGB image sensors.
  • the RGB image signals are input to the neural network through corresponding processing branches.
  • the RGB image signal is input to the neural network frame by frame.
  • The RGB image signals, laser point cloud data, and event frames captured from different angles can be registered through key point detection (i.e., images captured at different angles and with different fields of view are transformed into pictures of the same angle and the same field of view), and the three registered sets of pictures are spliced together in the channel dimension and then input to the network for feature extraction to obtain the multi-dimensional motion information of the target object.
  • The RGB image signals, laser point cloud data, and event frames can also be input into a neural network with multiple inputs, and the network outputs the motion state information of the target object, such as speed and position.
  • In this case, the neural network is obtained by training with labeled training samples.
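  • A minimal sketch of such a multi-input network in PyTorch: one processing branch per modality, branch features spliced in the channel dimension, and a shared head regressing motion state. The channel counts, layer sizes, and six-value output (e.g., 3-D position plus 3-D velocity) are illustrative assumptions, not the disclosed architecture:

```python
import torch
import torch.nn as nn

class MultiInputFusionNet(nn.Module):
    """Fuses RGB frames, a projected point-cloud image, and an event
    frame through separate branches and a shared regression head."""
    def __init__(self):
        super().__init__()
        self.rgb_branch = nn.Conv2d(3, 16, 3, padding=1)
        self.cloud_branch = nn.Conv2d(1, 16, 3, padding=1)
        self.event_branch = nn.Conv2d(1, 16, 3, padding=1)
        self.head = nn.Sequential(
            nn.Conv2d(48, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 6),  # e.g., 3-D position + 3-D velocity
        )

    def forward(self, rgb, cloud, event):
        feats = torch.cat([self.rgb_branch(rgb),
                           self.cloud_branch(cloud),
                           self.event_branch(event)], dim=1)
        return self.head(feats)
```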
  • an embodiment of the present disclosure provides a detection device for a target object.
  • The detection device can be used to implement the above detection method. Referring to FIG. 10, the detection device includes a first sensor 101, a radar 102, and a fusion processing unit 103.
  • the first sensor 101 is used to detect light intensity change information in the target plane to generate event data, and the light intensity change information is used to determine at least one target object in the target plane.
  • the radar 102 is configured to acquire radar detection data for the target object, where the radar detection data is information describing the motion state of the target object.
  • the fusion processing unit 103 is configured to perform fusion processing on event data and radar detection data to generate multi-dimensional motion state information of the target object.
  • the detection device provided by the embodiment of the present disclosure can be applied to automatic driving, and is used to detect a target object.
  • the fusion processing unit 103 can execute the method for detecting a target object described in the first aspect of the embodiment of the present disclosure, and fuse the event data with the radar detection data to generate multi-dimensional motion state information of the target object.
  • The first sensor 101 in the detection device may be a Dynamic Vision Sensor (DVS).
  • The DVS is a sensor that imitates the working mechanism of biological vision: it detects changes in light and outputs the address (position coordinates) and information of the pixels whose light intensity changes, effectively reducing redundant data and responding dynamically to scene changes in real time.
  • In some embodiments, the first sensor 101 is a dynamic vision sensor; the dynamic vision sensor is used to detect changes in the light intensity of each pixel in the target plane and generate event data, where the event data include the coordinates and light intensity change information of the pixels in the target plane whose light intensity changes, and may further include time information.
  • the radar 102 in the detection device may be a pulse Doppler radar, and the radar detection data includes first motion state component information of the target object in a first direction, and the first direction is perpendicular to the target plane.
  • the pulse Doppler radar is used for sending and receiving pulse signals to determine the first motion state component information of the target object in the first direction; the fusion processing unit 103 can obtain the first motion state component information from the pulse Doppler radar.
  • A Doppler radar is a radar that uses the Doppler effect to measure the radial velocity component of a target relative to the radar, or to pick out targets having a specific radial velocity.
  • a pulsed Doppler radar is a Doppler radar that transmits pulsed signals.
  • The pulse Doppler radar scans the air with pulse waves at a fixed frequency; if it encounters a moving target, there is a frequency difference between the reflected echo and the transmitted wave, namely the Doppler frequency shift.
  • The Doppler shift is proportional to the relative radial velocity between the moving target and the radar, so the radial velocity of the moving target can be determined from the magnitude of the Doppler shift. The magnitude of the Doppler shift is calculated from the phase of the signal; therefore, in the embodiments of the present disclosure, the radar 102 is a coherent radar, so that phase information can be preserved.
  • When the target object moves toward the radar, the Doppler frequency shift is positive; when the target object moves away from the radar, the Doppler frequency shift is negative.
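  • The standard pulse-Doppler relation implied above can be written as follows; the symbols (radial velocity v_r, carrier wavelength λ) are conventional radar notation rather than reference signs from the disclosure:

```latex
% Doppler shift for a target with radial velocity v_r and wavelength \lambda:
f_d = \frac{2 v_r}{\lambda}
\quad\Longleftrightarrow\quad
v_r = \frac{f_d \, \lambda}{2}
% f_d > 0: target approaching the radar;  f_d < 0: target receding.
```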
  • the detection apparatus further includes: an output unit 104 for outputting multi-dimensional motion state information of the target object.
  • the output unit is a display screen, and the multi-dimensional motion state information of the target object is displayed on the display screen.
  • the fusion processing unit 103 is configured to: determine the second motion state component information of the target object on the target plane according to the event data; according to the second motion state component information and the first motion state component information, Generate multi-dimensional motion state information of the target object.
  • the radar 102 is a laser radar, and the radar detection data is laser point cloud data; the laser radar is used for emitting a laser beam to detect at least one target object to generate laser point cloud data.
  • The fusion processing unit 103 may include a first image signal processor (ISP) 131 and a first neural network 132.
  • the first image signal processor 131 is used for: generating a three-dimensional image according to the laser point cloud data; and generating an event frame according to the event data in a preset sampling period.
  • the first neural network 132 is used to process three-dimensional images and event frames to generate multi-dimensional motion state information of the target object.
  • the fusion processing unit 103 may include a second image signal processor 133 and a second neural network (eg, a convolutional neural network) 134 .
  • The second image signal processor 133 is used to process the laser point cloud data to generate the front view and top view of the laser point cloud data, and to generate event frames from the event data within the preset sampling period; the second neural network 134 is used to process the front view, the top view, and the event frames to generate the multi-dimensional motion state information of the target object.
  • The fusion processing unit is configured to: determine at least one target area according to the event data and obtain first coordinate information of the at least one target area, each target area corresponding to one target object; and determine, according to the first coordinate information, second coordinate information of the target area in the laser point cloud data, generating the multi-dimensional motion state information of the target object.
  • The detection device further includes at least one second sensor 140; the second sensor 140 is used to acquire RGB images and generate RGB image signals; the fusion processing unit 103 is used to fuse the at least one RGB image signal, the laser point cloud data, and the event data to generate the multi-dimensional motion state information of the target object.
  • the first neural network 132 may include a plurality of processing branches, each processing branch corresponds to a second sensor 140 , and the RGB image signals output by the second sensor 140 pass through the corresponding processing branch Input to the first neural network 132 .
  • the second neural network 134 includes a plurality of processing branches, each processing branch corresponds to a second sensor 140 , and the RGB image signals output by the second sensor 140 are input through the corresponding processing branch The second neural network 134 .
  • the RGB image signals are input to the first neural network 132 or the second neural network 134 frame by frame.
  • the detection apparatus is used to implement the detection method provided by any of the foregoing embodiments.
  • reference may be made to the specific description of the detection method in the foregoing embodiment, which will not be repeated here.
  • an embodiment of the present disclosure provides a fusion processing unit, which is applied to a target object detection device.
  • The fusion processing unit includes: one or more processors 201; a memory 202 on which one or more programs are stored which, when executed by the one or more processors, cause the one or more processors to implement the target object detection method described in the first aspect of the embodiments of the present disclosure; and one or more I/O interfaces 203, connected between the processor 201 and the memory 202 and configured to implement information interaction between the processor 201 and the memory 202.
  • The processor 201 is a device with data processing capability, including but not limited to a central processing unit (CPU); the memory 202 is a device with data storage capability, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH); the I/O interface (read/write interface) 203 is connected between the processor 201 and the memory 202, can realize information interaction between the processor 201 and the memory 202, and includes but is not limited to a data bus (Bus) and the like.
  • The processor 201, the memory 202, and the I/O interface 203 are interconnected by a bus 204, which in turn is connected to other components of the computing device.
  • an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, implements the method for detecting a target object described in the first aspect of the embodiment of the present disclosure.
  • Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media, as is well known to those of ordinary skill in the art.

Abstract

A method for detecting a moving object. The method comprises: acquiring event data, wherein the event data represents light intensity change information in a target plane, and the light intensity change information is used for determining at least one target object in the target plane (S100); acquiring radar detection data for the target object, wherein the radar detection data is information for describing the motion state of the target object (S200); and performing fusion processing on the event data and the radar detection data, so as to generate multi-dimensional motion state information of the target object (S300). Provided are an apparatus for detecting a target object, a fusion processing unit, and a computer-readable medium.

Description

Method and apparatus for detecting target object, fusion processing unit, and medium
Technical Field
The present disclosure relates to the technical field of target detection, and in particular to a method for detecting a target object, an apparatus for detecting a target object, a fusion processing unit, and a computer-readable medium.
Background
Object detection is an important technology for video image analysis and understanding, and an important preprocessing step for some computer vision tasks, such as object recognition and moving object tracking.
The inter-frame difference method and the optical flow method are common methods for detecting moving objects in some related technologies. The inter-frame difference method obtains the outline of a moving target by computing the difference between two adjacent image frames: the two frames are subtracted to obtain the absolute value of their brightness difference, and the motion characteristics of the video or image sequence are analyzed by judging whether the computed absolute value exceeds a threshold. The optical flow method uses optical flow to describe the motion of an observed target, surface, or edge caused by motion relative to the observer.
In some related technologies, moving objects are detected based on temporal perception, but the spatial perception ability is weak, and multi-dimensional visual perception cannot be performed on moving and/or stationary objects at the same time.
In some related application scenarios, the detection effect of moving object detection algorithms is not ideal.
Summary of the Invention
The present disclosure provides a method for detecting a target object, an apparatus for detecting a target object, a fusion processing unit, and a computer-readable medium.
In a first aspect, an embodiment of the present disclosure provides a method for detecting a target object, the method including:
acquiring event data, where the event data represents light intensity change information in a target plane, and the light intensity change information is used to determine at least one target object in the target plane;
acquiring radar detection data for the target object, where the radar detection data is information describing the motion state of the target object; and
fusing the event data and the radar detection data to generate multi-dimensional motion state information of the target object.
In a second aspect, an embodiment of the present disclosure provides an apparatus for detecting a target object, the apparatus including:
a first sensor for detecting light intensity change information in a target plane to generate event data, the light intensity change information being used to determine at least one target object in the target plane;
a radar for acquiring radar detection data for the target object, where the radar detection data is information describing the motion state of the target object; and
a fusion processing unit for fusing the event data and the radar detection data to generate multi-dimensional motion state information of the target object.
In a third aspect, an embodiment of the present disclosure provides a fusion processing unit applied to an apparatus for detecting a target object, the fusion processing unit including:
one or more processors; and
a storage device on which one or more programs are stored which, when executed by the one or more processors, cause the one or more processors to implement the method for detecting a target object described in the first aspect of the embodiments of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, the method for detecting a target object described in the first aspect of the embodiments of the present disclosure is implemented.
In the embodiments of the present disclosure, the event data representing the light intensity change information in the target plane and the radar detection data for the target object are fused, realizing the collection of multi-dimensional motion state information of the target object. The motion of at least one stationary or moving target object can thus be clearly judged, giving the detection device biological-like vision and realizing biological-like visual perception of stationary or moving target objects. Moreover, the collected event data are produced by a motion-sensitive sensor responding dynamically to scene changes in real time, which effectively reduces the influence of redundant data on the detection effect, realizes a real-time dynamic response in target object detection, and effectively improves the efficiency of moving object detection.
It should be understood that what is described in this section is not intended to identify key or critical features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief Description of the Drawings
The accompanying drawings are provided for a further understanding of the present disclosure and constitute a part of the specification; together with the embodiments of the present disclosure, they serve to explain the present disclosure and do not limit it. The above and other features and advantages will become more apparent to those skilled in the art from the description of detailed example embodiments with reference to the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of a detection method in an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of some steps in a detection method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of some steps in a detection method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an implementation of determining an offset in an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of some steps in a detection method according to an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of some steps in a detection method according to an embodiment of the present disclosure;
FIG. 7 is a schematic flowchart of some steps in a detection method according to an embodiment of the present disclosure;
FIG. 8 is a schematic flowchart of some steps in a detection method according to an embodiment of the present disclosure;
FIG. 9 is a schematic flowchart of some steps in a detection method according to an embodiment of the present disclosure;
FIG. 10 is a block diagram of a detection apparatus in an embodiment of the present disclosure;
FIG. 11 is a block diagram of another detection apparatus in an embodiment of the present disclosure;
FIG. 12 is a block diagram of another detection apparatus in an embodiment of the present disclosure;
FIG. 13 is a block diagram of a fusion processing unit in an embodiment of the present disclosure.
Detailed Description
To enable those skilled in the art to better understand the technical solutions of the present disclosure, exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to facilitate understanding; they should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
The embodiments of the present disclosure and the features of the embodiments may be combined with each other in the absence of conflict.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used herein, the singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that when the terms "comprising" and/or "made of" are used in this specification, they specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Words such as "connected" are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in common dictionaries should be construed as having meanings consistent with their meanings in the context of the related art and the present disclosure, and will not be construed as having idealized or overly formal meanings unless expressly so defined herein.
第一方面,参照图1,本公开实施例提供一种目标物体的检测方法,该检测方法包括:步骤S100-步骤S300。In a first aspect, referring to FIG. 1 , an embodiment of the present disclosure provides a method for detecting a target object. The detecting method includes steps S100 to S300.
在步骤S100中,获取事件数据,事件数据表征目标平面中的光强变化信息,光强变化信息用于确定目标平面中的至少一个目标物体。In step S100, event data is acquired, where the event data represents light intensity change information in the target plane, and the light intensity change information is used to determine at least one target object in the target plane.
在步骤S200中,获取针对目标物体的雷达探测数据,雷达探测数据为描述目标物体的运动状态的信息。In step S200, radar detection data for the target object is acquired, where the radar detection data is information describing the motion state of the target object.
在步骤S300中,将事件数据与雷达探测数据进行融合处理,生成目标物体的多维运动状态信息。In step S300, the event data and the radar detection data are fused to generate multi-dimensional motion state information of the target object.
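By way of a non-limiting illustration of the flow of steps S100 to S300, a minimal Python sketch is given below; the object names (dvs, radar, fusion) and their methods are hypothetical placeholders and do not form part of the disclosure.

    def detect_target_object(dvs, radar, fusion):
        # Step S100: acquire event data representing light intensity
        # changes in the target plane (used to determine target objects).
        event_data = dvs.read_events()
        # Step S200: acquire radar detection data describing the motion
        # state of the target object.
        radar_data = radar.read_detections()
        # Step S300: fuse the two data sources into multi-dimensional
        # motion state information of the target object.
        return fusion.fuse(event_data, radar_data)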
在本公开实施例中,提供了一种目标物体的检测装置,用于对目标物体的运动状态进行检测。目标物体的检测装置包括第一传感器和雷达。检测装置中的第一传感器具有运动敏感,能够对场景变化进行实时动态响应。In an embodiment of the present disclosure, a device for detecting a target object is provided, which is used to detect the motion state of the target object. The detection device of the target object includes a first sensor and a radar. The first sensor in the detection device is sensitive to motion and can respond dynamically to scene changes in real time.
在本公开实施例中，在上述步骤S100中，事件数据为通过第一传感器对目标平面中的光强变化信息进行采集生成的，其中，第一传感器是模仿生物视觉的工作机理的传感器，第一传感器在生成事件数据时仅保留动态信息。其中，目标平面可以表示第一传感器针对目标场景的采集画面，第一传感器通过对目标场景的采集画面中各像素点的光强变化信息进行感知，从而生成事件数据，事件数据可以由目标平面中光强发生变化的像素点的信息组成。示例性的，第一传感器为动态视觉传感器(DVS，Dynamic Vision Sensor)。In the embodiment of the present disclosure, in the above step S100, the event data is generated by collecting, through the first sensor, the light intensity change information in the target plane, wherein the first sensor is a sensor that imitates the working mechanism of biological vision, and the first sensor only retains dynamic information when generating the event data. The target plane may represent the picture captured by the first sensor for the target scene; the first sensor generates the event data by sensing the light intensity change information of each pixel in the captured picture of the target scene, and the event data may be composed of the information of the pixels in the target plane whose light intensity changes. Exemplarily, the first sensor is a Dynamic Vision Sensor (DVS).
在本公开实施例中,在步骤S200中,通过雷达对目标物体的运动状态进行检测,生成雷达探测数据。In the embodiment of the present disclosure, in step S200, the motion state of the target object is detected by the radar to generate radar detection data.
在本公开实施例中，对于步骤S100和步骤S200的执行先后顺序不作特殊限制，可以同时执行步骤S100和步骤S200，获取对应相同时间点的事件数据和雷达探测数据；也可以分别执行步骤S100和步骤S200，获得对应多个时间点的事件数据和对应多个时间点的雷达探测数据。In this embodiment of the present disclosure, there is no special restriction on the execution order of step S100 and step S200. Step S100 and step S200 may be executed at the same time to obtain event data and radar detection data corresponding to the same time point; alternatively, step S100 and step S200 may be executed separately to obtain event data corresponding to multiple time points and radar detection data corresponding to multiple time points.
在本公开实施例中,在步骤S300中,将事件数据和雷达探测数据进行融合处理,是将对应相同时间点的事件数据与雷达探测数据进行对齐校准,以生成目标物体的多维运动状态信息。In the embodiment of the present disclosure, in step S300, the fusion processing of event data and radar detection data is performed by aligning and calibrating event data and radar detection data corresponding to the same time point to generate multi-dimensional motion state information of the target object.
本公开实施例对经步骤S300生成的目标物体的多维运动状态信息不做特殊限定。在一些实施例中，多维运动状态信息可以包括目标物体在各个时间点的三维速度、三维位置坐标，还可以包括目标物体在各个时间点在包括第一方向的多个预定方向的速度分量，还可以包括根据多个时间点的三维速度、三维位置坐标生成的目标物体的运动轨迹等。在一些实施例中，多维运动状态信息可以包括目标物体的距离（与检测装置之间的距离）、方位、高度、速度、姿态、形状等信息中的至少一者。This embodiment of the present disclosure does not specifically limit the multi-dimensional motion state information of the target object generated in step S300. In some embodiments, the multi-dimensional motion state information may include the three-dimensional velocity and three-dimensional position coordinates of the target object at various time points, may also include velocity components of the target object at each time point in multiple predetermined directions including the first direction, and may further include a motion trajectory of the target object generated from the three-dimensional velocities and three-dimensional position coordinates of multiple time points, and the like. In some embodiments, the multi-dimensional motion state information may include at least one of the distance (the distance between the target object and the detection device), azimuth, height, speed, attitude, shape and other information of the target object.
在本公开实施例中，将表征目标平面中的光强变化信息的事件数据与针对目标物体的雷达探测数据进行融合处理，实现了对目标物体的多维运动状态信息的采集，从而能够对至少一个静止或运动的目标物体进行清晰的运动判断，使得检测装置具有类生物视觉，实现了对静止或运动的目标物体的类生物视觉感知，且采集的事件数据是通过具有运动敏感的传感器对场景变化实时动态响应产生的，从而有效减少了冗余数据对检测效果的影响，实现了目标物体检测的实时动态响应，有效提高了运动物体检测的效率。In the embodiment of the present disclosure, the event data representing the light intensity change information in the target plane and the radar detection data for the target object are fused, which realizes the collection of multi-dimensional motion state information of the target object, so that a clear motion judgment can be made for at least one stationary or moving target object. The detection device thus has biological-like vision, realizing biological-like visual perception of stationary or moving target objects. Moreover, the collected event data is produced by a motion-sensitive sensor responding dynamically to scene changes in real time, which effectively reduces the influence of redundant data on the detection effect, realizes a real-time dynamic response in target object detection, and effectively improves the efficiency of moving object detection.
作为一种可选的实施方式，上述用于采集雷达探测数据的雷达可以为脉冲多普勒雷达，脉冲多普勒雷达通过向目标物体发射脉冲信号并接收目标物体反射的脉冲信号，检测目标物体的运动状态，从而获取雷达探测数据。此种情况下，雷达探测数据可以包括目标物体在第一方向的第一运动状态分量信息，第一方向垂直于目标平面。As an optional implementation manner, the above-mentioned radar for collecting radar detection data may be a pulse Doppler radar. The pulse Doppler radar transmits a pulse signal to the target object and receives the pulse signal reflected by the target object, thereby detecting the motion state of the target object and obtaining the radar detection data. In this case, the radar detection data may include first motion state component information of the target object in a first direction, and the first direction is perpendicular to the target plane.
本公开实施例对第一运动状态分量信息不做特殊限定。例如,第一运动状态分量信息可以包括目标物体在第一方向的速度分量、目标物体到检测装置的距离等。This embodiment of the present disclosure does not specifically limit the first motion state component information. For example, the first motion state component information may include the velocity component of the target object in the first direction, the distance from the target object to the detection device, and the like.
需要说明的是,在一些实施例中,第一方向为与雷达的径向平行的方向,目标平面为与雷达的径向垂直的平面。It should be noted that, in some embodiments, the first direction is a direction parallel to the radial direction of the radar, and the target plane is a plane perpendicular to the radial direction of the radar.
在一些实施例中，根据事件数据能够确定目标物体在目标平面的运动状态分量信息，并根据目标物体在目标平面的运动状态分量信息和在第一方向的运动状态分量信息确定目标物体的多维运动状态信息。In some embodiments, the motion state component information of the target object in the target plane can be determined according to the event data, and the multi-dimensional motion state information of the target object can be determined according to the motion state component information of the target object in the target plane and the motion state component information in the first direction.
相应地,在一些实施例中,雷达探测数据包括目标物体在第一方向的第一运动状态分量信息,参照图2,步骤S300可以进一步包括:步骤S310A-步骤S320A。Correspondingly, in some embodiments, the radar detection data includes first motion state component information of the target object in the first direction. Referring to FIG. 2 , step S300 may further include steps S310A-step S320A.
在步骤S310A中,根据事件数据确定目标物体在目标平面的第二运动状态分量信息。In step S310A, the second motion state component information of the target object on the target plane is determined according to the event data.
在步骤S320A中,根据第二运动状态分量信息与第一运动状态分量信息,生成目标物体的多维运动状态信息。In step S320A, multi-dimensional motion state information of the target object is generated according to the second motion state component information and the first motion state component information.
需要说明的是，在一些实施例中，可以通过雷达对目标物体进行检测生成第一运动状态分量信息；在另一些实施例中，也可以由雷达对目标物体进行检测生成初始检测信号，例如，脉冲多普勒雷达测量的目标物体对应的多普勒频移，然后根据初始检测信号确定第一运动状态分量信息。本公开实施例对此不做特殊限定。It should be noted that, in some embodiments, the target object may be detected by the radar to directly generate the first motion state component information; in other embodiments, the target object may be detected by the radar to generate an initial detection signal, for example, the Doppler frequency shift corresponding to the target object measured by the pulse Doppler radar, and the first motion state component information is then determined according to the initial detection signal. This embodiment of the present disclosure makes no special limitation on this.
在通过雷达对目标物体进行检测生成初始检测信号的场景下,检测方法还可以包括根据初始检测信号确定第一运动状态分量信息的步骤。In the scenario where the target object is detected by the radar to generate the initial detection signal, the detection method may further include the step of determining the first motion state component information according to the initial detection signal.
本公开实施例对检测装置中的第一传感器不做特殊限定。作为一种可选的实施方式,检测装置中的第一传感器可以为动态视觉传感器(DVS,Dynamic Vision Sensor)。DVS是一种模仿生物视觉的工作机理的传感器,能够检测光的改变并输出光强发生变化像素的地址和信息,消除了冗余数据,并能够对场景变化实时动态响应。The embodiment of the present disclosure does not make any special limitation on the first sensor in the detection device. As an optional implementation manner, the first sensor in the detection device may be a Dynamic Vision Sensor (DVS, Dynamic Vision Sensor). DVS is a sensor that imitates the working mechanism of biological vision. It can detect the change of light and output the address and information of the pixel where the light intensity changes. It eliminates redundant data and can dynamically respond to scene changes in real time.
在本公开的一些实施例中，DVS采集的事件数据为目标平面的二维数据，根据DVS提供的每个时间点光强发生变化的像素点的地址(位置坐标)和光强变化信息，能够确定目标平面中的目标物体，并确定目标物体在目标平面中的运动状态分量信息。In some embodiments of the present disclosure, the event data collected by the DVS is two-dimensional data of the target plane. According to the addresses (position coordinates) and light intensity change information, provided by the DVS, of the pixels whose light intensity changes at each time point, the target object in the target plane can be determined, and the motion state component information of the target object in the target plane can be determined.
在本公开的一些实施例中，DVS不需要对画面中的所有像素点进行读取，仅需要获取光强度变化的像素点的地址和信息；具体的，当DVS检测到某个像素点的光强度变化大于等于预设门限数值时，则发出该像素点的事件信号；其中，如果该光强度变化为正向变化，即该像素点由低亮度跳变至高亮度，则发出用“+1”表示的事件信号，并标注为正事件；如果该光强度变化为负向变化，即该像素点由高亮度跳变至低亮度，则发出用“-1”表示的事件信号，并标注为负事件；如果光强度变化小于预设门限数值，则不发出事件信号，标注为无事件；DVS通过对各光强发生变化的像素点进行的事件标注，以构成事件数据。其中，事件数据中，正事件和负事件均可以用于表示像素点的光强变化信息。In some embodiments of the present disclosure, the DVS does not need to read all the pixels in the picture, but only needs to acquire the addresses and information of the pixels whose light intensity changes. Specifically, when the DVS detects that the light intensity change of a certain pixel is greater than or equal to a preset threshold value, it sends out an event signal for that pixel. If the light intensity change is a positive change, that is, the pixel jumps from low brightness to high brightness, an event signal represented by "+1" is sent out and marked as a positive event; if the light intensity change is a negative change, that is, the pixel jumps from high brightness to low brightness, an event signal represented by "-1" is sent out and marked as a negative event; if the light intensity change is less than the preset threshold value, no event signal is sent out, which is marked as no event. The DVS forms the event data by performing such event labeling on each pixel whose light intensity changes. In the event data, both positive events and negative events can be used to represent the light intensity change information of a pixel.
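A minimal Python sketch of the thresholding rule described above is shown below; it assumes that per-pixel intensity maps are available as NumPy arrays and that the threshold is applied to a plain intensity difference, which is an illustrative assumption rather than a description of any particular DVS circuit.

    import numpy as np

    def label_events(prev_intensity, curr_intensity, threshold):
        # Label each pixel +1 (positive event), -1 (negative event)
        # or 0 (no event) according to the rule described above.
        diff = curr_intensity.astype(np.float64) - prev_intensity
        events = np.zeros(diff.shape, dtype=np.int8)
        events[diff >= threshold] = 1    # low-to-high brightness jump
        events[diff <= -threshold] = -1  # high-to-low brightness jump
        return events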
相应地，在一些实施例中，上述事件数据可以包括所述目标平面中光强发生变化的像素点的坐标和光强变化信息，还可以进一步包括时间信息，时间信息可以表示光强发生变化的时间；第二运动状态分量信息包括目标物体在目标平面中的位置坐标；参照图3，步骤S310A可以进一步包括步骤S311A和S312A。Correspondingly, in some embodiments, the above event data may include the coordinates and light intensity change information of the pixels in the target plane whose light intensity changes, and may further include time information, where the time information may indicate the time at which the light intensity changed; the second motion state component information includes the position coordinates of the target object in the target plane. Referring to FIG. 3, step S310A may further include steps S311A and S312A.
在步骤S311A中,根据预设采样周期内的事件数据中各像素点的坐标和光强变化信息,生成事件帧。In step S311A, an event frame is generated according to the coordinates and light intensity change information of each pixel in the event data within a preset sampling period.
在步骤S312A中,根据事件帧确定目标物体在目标平面中的位置坐标。In step S312A, the position coordinates of the target object in the target plane are determined according to the event frame.
在本公开实施例中，在步骤S311A中，将预设采样周期内采样得到的事件数据进行组帧，从而生成事件帧，事件帧可以表示在预设采样周期内，对每个像素点产生的所有事件(如正事件或负事件)进行汇总后显示的图像帧。示例性的，DVS的输出数据为由多个4元组数据组成的事件数据，每一个4元组数据对应目标平面中光强发生变化的像素点，4元组数据包括光强发生变化的像素点在目标平面中的坐标(横坐标x、纵坐标y)以及光强变化信息、时间信息。在步骤S311A中，根据4元组数据携带的时间信息，将对应于同一个时间点的4元组数据进行组帧，从而生成相应的事件帧。In the embodiment of the present disclosure, in step S311A, the event data sampled within a preset sampling period is framed to generate an event frame. The event frame may represent an image frame obtained by summarizing all the events (such as positive events or negative events) generated for each pixel within the preset sampling period. Exemplarily, the output data of the DVS is event data composed of a plurality of 4-tuples, each 4-tuple corresponding to a pixel in the target plane whose light intensity changes, and each 4-tuple includes the coordinates of the pixel in the target plane (abscissa x, ordinate y), the light intensity change information, and time information. In step S311A, according to the time information carried by the 4-tuples, the 4-tuples corresponding to the same time point are framed, thereby generating the corresponding event frame.
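The framing of step S311A can be illustrated with the following sketch, which accumulates 4-tuples (x, y, polarity, t) of one sampling period into a single event frame; the exact tuple layout is assumed here for illustration.

    import numpy as np

    def build_event_frame(events, height, width, t_start, t_end):
        # Accumulate the 4-tuples whose time stamps fall inside one
        # sampling period [t_start, t_end) into an event frame.
        frame = np.zeros((height, width), dtype=np.int32)
        for x, y, polarity, t in events:
            if t_start <= t < t_end:
                frame[y, x] += polarity  # sum of +1/-1 events per pixel
        return frame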
在本公开实施例中，对于如何实施步骤S312A不做特殊限定。作为一种可选的实施方式，可以采用目标检测算法根据事件帧确定目标物体在目标平面中的位置坐标。例如，画面中目标物体相对运动时，其对应的像素点的光亮强度会存在不同程度的变化，例如，目标物体出现时，目标物体出现区域的像素点的光亮强度会显著增加，目标物体消失时，目标物体消失区域的像素点的光亮强度会显著降低，因此根据光强发生变化的像素点的位置坐标，可以确定画面中哪些像素点可能存在目标物体，可以确定出目标物体的轮廓区域，进而获取目标物体的位置区域，确定目标物体的位置坐标。In the embodiment of the present disclosure, how to implement step S312A is not specifically limited. As an optional implementation manner, a target detection algorithm may be used to determine the position coordinates of the target object in the target plane according to the event frame. For example, when the target object in the picture moves relative to the sensor, the light intensity of the corresponding pixels changes to varying degrees: when the target object appears, the light intensity of the pixels in the area where it appears increases significantly, and when the target object disappears, the light intensity of the pixels in the area where it disappears decreases significantly. Therefore, according to the position coordinates of the pixels whose light intensity changes, it can be determined which pixels in the picture may belong to a target object, the contour area of the target object can be determined, and the location area of the target object can then be obtained to determine the position coordinates of the target object.
在本公开的一些实施例中，可以将目标物体上的任意一点在目标平面中的坐标作为目标物体在目标平面中的位置坐标，例如，将目标物体的中心点的坐标作为目标物体在目标平面中的位置坐标。本公开实施例对此不做特殊限定。In some embodiments of the present disclosure, the coordinates of any point on the target object in the target plane may be used as the position coordinates of the target object in the target plane; for example, the coordinates of the center point of the target object may be used as the position coordinates of the target object in the target plane. This embodiment of the present disclosure makes no special limitation on this.
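As a very simple stand-in for the target detection algorithm of step S312A, the position coordinates can be estimated as the centroid of the changed pixels; the sketch below is only an illustration under that simplifying assumption, not the contour-based detection described above.

    import numpy as np

    def locate_target(event_frame):
        # Estimate the target position as the centroid of the pixels
        # whose light intensity changed (non-zero event-frame entries).
        ys, xs = np.nonzero(event_frame)
        if xs.size == 0:
            return None  # no light intensity change in this frame
        return float(xs.mean()), float(ys.mean())  # (x, y) coordinates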
在本公开的一些实施例中，目标物体在目标平面中的位置坐标可以用(x,y)表示。其中，x对应第二方向和第三方向中的一者，y对应第二方向和第三方向中的另一者，第二方向和第三方向均平行于目标平面，且第二方向与第三方向垂直。In some embodiments of the present disclosure, the position coordinates of the target object in the target plane may be represented by (x, y), where x corresponds to one of the second direction and the third direction, and y corresponds to the other of the second direction and the third direction. The second direction and the third direction are both parallel to the target plane, and the second direction is perpendicular to the third direction.
相应地，在一些实施例中，所述第二运动状态分量信息还包括所述目标物体在第二方向的第二速度分量和在第三方向的第三速度分量；参照图3，步骤S310A还可以进一步包括：步骤S313A-步骤S315A。Correspondingly, in some embodiments, the second motion state component information further includes a second velocity component of the target object in the second direction and a third velocity component in the third direction. Referring to FIG. 3, step S310A may further include steps S313A to S315A.
在步骤S313A中,根据目标物体在目标平面中的位置坐标分别确定目标物体在第二方向的第二偏移量和在第三方向的第三偏移量。In step S313A, the second offset of the target object in the second direction and the third offset of the target object in the third direction are respectively determined according to the position coordinates of the target object in the target plane.
在步骤S314A中,根据第二偏移量确定目标物体在第二方向的第二速度分量。In step S314A, a second velocity component of the target object in the second direction is determined according to the second offset.
在步骤S315A中,根据所述第三偏移量确定目标物体在第三方向的第三速度分量。In step S315A, the third velocity component of the target object in the third direction is determined according to the third offset.
在本公开的一些实施例中，对目标平面中的第二方向和第三方向不做特殊限定。作为一种可选的实施方式，第一方向、第二方向、第三方向构成三维直角坐标系，第一方向为三维直角坐标系的纵坐标方向，第二方向为三维直角坐标系的横坐标方向，第三方向为三维直角坐标系的竖坐标方向。In some embodiments of the present disclosure, the second direction and the third direction in the target plane are not particularly limited. As an optional implementation manner, the first direction, the second direction and the third direction constitute a three-dimensional rectangular coordinate system, where the first direction is the ordinate direction, the second direction is the abscissa direction, and the third direction is the vertical-axis direction of the three-dimensional rectangular coordinate system.
本公开实施例对如何执行步骤S313A确定第二偏移量和第三偏移量不做特殊限定。图4示出了根据相邻时间点目标平面中光强发生变化的像素点的坐标确定第三方向的第三偏移量的一种可选实施方式。如图4所示,可以得到目标物体在第三方向的绝对偏移量。同理,可以得到目标物体在第二方向的绝对偏移量。This embodiment of the present disclosure does not specifically limit how to perform step S313A to determine the second offset and the third offset. FIG. 4 shows an optional implementation manner of determining the third offset in the third direction according to the coordinates of the pixel points where the light intensity changes in the target plane at adjacent time points. As shown in Figure 4, the absolute offset of the target object in the third direction can be obtained. Similarly, the absolute offset of the target object in the second direction can be obtained.
还需要说明的是，在步骤S314A和步骤S315A中，在确定第二速度分量和第三速度分量时，还结合相邻时间点的时间差。It should also be noted that, in step S314A and step S315A, when determining the second velocity component and the third velocity component, the time difference between the adjacent time points is also taken into account.
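Steps S313A to S315A can be summarized by the following sketch, assuming the position coordinates at two adjacent time points are already expressed in consistent physical units.

    def plane_velocity(pos_prev, pos_curr, t_prev, t_curr):
        # Offsets in the second and third directions (step S313A).
        dx = pos_curr[0] - pos_prev[0]
        dy = pos_curr[1] - pos_prev[1]
        dt = t_curr - t_prev  # time difference of adjacent time points
        # Second and third velocity components (steps S314A and S315A).
        return dx / dt, dy / dt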
在一些实施例中，第一运动状态分量信息包括目标物体在第一方向的第一速度分量和距离参数，距离参数表示目标物体与检测装置之间的距离；多维运动状态信息包括目标物体的三维速度和三维坐标；参照图3，步骤S320A可以进一步包括：步骤S321A和步骤S322A。In some embodiments, the first motion state component information includes a first velocity component of the target object in the first direction and a distance parameter, where the distance parameter represents the distance between the target object and the detection device; the multi-dimensional motion state information includes the three-dimensional velocity and three-dimensional coordinates of the target object. Referring to FIG. 3, step S320A may further include step S321A and step S322A.
在步骤S321A中,确定对应于同一时间点的目标物体在目标平面中的位置坐标、第一速度分量、第二速度分量、第三速度分量和目标物体在第一方向的距离参数。In step S321A, the position coordinates of the target object in the target plane, the first velocity component, the second velocity component, the third velocity component and the distance parameter of the target object in the first direction corresponding to the same time point are determined.
在步骤S322A中，根据对应于同一时间点的目标物体在目标平面中的位置坐标、第一速度分量、第二速度分量、第三速度分量和目标物体在第一方向的距离参数，确定目标物体的三维速度和所述三维坐标。In step S322A, the three-dimensional velocity and three-dimensional coordinates of the target object are determined according to the position coordinates of the target object in the target plane, the first velocity component, the second velocity component, the third velocity component and the distance parameter of the target object in the first direction, all corresponding to the same time point.
在本公开的一些实施例中,三维坐标表示目标物体在三维空间中的位置坐标,三维速度表示目标物体在三维空间中的相对运动速度。其中三维坐标可以用(x,y,z)表示。其中,z对应第一方向,x对应第二方向和第三方向中的一者,y对应第二方向和第三方向中的另一者。In some embodiments of the present disclosure, the three-dimensional coordinates represent the position coordinates of the target object in the three-dimensional space, and the three-dimensional velocity represents the relative movement speed of the target object in the three-dimensional space. The three-dimensional coordinates can be represented by (x, y, z). Wherein, z corresponds to the first direction, x corresponds to one of the second direction and the third direction, and y corresponds to the other of the second direction and the third direction.
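Under the simplifying assumptions that the plane coordinates are already in metric units and that the distance parameter directly supplies the z coordinate, step S322A can be sketched as follows; all names are illustrative.

    def fuse_motion_state(xy, v_plane, v_radial, distance):
        # Combine, for one time point, the plane position (x, y), the
        # second and third velocity components, the first-direction
        # velocity component and the radar distance parameter.
        x, y = xy
        vx, vy = v_plane
        position_3d = (x, y, distance)    # z from the distance parameter
        velocity_3d = (vx, vy, v_radial)  # first direction is the z axis
        return position_3d, velocity_3d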
作为一种可选的实施方式，如前所述，在步骤S100中，通过DVS获取事件数据。相应地，在一些实施例中，参照图5，步骤S100包括：在步骤S110中，响应于目标平面中像素点的光强度变化获取事件数据，事件数据包括目标平面中光强发生变化的像素点的坐标和光强变化信息。As an optional implementation manner, as described above, in step S100, the event data is acquired through the DVS. Correspondingly, in some embodiments, referring to FIG. 5, step S100 includes: in step S110, acquiring event data in response to changes in the light intensity of pixels in the target plane, where the event data includes the coordinates and light intensity change information of the pixels in the target plane whose light intensity changes.
其中事件数据还可以进一步包括时间信息,时间信息可以表示像素点的光强发生变化的时间。The event data may further include time information, and the time information may represent the time when the light intensity of the pixel changes.
作为一种可选的实施方式,通过脉冲多普勒雷达获取第一运动状态分量信息,脉冲多普勒雷达能够探测目标物体的速度和距离等信息。相应地,在一些实施例中,参照图5,步骤S200可以进一步包括:在步骤S210中,获取目标物体在第一方向的第一速度分量和距离参数,以作为第一运动状态分量信息。As an optional implementation manner, the information of the first motion state component is obtained through a pulse Doppler radar, and the pulse Doppler radar can detect information such as the speed and distance of the target object. Correspondingly, in some embodiments, referring to FIG. 5 , step S200 may further include: in step S210 , acquiring the first velocity component and distance parameter of the target object in the first direction as the first motion state component information.
在步骤S300中,将事件数据与第一运动状态分量信息进行融合处理,生成目标物体的多维运动状态信息。In step S300, the event data and the first motion state component information are fused to generate multi-dimensional motion state information of the target object.
在本公开的一些实施例中，将表征目标平面中的光强变化信息的事件数据与通过脉冲多普勒雷达探测得到的目标物体在第一方向的第一运动状态分量信息进行融合处理，实现了对目标物体的多维运动状态信息的采集，其中事件数据是通过具有运动敏感的传感器对场景变化实时动态响应产生的，从而有效减少了冗余数据对检测效果的影响，实现了目标物体检测的实时动态响应，有效提高了运动物体检测的效率。In some embodiments of the present disclosure, the event data representing the light intensity change information in the target plane is fused with the first motion state component information of the target object in the first direction detected by the pulse Doppler radar, which realizes the collection of multi-dimensional motion state information of the target object. The event data is produced by a motion-sensitive sensor responding dynamically to scene changes in real time, which effectively reduces the influence of redundant data on the detection effect, realizes a real-time dynamic response in target object detection, and effectively improves the efficiency of moving object detection.
在一些实施例中,参照图5,目标物体的检测方法还可以进一步包括:在步骤S400中,输出多维运动状态信息。In some embodiments, referring to FIG. 5 , the method for detecting a target object may further include: in step S400 , outputting multi-dimensional motion state information.
本公开实施例对多维运动状态信息不做特殊限定。在一些实施例中,多维运动状态信息可以包括目标物体在各个时间点的三维速度、三维位置坐标,还可以包括目标物体在各个时间点在第一方向、第二方向、第三方向的速度分量,还可以包括根据多个时间点的三维速度、三维位置坐标生成的目标物体的运动轨迹。在一些实施例中,多维运动状态信息可以包括目标物体的距离(与检测装置之间的距离)、方位、高度、速度、姿态、形状等信息中的至少一者。This embodiment of the present disclosure does not specifically limit the multi-dimensional motion state information. In some embodiments, the multi-dimensional motion state information may include the three-dimensional velocity and three-dimensional position coordinates of the target object at various time points, and may also include the velocity components of the target object in the first direction, the second direction, and the third direction at each time point , and may also include a motion trajectory of the target object generated according to the three-dimensional velocity and three-dimensional position coordinates of multiple time points. In some embodiments, the multi-dimensional motion state information may include at least one of the distance (distance between the target object and the detection device), orientation, height, speed, attitude, shape and other information of the target object.
本公开实施例对如何输出多维运动状态信息不做特殊限定。例如,可以在显示屏上显示目标物体的多维运动状态信息。This embodiment of the present disclosure does not specifically limit how to output the multi-dimensional motion state information. For example, the multi-dimensional motion state information of the target object can be displayed on the display screen.
作为一种可选的实施方式，上述用于采集雷达探测数据的雷达可以为激光雷达(Lidar)，所采集的雷达探测数据为激光点云数据；其中，点云数据可以是指在一个三维坐标系统中的一组向量的集合，集合中的向量可以以X、Y、Z三维坐标的形式表示。激光雷达是以发射激光束探测目标物体的位置、速度等特征量的雷达系统。激光雷达通过扫描生成激光点云数据，能够表征至少一个目标物体的距离、方位、高度、速度、姿态、形状等目标物体的运动状态。As an optional implementation manner, the above-mentioned radar for collecting radar detection data may be a lidar, and the collected radar detection data is laser point cloud data. The point cloud data may refer to a set of vectors in a three-dimensional coordinate system, and the vectors in the set can be represented in the form of X, Y and Z three-dimensional coordinates. A lidar is a radar system that emits laser beams to detect characteristic quantities of a target object such as its position and velocity. The lidar generates laser point cloud data by scanning, which can characterize the motion state of at least one target object, such as its distance, azimuth, height, speed, attitude and shape.
在一些实施例中，激光雷达能够对运动物体进行探测，也能够对静止物体进行探测。因此，在本公开实施例中，目标物体可以是运动中的物体，也可以是静止的物体。本公开实施例对此不做特殊限定。In some embodiments, the lidar can detect moving objects as well as stationary objects. Therefore, in the embodiment of the present disclosure, the target object may be a moving object or a stationary object. This embodiment of the present disclosure makes no special limitation on this.
在一些实施例中，在雷达为激光雷达的情况下，多维运动状态信息可以包括目标物体的距离（与检测装置之间的距离）、方位、高度、速度、姿态、形状等信息中的至少一者。In some embodiments, in the case where the radar is a lidar, the multi-dimensional motion state information may include at least one of the distance (the distance between the target object and the detection device), azimuth, height, speed, attitude, shape and other information of the target object.
在本公开的一些实施例中，通过获取激光点云数据和表征目标平面中的光强变化信息的事件数据，并将激光点云数据和事件数据进行融合处理，生成目标物体的多维运动状态信息，从而能够对至少一个静止或运动的目标物体进行清晰的运动判断，实现了对静止或运动的目标物体的类生物视觉感知。In some embodiments of the present disclosure, laser point cloud data and event data representing the light intensity change information in the target plane are acquired, and the laser point cloud data and the event data are fused to generate the multi-dimensional motion state information of the target object, so that a clear motion judgment can be made for at least one stationary or moving target object, realizing biological-like visual perception of stationary or moving target objects.
本公开实施例对于如何执行步骤S300对事件数据和激光点云数据进行融合处理不做特殊限定。作为一种可选的实施方式,利用神经网络(如卷积神经网络)对事件数据和激光点云数据进行融合处理。在一些实施例中,神经网络的输入为三维图像和事件帧,其中,三维图像是根据激光点云数据生成的,事件帧是根据获取的事件数据进行组帧生成的。相应地,参照图6,步骤S300可以进一步包括:步骤S310B至步骤S330B。This embodiment of the present disclosure does not specifically limit how to perform step S300 to perform fusion processing on event data and laser point cloud data. As an optional implementation manner, a neural network (such as a convolutional neural network) is used to perform fusion processing on event data and laser point cloud data. In some embodiments, the input of the neural network is a three-dimensional image and an event frame, wherein the three-dimensional image is generated according to laser point cloud data, and the event frame is generated by framing according to the acquired event data. Correspondingly, referring to FIG. 6 , step S300 may further include: steps S310B to S330B.
在步骤S310B中,根据激光点云数据生成三维图像。In step S310B, a three-dimensional image is generated according to the laser point cloud data.
其中，点云数据包括所有发出的激光信号中对应有反射信号的点与发射源(即激光雷达)的距离信息，根据这些距离信息可以得到这些反射点构成的障碍物即目标物体在三维空间中的位置信息，将这些空间位置信息从球坐标变换到XYZ三维坐标即可得到相应的三维图像，生成的三维图像可以是指三维点云图。The point cloud data includes the distance information between each point, among all the emitted laser signals, that has a corresponding reflected signal and the emission source (i.e., the lidar). According to this distance information, the position information in three-dimensional space of the obstacles formed by these reflection points, i.e., the target objects, can be obtained; the corresponding three-dimensional image can then be obtained by transforming this spatial position information from spherical coordinates to XYZ three-dimensional coordinates, and the generated three-dimensional image may refer to a three-dimensional point cloud image.
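The spherical-to-Cartesian transformation mentioned above can be sketched as follows; the angle conventions (azimuth measured in the x-y plane, elevation measured from that plane) are an assumption, since they differ between lidar models.

    import numpy as np

    def spherical_to_cartesian(r, azimuth, elevation):
        # Convert range/azimuth/elevation returns into XYZ points.
        x = r * np.cos(elevation) * np.cos(azimuth)
        y = r * np.cos(elevation) * np.sin(azimuth)
        z = r * np.sin(elevation)
        return np.stack([x, y, z], axis=-1)  # shape (..., 3)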
在步骤S320B中,根据预设采样周期内的事件数据,生成事件帧。In step S320B, an event frame is generated according to the event data within the preset sampling period.
关于该步骤S320B的描述可参见上述对步骤S311A的描述,此处不再赘述。For the description of this step S320B, reference may be made to the above description of step S311A, and details are not repeated here.
在步骤S330B中,将三维图像和事件帧输入神经网络进行处理,生成目标物体的多维运动状态信息。In step S330B, the three-dimensional image and the event frame are input into the neural network for processing to generate multi-dimensional motion state information of the target object.
在一些实施例中，在步骤S330B中，将三维图像投影到和事件帧相同的二维平面，再将两种图像在通道维度上拼接在一起后一同输入神经网络(如卷积神经网络)进行特征提取，从而得到目标物体的多维运动状态信息。In some embodiments, in step S330B, the three-dimensional image is projected onto the same two-dimensional plane as the event frame, and then the two kinds of images are concatenated in the channel dimension and input together into the neural network (such as a convolutional neural network) for feature extraction, so as to obtain the multi-dimensional motion state information of the target object.
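The channel-dimension concatenation described above can be sketched as follows, assuming the point cloud projection and the event frame are single-channel H x W arrays already registered to the same image plane.

    import numpy as np

    def concat_for_network(projected_cloud, event_frame):
        # Stack the two registered single-channel images along a new
        # channel axis, giving a (2, H, W) network input.
        return np.stack([projected_cloud.astype(np.float32),
                         event_frame.astype(np.float32)], axis=0)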
在一些实施例中，神经网络(如卷积神经网络)的输入为二维图像和事件帧。其中，二维图像是将三维激光点云数据分别沿前视方向和俯视方向投影得到的俯视图和前视图，从而得到三维激光点云数据的二维图像表示；事件帧是将预设采样周期的事件数据进行组帧，按帧输入神经网络。相应地，在一些实施例中，参照图7，步骤S300可以进一步包括：步骤S310C至步骤S330C。In some embodiments, the input of the neural network (such as a convolutional neural network) is two-dimensional images and event frames. The two-dimensional images are the top view and front view obtained by projecting the three-dimensional laser point cloud data along the front-view direction and the top-view direction respectively, so as to obtain a two-dimensional image representation of the three-dimensional laser point cloud data; the event frames are obtained by framing the event data of each preset sampling period and are input into the neural network frame by frame. Correspondingly, in some embodiments, referring to FIG. 7, step S300 may further include steps S310C to S330C.
在步骤S310C中,对激光点云数据进行处理,生成激光点云数据的前视图和俯视图。In step S310C, the laser point cloud data is processed to generate a front view and a top view of the laser point cloud data.
可以理解的是，点云数据由三维坐标表示，通过对点云数据进行某一方向的投影映射，可以将点云数据的三维坐标变换为二维坐标，得到对应的投影视图，而对点云数据沿前视和俯视方向进行投影映射，则可以分别得到相应的前视图和俯视图。It can be understood that the point cloud data is represented by three-dimensional coordinates. By performing projection mapping on the point cloud data in a certain direction, the three-dimensional coordinates of the point cloud data can be transformed into two-dimensional coordinates to obtain the corresponding projected view; by performing projection mapping on the point cloud data along the front-view and top-view directions, the corresponding front view and top view can be obtained respectively.
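A minimal sketch of the top-view and front-view projections of step S310C is given below; it assumes an (N, 3) NumPy array of XYZ points and a simple binary occupancy rasterization, with grid ranges and resolution chosen for illustration.

    import numpy as np

    def top_and_front_views(points, resolution, x_range, y_range, z_range):
        def rasterize(u, v, u_range, v_range):
            w = int((u_range[1] - u_range[0]) / resolution)
            h = int((v_range[1] - v_range[0]) / resolution)
            img = np.zeros((h, w), dtype=np.float32)
            ui = np.floor((u - u_range[0]) / resolution).astype(int)
            vi = np.floor((v - v_range[0]) / resolution).astype(int)
            keep = (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
            img[vi[keep], ui[keep]] = 1.0  # mark occupied grid cells
            return img

        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        top_view = rasterize(x, y, x_range, y_range)    # project along z
        front_view = rasterize(y, z, y_range, z_range)  # project along x
        return top_view, front_view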
在步骤S320C中,根据预设采样周期内的事件数据,生成事件帧。In step S320C, an event frame is generated according to the event data in the preset sampling period.
关于该步骤S320C的描述可参见上述对步骤S311A的描述,此处不再赘述。For the description of this step S320C, reference may be made to the above description of step S311A, and details are not repeated here.
在步骤S330C中,将前视图、俯视图、事件帧输入神经网络进行处理,生成目标物体的多维运动状态信息。In step S330C, the front view, the top view, and the event frame are input into the neural network for processing to generate multi-dimensional motion state information of the target object.
在一些实施例中,可以将前视图和俯视图以及事件帧在通道维度上拼接在一起后一同输入神经网络进行特征提取,从而获得目标物体的多维运动状态信息。In some embodiments, the front view, the top view and the event frame may be spliced together in the channel dimension and then input to the neural network for feature extraction, so as to obtain the multi-dimensional motion state information of the target object.
本公开实施例还提供了将激光点云数据与事件数据进行融合处理的非神经网络的处理方式。在一些实施例中,参照图8,步骤S300可以进一步包括步骤S310D-步骤S320D。The embodiment of the present disclosure also provides a non-neural network processing method for fusion processing of laser point cloud data and event data. In some embodiments, referring to FIG. 8 , step S300 may further include steps S310D-S320D.
在步骤S310D中,根据事件数据确定至少一个目标区域,得到至少一个目标区域的第一坐标信息,每一个目标区域对应一个目标物体。In step S310D, at least one target area is determined according to the event data, first coordinate information of the at least one target area is obtained, and each target area corresponds to a target object.
在步骤S320D中,根据第一坐标信息确定目标区域在激光点云数据中的第二坐标信息,生成目标物体的多维运动状态信息。In step S320D, the second coordinate information of the target area in the laser point cloud data is determined according to the first coordinate information, and multi-dimensional motion state information of the target object is generated.
在步骤S310D中,可以利用目标检测算法检测根据事件数据生成的事件帧中至少一个目标区域的目标物体,从而确定各目标区域在事件帧中第一坐标信息。In step S310D, a target detection algorithm may be used to detect a target object in at least one target area in the event frame generated according to the event data, so as to determine the first coordinate information of each target area in the event frame.
在一些实施例中，在步骤S320D中，通过三维空间的坐标变换算法，可以将激光点云数据和事件帧中不同角度的图像变换到同一个角度，进而可以将图像中的点相关联，从而确定目标物体的多维运动状态信息。In some embodiments, in step S320D, through a coordinate transformation algorithm in three-dimensional space, the laser point cloud data and the images of different angles in the event frame can be transformed to the same angle, and the points in the images can then be associated, thereby determining the multi-dimensional motion state information of the target object.
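As an illustration of associating a target region detected in the event frame with the laser point cloud (step S320D), the sketch below selects the 3D points whose 2D projections fall inside the region, assuming both data sources have already been transformed to a common viewpoint.

    import numpy as np

    def points_in_region(points_2d, points_3d, bbox):
        # bbox is (x_min, y_min, x_max, y_max) in the common view;
        # points_2d holds the projections of the rows of points_3d.
        x_min, y_min, x_max, y_max = bbox
        inside = ((points_2d[:, 0] >= x_min) & (points_2d[:, 0] <= x_max) &
                  (points_2d[:, 1] >= y_min) & (points_2d[:, 1] <= y_max))
        return points_3d[inside]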
在一些实施例中，还可以通过两个安装在不同位置的DVS得到目标物体在两个不同平面上的运动状态信息，进而通过计算可以唯一确定目标物体的多维运动状态信息。In some embodiments, the motion state information of the target object on two different planes can also be obtained through two DVSs installed at different positions, and the multi-dimensional motion state information of the target object can then be uniquely determined through calculation.
在一些实施例中，还可以通过一个或多个图像传感器对目标物体进行探测，并将一个或多个图像传感器产生的信号与事件数据、激光点云数据进行融合处理，形成对目标物体的多维感知。作为一种可选的实施方式，图像传感器为互补金属氧化物半导体(CMOS，Complementary Metal Oxide Semiconductor)传感器。相应地，参照图9，目标检测方法还包括：在步骤S500中，获取至少一路RGB图像信号。此种情况下，步骤S300还可以进一步包括：在步骤S310E中，将至少一路RGB图像信号和激光点云数据、事件数据进行融合处理，生成目标物体的多维运动信息。In some embodiments, the target object may also be detected by one or more image sensors, and the signals generated by the one or more image sensors are fused with the event data and the laser point cloud data to form a multi-dimensional perception of the target object. As an optional implementation manner, the image sensor is a Complementary Metal Oxide Semiconductor (CMOS) sensor. Correspondingly, referring to FIG. 9, the target detection method further includes: in step S500, acquiring at least one channel of RGB image signals. In this case, step S300 may further include: in step S310E, fusing the at least one channel of RGB image signals with the laser point cloud data and the event data to generate the multi-dimensional motion information of the target object.
在本公开的一些实施例中,神经网络可以包括多条处理支路,每一条处理支路对应处理一路RGB图像信号,不同路的RGB图像信号是通过不同的RGB图像传感器采集的图像信号。RGB图像信号通过对应的处理支路输入神经网络。在本公开的一些实施例中,RGB图像信号按帧输入神经网络。In some embodiments of the present disclosure, the neural network may include a plurality of processing branches, each processing branch correspondingly processes a channel of RGB image signals, and the RGB image signals of different channels are image signals collected by different RGB image sensors. The RGB image signals are input to the neural network through corresponding processing branches. In some embodiments of the present disclosure, the RGB image signal is input to the neural network frame by frame.
在本公开的一些实施例中，从不同角度拍摄得到的RGB图像信号、激光点云数据和事件帧可以通过关键点检测等方式进行配准(将不同角度、不同视野范围拍摄得到的图片通过算法操作变为同一个角度同样视野范围的图片)，将配准后的三组图片在通道维度拼接在一起后一同输入网络做特征提取，从而得到目标物体的多维运动信息。In some embodiments of the present disclosure, the RGB image signals, laser point cloud data and event frames captured from different angles can be registered through key point detection or similar methods (pictures captured at different angles and with different fields of view are converted, through algorithmic operations, into pictures of the same angle and the same field of view), and the three registered sets of pictures are concatenated in the channel dimension and input together into the network for feature extraction, so as to obtain the multi-dimensional motion information of the target object.
此外，还可以将上述RGB图像信号、激光点云数据和事件帧输入到一个具有多个输入的神经网络，网络输出目标物体的运动状态信息如速度、位置等，该神经网络需要通过被标记的训练样本进行网络训练得到。In addition, the above RGB image signals, laser point cloud data and event frames may also be input into a neural network with multiple inputs, and the network outputs the motion state information of the target object, such as speed and position; such a neural network needs to be obtained through network training with labeled training samples.
第二方面，本公开实施例提供一种目标物体的检测装置，检测装置可以用于实现上述检测方法，参照图10，该检测装置可以包括：第一传感器101、雷达102和融合处理单元103。In a second aspect, an embodiment of the present disclosure provides a detection device for a target object, and the detection device can be used to implement the above detection method. Referring to FIG. 10, the detection device may include a first sensor 101, a radar 102 and a fusion processing unit 103.
其中,第一传感器101,用于检测目标平面中的光强变化信息,以生成事件数据,光强变化信息用于确定目标平面中的至少一个目标物体。The first sensor 101 is used to detect light intensity change information in the target plane to generate event data, and the light intensity change information is used to determine at least one target object in the target plane.
雷达102,用于获取针对目标物体的雷达探测数据,雷达探测数据为描述目标物体的运动状态的信息。The radar 102 is configured to acquire radar detection data for the target object, where the radar detection data is information describing the motion state of the target object.
融合处理单元103,用于将事件数据与雷达探测数据进行融合处理,生成目标物体的多维运动状态信息。The fusion processing unit 103 is configured to perform fusion processing on event data and radar detection data to generate multi-dimensional motion state information of the target object.
本公开实施例提供的检测装置能够应用于自动驾驶,用于检测目标物体。The detection device provided by the embodiment of the present disclosure can be applied to automatic driving, and is used to detect a target object.
在本公开实施例中,融合处理单元103能够执行本公开实施例第一方面所述的目标物体的检测方法,将所述事件数据与雷达探测数据进行融合,生成目标物体的多维运动状态信息。In the embodiment of the present disclosure, the fusion processing unit 103 can execute the method for detecting a target object described in the first aspect of the embodiment of the present disclosure, and fuse the event data with the radar detection data to generate multi-dimensional motion state information of the target object.
本公开实施例对检测装置中的第一传感器101不做特殊限定。作为一种可选的实施方式，检测装置中的第一传感器101可以为动态视觉传感器(DVS，Dynamic Vision Sensor)。DVS是一种模仿生物视觉的工作机理的传感器，能够检测光的改变并输出光强发生变化像素的地址(位置坐标)和信息，有效减少冗余数据，并能够对场景变化实时动态响应。The embodiment of the present disclosure does not specifically limit the first sensor 101 in the detection device. As an optional implementation manner, the first sensor 101 in the detection device may be a Dynamic Vision Sensor (DVS). The DVS is a sensor that imitates the working mechanism of biological vision; it can detect changes of light and output the addresses (position coordinates) and information of the pixels whose light intensity changes, effectively reducing redundant data, and can respond dynamically to scene changes in real time.
相应地，在一些实施例中，所述第一传感器101为动态视觉传感器；所述动态视觉传感器用于检测所述目标平面中各像素点的光强的变化，生成事件数据，事件数据包括目标平面中光强发生变化的像素点的坐标和光强变化信息，还可以进一步包括时间信息。Correspondingly, in some embodiments, the first sensor 101 is a dynamic vision sensor; the dynamic vision sensor is used to detect changes in the light intensity of each pixel in the target plane and generate event data, where the event data includes the coordinates and light intensity change information of the pixels in the target plane whose light intensity changes, and may further include time information.
作为一种可选的实施方式,检测装置中的雷达102可以为脉冲多普勒雷达,雷达探测数据包括目标物体在第一方向的第一运动状态分量信息,第一方向垂直于目标平面。其中,脉冲多普勒雷达用于发送并接收脉冲信号,以确定目标物体在第一方向的第一运动状态分量信息;融合处理单元103可以从脉冲多普勒雷达获取第一运动状态分量信息。As an optional implementation manner, the radar 102 in the detection device may be a pulse Doppler radar, and the radar detection data includes first motion state component information of the target object in a first direction, and the first direction is perpendicular to the target plane. The pulse Doppler radar is used for sending and receiving pulse signals to determine the first motion state component information of the target object in the first direction; the fusion processing unit 103 can obtain the first motion state component information from the pulse Doppler radar.
下面对脉冲多普勒雷达做简要介绍。The following is a brief introduction to pulse Doppler radar.
多普勒雷达是指利用多普勒效应，对目标相对于雷达的径向速度分量进行测定、或对具有特定径向速度的目标进行提取的雷达。脉冲多普勒雷达为发送脉冲信号的多普勒雷达。A Doppler radar refers to a radar that uses the Doppler effect to measure the radial velocity component of a target relative to the radar, or to extract targets having a specific radial velocity. A pulse Doppler radar is a Doppler radar that transmits pulse signals.
脉冲多普勒雷达以固定频率发射脉冲波对空扫描时，如遇到活动目标，反射回波的频率和发射波的频率会出现频率差，即多普勒频移。多普勒频移与活动目标与雷达的相对径向速度成正比。根据多普勒频移的幅度，可以确定活动目标的径向速度。多普勒频移的幅度通过信号的相位进行计算。因此，在本公开实施例中，雷达102为相参雷达，从而可以保留相位信息。When the pulse Doppler radar scans the air with pulse waves at a fixed frequency and encounters a moving target, a frequency difference, i.e., a Doppler frequency shift, appears between the frequency of the reflected echo and the frequency of the transmitted wave. The Doppler frequency shift is proportional to the relative radial velocity between the moving target and the radar; according to the magnitude of the Doppler frequency shift, the radial velocity of the moving target can be determined. The magnitude of the Doppler frequency shift is calculated from the phase of the signal. Therefore, in the embodiment of the present disclosure, the radar 102 is a coherent radar, so that the phase information can be preserved.
还需要说明的是,对于一个目标物体,当目标物体向着雷达运动时,多普勒频移为正;当目标物体背离雷达运动时,多普勒频移为负。It should also be noted that, for a target object, when the target object moves toward the radar, the Doppler frequency shift is positive; when the target object moves away from the radar, the Doppler frequency shift is negative.
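For a monostatic radar the two-way propagation gives a Doppler frequency shift of f_d = 2v/λ, so the radial velocity follows directly from the measured shift, as in this one-line sketch:

    def radial_velocity(doppler_shift_hz, wavelength_m):
        # Positive when the target approaches the radar, matching the
        # sign convention described above.
        return doppler_shift_hz * wavelength_m / 2.0

For example, at a carrier wavelength of 0.03 m (10 GHz), a measured shift of 2000 Hz corresponds to a radial velocity of 30 m/s.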
在一些实施例中,参照图10,检测装置还包括:输出单元104,用于输出目标物体的多维运动状态信息。In some embodiments, referring to FIG. 10 , the detection apparatus further includes: an output unit 104 for outputting multi-dimensional motion state information of the target object.
本公开实施例对输出单元不做特殊限制。例如,输出单元为显示屏,在显示屏上显示目标物体的多维运动状态信息。The embodiment of the present disclosure does not impose special restrictions on the output unit. For example, the output unit is a display screen, and the multi-dimensional motion state information of the target object is displayed on the display screen.
在本公开的一些实施例中，融合处理单元103用于：根据事件数据确定目标物体在所述目标平面的第二运动状态分量信息；根据第二运动状态分量信息与第一运动状态分量信息，生成目标物体的多维运动状态信息。In some embodiments of the present disclosure, the fusion processing unit 103 is configured to: determine the second motion state component information of the target object in the target plane according to the event data; and generate the multi-dimensional motion state information of the target object according to the second motion state component information and the first motion state component information.
在本公开的一些实施例中,雷达102为激光雷达,雷达探测数据为激光点云数据;激光雷达,用于发射激光束对至少一个目标物体进行探测,生成激光点云数据。In some embodiments of the present disclosure, the radar 102 is a laser radar, and the radar detection data is laser point cloud data; the laser radar is used for emitting a laser beam to detect at least one target object to generate laser point cloud data.
在一些实施例中，参照图11，融合处理单元103可以包括第一图像信号处理器(ISP，Image Signal Processing)131和第一神经网络132。其中，第一图像信号处理器131用于：根据激光点云数据生成三维图像；根据预设采样周期内的事件数据，生成事件帧。第一神经网络132用于对三维图像和事件帧进行处理，生成目标物体的多维运动状态信息。In some embodiments, referring to FIG. 11, the fusion processing unit 103 may include a first image signal processor (ISP, Image Signal Processing) 131 and a first neural network 132. The first image signal processor 131 is used to generate a three-dimensional image according to the laser point cloud data, and to generate event frames according to the event data within each preset sampling period. The first neural network 132 is used to process the three-dimensional image and the event frames to generate the multi-dimensional motion state information of the target object.
在一些实施例中，参照图12，融合处理单元103可以包括第二图像信号处理器133和第二神经网络(如卷积神经网络)134。其中，第二图像信号处理器133用于对激光点云数据进行处理，生成激光点云数据的前视图和俯视图；根据预设采样周期内的事件数据，生成事件帧；第二神经网络134用于对前视图、俯视图、事件帧进行处理，生成目标物体的多维运动状态信息。In some embodiments, referring to FIG. 12, the fusion processing unit 103 may include a second image signal processor 133 and a second neural network (such as a convolutional neural network) 134. The second image signal processor 133 is used to process the laser point cloud data to generate a front view and a top view of the laser point cloud data, and to generate event frames according to the event data within each preset sampling period; the second neural network 134 is used to process the front view, the top view and the event frames to generate the multi-dimensional motion state information of the target object.
在一些实施例中，融合处理单元用于：根据事件数据确定至少一个目标区域，得到至少一个目标区域的第一坐标信息，每一个目标区域对应一个目标物体；根据第一坐标信息确定目标区域在激光点云数据中的第二坐标信息，生成目标物体的多维运动状态信息。In some embodiments, the fusion processing unit is configured to: determine at least one target area according to the event data to obtain first coordinate information of the at least one target area, each target area corresponding to one target object; and determine, according to the first coordinate information, second coordinate information of the target area in the laser point cloud data, to generate the multi-dimensional motion state information of the target object.
在一些实施例中，参照图11和图12，检测装置还包括至少一个第二传感器140；第二传感器140用于获取RGB图像，并生成RGB图像信号；融合处理单元103用于将至少一路RGB图像信号和激光点云数据、事件数据进行融合处理，生成目标物体的多维运动状态信息。In some embodiments, referring to FIG. 11 and FIG. 12, the detection device further includes at least one second sensor 140; the second sensor 140 is used to acquire RGB images and generate RGB image signals; the fusion processing unit 103 is used to fuse at least one channel of RGB image signals with the laser point cloud data and the event data to generate the multi-dimensional motion state information of the target object.
在一些实施例中，参照图11，第一神经网络132可以包括多条处理支路，每一条处理支路对应一个第二传感器140，第二传感器140输出的RGB图像信号通过对应的处理支路输入第一神经网络132。In some embodiments, referring to FIG. 11, the first neural network 132 may include multiple processing branches, each processing branch corresponding to one second sensor 140, and the RGB image signals output by the second sensor 140 are input into the first neural network 132 through the corresponding processing branch.
在一些实施例中，参照图12，第二神经网络134包括多条处理支路，每一条处理支路对应一个第二传感器140，第二传感器140输出的RGB图像信号通过对应的处理支路输入第二神经网络134。In some embodiments, referring to FIG. 12, the second neural network 134 includes multiple processing branches, each processing branch corresponding to one second sensor 140, and the RGB image signals output by the second sensor 140 are input into the second neural network 134 through the corresponding processing branch.
在一些实施例中,RGB图像信号按帧输入第一神经网络132或第二神经网络134。In some embodiments, the RGB image signals are input to the first neural network 132 or the second neural network 134 frame by frame.
在本公开实施例中,检测装置用于实现上述任一实施例提供的检测方法,其他相关描述可参见上述实施例的检测方法中具体的描述,此处不再赘述。In this embodiment of the present disclosure, the detection apparatus is used to implement the detection method provided by any of the foregoing embodiments. For other related descriptions, reference may be made to the specific description of the detection method in the foregoing embodiment, which will not be repeated here.
第三方面，本公开实施例提供一种融合处理单元，应用于目标物体的检测装置，参照图13，融合处理单元包括：一个或多个处理器201；存储器202，其上存储有一个或多个程序，当一个或多个程序被一个或多个处理器执行，使得一个或多个处理器实现本公开实施例第一方面所述的目标物体的检测方法；一个或多个I/O接口203，连接在处理器201与存储器202之间，配置为实现处理器201与存储器202的信息交互。In a third aspect, an embodiment of the present disclosure provides a fusion processing unit, applied to a detection device of a target object. Referring to FIG. 13, the fusion processing unit includes: one or more processors 201; a memory 202 on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for detecting a target object described in the first aspect of the embodiments of the present disclosure; and one or more I/O interfaces 203, connected between the processors 201 and the memory 202 and configured to implement information interaction between the processors 201 and the memory 202.
其中，处理器201为具有数据处理能力的器件，其包括但不限于中央处理器(CPU)等；存储器202为具有数据存储能力的器件，其包括但不限于随机存取存储器(RAM，更具体如SDRAM、DDR等)、只读存储器(ROM)、带电可擦可编程只读存储器(EEPROM)、闪存(FLASH)；I/O接口(读写接口)203连接在处理器201与存储器202间，能实现处理器201与存储器202的信息交互，其包括但不限于数据总线(Bus)等。The processor 201 is a device with data processing capability, including but not limited to a central processing unit (CPU); the memory 202 is a device with data storage capability, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM) and flash memory (FLASH); the I/O interface (read/write interface) 203 is connected between the processor 201 and the memory 202 and can implement information interaction between the processor 201 and the memory 202, including but not limited to a data bus (Bus) and the like.
在一些实施例中,处理器201、存储器202和I/O接口203通过总线204相互连接,进而与计算设备的其它组件连接。In some embodiments, processor 201, memory 202, and I/O interface 203 are interconnected by bus 204, which in turn is connected to other components of the computing device.
第四方面,本公开实施例提供一种计算机可读介质,其上存储有计算机程序,所述程序被处理器执行时实现本公开实施例第一方面所述的目标物体的检测方法。In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, implements the method for detecting a target object described in the first aspect of the embodiment of the present disclosure.
本领域普通技术人员可以理解，上文中所公开方法中的全部或某些步骤、系统、装置中的功能模块/单元可以被实施为软件、固件、硬件及其适当的组合。在硬件实施方式中，在以上描述中提及的功能模块/单元之间的划分不一定对应于物理组件的划分；例如，一个物理组件可以具有多个功能，或者一个功能或步骤可以由若干物理组件合作执行。某些物理组件或所有物理组件可以被实施为由处理器，如中央处理器、数字信号处理器或微处理器执行的软件，或者被实施为硬件，或者被实施为集成电路，如专用集成电路。这样的软件可以分布在计算机可读介质上，计算机可读介质可以包括计算机存储介质(或非暂时性介质)和通信介质(或暂时性介质)。如本领域普通技术人员公知的，术语计算机存储介质包括在用于存储信息(诸如计算机可读指令、数据结构、程序模块或其他数据)的任何方法或技术中实施的易失性和非易失性、可移除和不可移除介质。计算机存储介质包括但不限于RAM、ROM、EEPROM、闪存或其他存储器技术、CD-ROM、数字多功能盘(DVD)或其他光盘存储、磁盒、磁带、磁盘存储或其他磁存储装置、或者可以用于存储期望的信息并且可以被计算机访问的任何其他的介质。此外，本领域普通技术人员公知的是，通信介质通常包含计算机可读指令、数据结构、程序模块或者诸如载波或其他传输机制之类的调制数据信号中的其他数据，并且可包括任何信息递送介质。Those of ordinary skill in the art can understand that all or some of the steps in the methods disclosed above, and the functional modules/units in the systems and devices, can be implemented as software, firmware, hardware and appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor such as a central processing unit, digital signal processor or microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information (such as computer-readable instructions, data structures, program modules or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
本文已经公开了示例实施例，并且虽然采用了具体术语，但它们仅用于并仅应当被解释为一般说明性含义，并且不用于限制的目的。在一些实例中，对本领域技术人员显而易见的是，除非另外明确指出，否则可单独使用与特定实施例相结合描述的特征、特性和/或元素，或可与其他实施例相结合描述的特征、特性和/或元件组合使用。因此，本领域技术人员将理解，在不脱离由所附的权利要求阐明的本公开的范围的情况下，可进行各种形式和细节上的改变。Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, it will be apparent to those skilled in the art that, unless expressly stated otherwise, the features, characteristics and/or elements described in connection with a particular embodiment may be used alone, or may be used in combination with the features, characteristics and/or elements described in connection with other embodiments. Accordingly, those skilled in the art will understand that various changes in form and details may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Claims (23)

  1. 一种目标物体的检测方法,包括:A method for detecting a target object, comprising:
    获取事件数据,所述事件数据表征目标平面中的光强变化信息,所述光强变化信息用于确定所述目标平面中的至少一个目标物体;acquiring event data, where the event data represents light intensity change information in the target plane, and the light intensity change information is used to determine at least one target object in the target plane;
    获取针对所述目标物体的雷达探测数据,所述雷达探测数据为描述所述目标物体的运动状态的信息;acquiring radar detection data for the target object, where the radar detection data is information describing the motion state of the target object;
    将所述事件数据与所述雷达探测数据进行融合处理，生成所述目标物体的多维运动状态信息。fusing the event data with the radar detection data to generate multi-dimensional motion state information of the target object.
  2. 根据权利要求1所述的检测方法,其中,所述雷达探测数据包括所述目标物体在第一方向的第一运动状态分量信息,所述第一方向垂直于所述目标平面;The detection method according to claim 1, wherein the radar detection data includes first motion state component information of the target object in a first direction, and the first direction is perpendicular to the target plane;
    所述将所述事件数据与所述雷达探测数据进行融合处理,生成所述目标物体的多维运动状态信息的步骤,包括:The step of performing fusion processing on the event data and the radar detection data to generate the multi-dimensional motion state information of the target object includes:
    根据所述事件数据确定所述目标物体在所述目标平面的第二运动状态分量信息；determining second motion state component information of the target object on the target plane according to the event data;
    根据所述第二运动状态分量信息与所述第一运动状态分量信息，生成所述目标物体的多维运动状态信息。generating the multi-dimensional motion state information of the target object according to the second motion state component information and the first motion state component information.
  3. The detection method according to claim 2, wherein the event data comprises coordinates of pixel points in the target plane at which the light intensity changes and light intensity change information, and the second motion state component information comprises position coordinates of the target object in the target plane;
    the step of determining the second motion state component information of the target object in the target plane according to the event data comprises:
    generating an event frame according to the coordinates and the light intensity change information of each of the pixel points in the event data within a preset sampling period; and
    determining the position coordinates of the target object in the target plane according to the event frame.
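
A minimal sketch of the event-frame construction recited in claim 3, assuming events are delivered as (x, y, polarity, timestamp-in-microseconds) tuples and taking the centroid of sufficiently active pixels as the target's position coordinates; the threshold is an arbitrary placeholder.

```python
import numpy as np

def make_event_frame(events, height, width, t0_us, period_us):
    """Accumulate the events of one preset sampling period
    [t0, t0 + period) into a signed 2-D histogram, the event frame."""
    frame = np.zeros((height, width), dtype=np.int32)
    for x, y, p, t in events:
        if t0_us <= t < t0_us + period_us:
            frame[y, x] += p
    return frame

def locate_target(frame, threshold=1):
    """Position coordinates of the target in the target plane, taken here
    as the centroid of pixels whose accumulated change is large enough."""
    ys, xs = np.nonzero(np.abs(frame) >= threshold)
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())
```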
  4. The detection method according to claim 3, wherein the second motion state component information further comprises a second velocity component of the target object in a second direction and a third velocity component in a third direction, the second direction and the third direction being parallel to the target plane, and the second direction being perpendicular to the third direction;
    the step of determining the second motion state component information of the target object in the target plane according to the event data further comprises:
    determining a second offset of the target object in the second direction and a third offset of the target object in the third direction according to the position coordinates of the target object in the target plane;
    determining the second velocity component according to the second offset; and
    determining the third velocity component according to the third offset.
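
The offsets of claim 4 divided by the sampling interval give the in-plane velocity components; a sketch, with values in pixels per second until a metric scale is available (converting to metric units needs the distance parameter introduced in claim 5):

```python
def plane_velocity(pos_prev, pos_curr, dt_s):
    """Second and third velocity components from the offsets of the target's
    position coordinates between two consecutive event frames."""
    dx = pos_curr[0] - pos_prev[0]   # second offset, along the second direction
    dy = pos_curr[1] - pos_prev[1]   # third offset, along the third direction
    return dx / dt_s, dy / dt_s      # (v2, v3) in pixels per second

# e.g. a target that moved 3 px right and 1 px down over a 10 ms period:
v2, v3 = plane_velocity((120.0, 45.0), (123.0, 46.0), 0.010)   # (300.0, 100.0)
```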
  5. The detection method according to claim 4, wherein the first motion state component information comprises a first velocity component and a distance parameter of the target object in the first direction, and the multi-dimensional motion state information comprises a three-dimensional velocity and three-dimensional coordinates of the target object;
    the step of generating the multi-dimensional motion state information of the target object according to the second motion state component information and the first motion state component information comprises:
    determining the position coordinates of the target object in the target plane, the first velocity component, the second velocity component, the third velocity component, and the distance parameter of the target object in the first direction that correspond to a same time point; and
    determining the three-dimensional velocity and the three-dimensional coordinates according to the position coordinates of the target object in the target plane, the first velocity component, the second velocity component, the third velocity component, and the distance parameter of the target object in the first direction corresponding to the same time point.
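
The claims do not fix how the plane coordinates are combined with the radar's distance parameter; under a pinhole-projection assumption, with a focal length f_pix and principal point (cx, cy) that are hypothetical here, one possible combination is:

```python
def to_3d_state(u, v, v2_pix, v3_pix, v1_mps, z_m, f_pix, cx, cy):
    """Combine plane coordinates (u, v), plane velocity components
    (v2_pix, v3_pix, in pixel/s), and the radar's first velocity component
    v1_mps and distance z_m into 3-D coordinates and a 3-D velocity.
    Pinhole-camera assumption, not given in the claims."""
    X = (u - cx) * z_m / f_pix     # metric position parallel to the plane
    Y = (v - cy) * z_m / f_pix
    vx = v2_pix * z_m / f_pix      # metric velocity parallel to the plane
    vy = v3_pix * z_m / f_pix
    vz = v1_mps                    # velocity perpendicular to the plane
    return (X, Y, z_m), (vx, vy, vz)
```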
  6. The detection method according to any one of claims 2 to 5, wherein the step of acquiring the radar detection data for the target object comprises:
    acquiring a first velocity component and a distance parameter of the target object in the first direction as the first motion state component information.
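
For a pulse Doppler radar, the first velocity component and the distance parameter of claim 6 follow from the standard Doppler and pulse-ranging relations v = f_d * lambda / 2 and R = c * tau / 2; a sketch with placeholder numbers:

```python
C = 299_792_458.0  # speed of light in m/s

def radial_velocity(doppler_shift_hz, wavelength_m):
    """First velocity component, along the first direction: v = f_d * lambda / 2."""
    return doppler_shift_hz * wavelength_m / 2.0

def target_range(round_trip_delay_s):
    """Distance parameter from the pulse round-trip time: R = c * tau / 2."""
    return C * round_trip_delay_s / 2.0

# e.g. a 24 GHz radar (wavelength ~12.5 mm) seeing a 1.6 kHz Doppler shift:
v1 = radial_velocity(1600.0, 0.0125)   # 10.0 m/s along the first direction
r = target_range(2.0e-7)               # ~30 m
```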
  7. The detection method according to claim 1, wherein the radar detection data is laser point cloud data;
    the step of fusing the event data with the radar detection data to generate the multi-dimensional motion state information of the target object comprises:
    generating a three-dimensional image according to the laser point cloud data;
    generating an event frame according to the event data within a preset sampling period; and
    inputting the three-dimensional image and the event frame into a neural network for processing to generate the multi-dimensional motion state information of the target object.
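
One plausible reading of the "three-dimensional image" of claim 7 is a dense occupancy volume voxelized from the point cloud, which could then be fed to a network together with the event frame; a sketch under that assumption, with arbitrary grid sizes and ranges:

```python
import numpy as np

def voxelize(points, grid=(64, 64, 32), extent=((-32, 32), (-32, 32), (-2, 6))):
    """Turn an (N, 3) laser point cloud into a dense 3-D occupancy volume."""
    for axis in range(3):                      # drop points outside the extent
        lo, hi = extent[axis]
        points = points[(points[:, axis] >= lo) & (points[:, axis] < hi)]
    idx = np.empty((points.shape[0], 3), dtype=np.int64)
    for axis in range(3):                      # map coordinates to voxel indices
        lo, hi = extent[axis]
        idx[:, axis] = ((points[:, axis] - lo) / (hi - lo) * grid[axis]).astype(np.int64)
    vol = np.zeros(grid, dtype=np.float32)
    vol[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0  # mark occupied voxels
    return vol
```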
  8. The detection method according to claim 1, wherein the radar detection data is laser point cloud data;
    the step of fusing the event data with the radar detection data to generate the multi-dimensional motion state information of the target object comprises:
    processing the laser point cloud data to generate a front view and a top view of the laser point cloud data;
    generating an event frame according to the event data within a preset sampling period; and
    inputting the front view, the top view, and the event frame into a neural network for processing to generate the multi-dimensional motion state information of the target object.
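
A sketch of one possible front-view/top-view rendering of the point cloud for claim 8, keeping the maximum coordinate value per projected cell; the projection conventions, resolutions, and ranges are assumptions, not taken from the claims:

```python
import numpy as np

def _max_project(points, a, b, c, res, a_rng, b_rng):
    """Project an (N, 3) cloud onto axes (a, b), keeping max of axis c per cell."""
    (a0, a1), (b0, b1) = a_rng, b_rng
    H, W = int(round((a1 - a0) / res)), int(round((b1 - b0) / res))
    img = np.full((H, W), -np.inf, dtype=np.float32)
    m = (points[:, a] >= a0) & (points[:, a] < a1) & \
        (points[:, b] >= b0) & (points[:, b] < b1)
    p = points[m]
    i = ((p[:, a] - a0) / res).astype(int)
    j = ((p[:, b] - b0) / res).astype(int)
    np.maximum.at(img, (i, j), p[:, c])
    img[np.isneginf(img)] = 0.0
    return img

def top_view(points, res=0.1):
    """Bird's-eye view: x-y cells keep the maximum height z."""
    return _max_project(points, 0, 1, 2, res, (-32.0, 32.0), (-32.0, 32.0))

def front_view(points, res=0.1):
    """Front view: y-z cells keep the maximum forward distance x."""
    return _max_project(points, 1, 2, 0, res, (-32.0, 32.0), (-2.0, 6.0))
```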
  9. The detection method according to claim 1, wherein the radar detection data is laser point cloud data;
    the step of fusing the event data with the radar detection data to generate the multi-dimensional motion state information of the target object comprises:
    determining at least one target region according to the event data to obtain first coordinate information of the at least one target region, each target region corresponding to one target object; and
    determining second coordinate information of the target region in the laser point cloud data according to the first coordinate information, and generating the multi-dimensional motion state information of the target object.
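
Transferring the first coordinate information onto the point cloud requires a known relation between the two coordinate systems; assuming a 3x4 projection matrix P from an extrinsic/intrinsic calibration, which the claims do not specify, the region transfer of claim 9 might look like:

```python
import numpy as np

def crop_points_by_roi(points, P, roi):
    """Keep the lidar points whose projection through the assumed 3x4
    calibration matrix P falls inside the target region found in the
    event frame (roi = (u0, v0, u1, v1), the first coordinate information)."""
    u0, v0, u1, v1 = roi
    homo = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
    proj = homo @ P.T                                          # (N, 3)
    z = proj[:, 2]
    valid = z > 1e-6                       # points in front of the sensor
    u = proj[:, 0] / np.where(valid, z, 1.0)
    v = proj[:, 1] / np.where(valid, z, 1.0)
    inside = valid & (u >= u0) & (u < u1) & (v >= v0) & (v < v1)
    return points[inside]                  # yields the second coordinate information
```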
  10. The detection method according to claim 1, wherein the radar detection data is laser point cloud data;
    the detection method further comprises: acquiring at least one channel of RGB image signals;
    the step of fusing the event data with the radar detection data to generate the multi-dimensional motion state information of the target object comprises:
    fusing the at least one channel of RGB image signals, the laser point cloud data, and the event data to generate the multi-dimensional motion state information of the target object.
  11. The detection method according to any one of claims 1 to 10, wherein the step of acquiring the event data comprises:
    acquiring the event data in response to changes in light intensity of pixel points in the target plane, the event data comprising coordinates of the pixel points in the target plane at which the light intensity changes, light intensity change information, and time information.
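
The event record of claim 11 is naturally held in a compact address-event layout; a sketch using a structured NumPy dtype with hypothetical field names:

```python
import numpy as np

# One record per event: pixel coordinates, intensity-change polarity, and
# time information, matching the three elements recited in claim 11.
EVENT_DTYPE = np.dtype([("x", np.uint16), ("y", np.uint16),
                        ("polarity", np.int8), ("t_us", np.int64)])

events = np.zeros(2, dtype=EVENT_DTYPE)
events[0] = (120, 45, +1, 1_000)   # intensity rose at pixel (120, 45) at t = 1 ms
events[1] = (121, 45, -1, 1_250)   # intensity fell at the neighboring pixel
```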
  12. The detection method according to any one of claims 1 to 10, wherein the detection method further comprises:
    outputting the multi-dimensional motion state information.
  13. The detection method according to any one of claims 1 to 10, wherein the step of acquiring the event data comprises:
    acquiring the event data by means of a dynamic vision sensor.
  14. An apparatus for detecting a target object, comprising:
    a first sensor configured to detect light intensity change information in a target plane to generate event data, the light intensity change information being used to determine at least one target object in the target plane;
    a radar configured to acquire radar detection data for the target object, the radar detection data being information describing a motion state of the target object; and
    a fusion processing unit configured to fuse the event data with the radar detection data to generate multi-dimensional motion state information of the target object.
  15. The detection apparatus according to claim 14, wherein the first sensor is a dynamic vision sensor;
    the dynamic vision sensor is configured to detect changes in the light intensity of each pixel point in the target plane and generate the event data; the event data comprises the coordinates of the pixel points in the target plane at which the light intensity changes and light intensity change information.
  16. The detection apparatus according to claim 14, wherein the radar is a pulse Doppler radar, the radar detection data comprises first motion state component information of the target object in a first direction, and the first direction is perpendicular to the target plane;
    the pulse Doppler radar is configured to transmit and receive pulse signals to determine the first motion state component information of the target object in the first direction.
  17. The detection apparatus according to claim 16, wherein the fusion processing unit is configured to: determine second motion state component information of the target object in the target plane according to the event data; and generate the multi-dimensional motion state information of the target object according to the second motion state component information and the first motion state component information.
  18. The detection apparatus according to claim 14, wherein the radar is a lidar and the radar detection data is laser point cloud data;
    the lidar is configured to emit a laser beam to detect at least one target object and generate the laser point cloud data;
    the fusion processing unit comprises a first image signal processor and a first neural network;
    the first image signal processor is configured to generate a three-dimensional image according to the laser point cloud data, and to generate an event frame according to the event data within a preset sampling period; and
    the first neural network is configured to process the three-dimensional image and the event frame to generate the multi-dimensional motion state information of the target object.
  19. The detection apparatus according to claim 14, wherein the radar is a lidar and the radar detection data is laser point cloud data;
    the lidar is configured to emit a laser beam to detect at least one target object and generate the laser point cloud data;
    the fusion processing unit comprises a second image signal processor and a second neural network;
    the second image signal processor is configured to process the laser point cloud data to generate a front view and a top view of the laser point cloud data, and to generate an event frame according to the event data within a preset sampling period; and
    the second neural network is configured to process the front view, the top view, and the event frame to generate the multi-dimensional motion state information of the target object.
  20. The detection apparatus according to claim 14, wherein the radar is a lidar and the radar detection data is laser point cloud data;
    the lidar is configured to emit a laser beam to detect at least one target object and generate the laser point cloud data; and
    the fusion processing unit is configured to: determine at least one target region according to the event data to obtain first coordinate information of the at least one target region, each target region corresponding to one target object; and determine second coordinate information of the target region in the laser point cloud data according to the first coordinate information, and generate the multi-dimensional motion state information of the target object.
  21. The detection apparatus according to claim 14, wherein the radar is a lidar and the radar detection data is laser point cloud data; the lidar is configured to emit a laser beam to detect at least one target object and generate the laser point cloud data;
    the detection apparatus further comprises at least one second sensor;
    the second sensor is configured to capture an RGB image and generate an RGB image signal; and
    the fusion processing unit is configured to fuse at least one channel of the RGB image signal, the laser point cloud data, and the event data to generate the multi-dimensional motion state information of the target object.
  22. A fusion processing unit applied to an apparatus for detecting a target object, the fusion processing unit comprising:
    one or more processors; and
    a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method for detecting a target object according to any one of claims 1 to 13.
  23. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method for detecting a target object according to any one of claims 1 to 13.
PCT/CN2021/141370 2020-12-25 2021-12-24 Method and apparatus for detecting target object, fusion processing unit, and medium WO2022135594A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202011560817.6 2020-12-25
CN202011562118.5 2020-12-25
CN202011562118.5A CN112816995B (en) 2020-12-25 2020-12-25 Target detection method and device, fusion processing unit and computer readable medium
CN202011560817.6A CN112666550B (en) 2020-12-25 2020-12-25 Moving object detection method and device, fusion processing unit and medium

Publications (1)

Publication Number Publication Date
WO2022135594A1 (en)

Family ID=82158850

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/141370 WO2022135594A1 (en) 2020-12-25 2021-12-24 Method and apparatus for detecting target object, fusion processing unit, and medium

Country Status (1)

Country Link
WO (1) WO2022135594A1 (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5604580A (en) * 1994-02-10 1997-02-18 Mitsubishi Denki Kabushiki Kaisha Optical radar apparatus for vehicle
CN107036534A (en) * 2016-02-03 2017-08-11 北京振兴计量测试研究所 Method and system based on laser speckle measurement Vibration Targets displacement
CN106338616A (en) * 2016-08-02 2017-01-18 南京理工大学 Optical fiber array spatial filtering velocity measurement system
CN106908783A (en) * 2017-02-23 2017-06-30 苏州大学 Obstacle detection method based on multi-sensor information fusion
CN111931752A (en) * 2020-10-13 2020-11-13 中航金城无人系统有限公司 Dynamic target detection method based on event camera
CN112666550A (en) * 2020-12-25 2021-04-16 北京灵汐科技有限公司 Moving object detection method and apparatus, fusion processing unit, and medium
CN112816995A (en) * 2020-12-25 2021-05-18 北京灵汐科技有限公司 Target detection method and device, fusion processing unit and computer readable medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115422387A (en) * 2022-11-04 2022-12-02 山东矩阵软件工程股份有限公司 Point cloud data processing method and system based on multi-dimensional point cloud fusion data
CN116184427A (en) * 2022-12-21 2023-05-30 湖南迈克森伟电子科技有限公司 Distance detection system based on laser distance sensing technology
CN116184427B (en) * 2022-12-21 2023-12-29 湖南迈克森伟电子科技有限公司 Distance detection system based on laser distance sensing technology
CN116912798A (en) * 2023-09-14 2023-10-20 南京航空航天大学 Cross-modal noise perception-based automatic driving event camera target detection method
CN116912798B (en) * 2023-09-14 2023-12-19 南京航空航天大学 Cross-modal noise perception-based automatic driving event camera target detection method
CN117115926A (en) * 2023-10-25 2023-11-24 天津大树智能科技有限公司 Human body action standard judging method and device based on real-time image processing
CN117115926B (en) * 2023-10-25 2024-02-06 天津大树智能科技有限公司 Human body action standard judging method and device based on real-time image processing


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21909589

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02/11/2023)