CN112529011A - Target detection method and related device - Google Patents
Target detection method and related device
- Publication number
- CN112529011A (application number CN202011455234.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- point cloud
- cloud data
- feature
- image data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/56 — Extraction of image or video features relating to colour
- G06T7/85 — Stereo camera calibration
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
- G06T2207/10024 — Color image
- G06T2207/10028 — Range image; depth image; 3D point clouds
- G06V2201/07 — Target detection
Abstract
The application provides a target detection method and a related device. The target detection method comprises: acquiring image data and point cloud data; extracting features from the image data and the point cloud data respectively to obtain first feature data corresponding to the image data and second feature data corresponding to the point cloud data; fusing the first feature data and the second feature data to obtain third feature data; and obtaining a target detection result from the third feature data. Because detection draws on both modalities, the method can meet all-weather operating requirements while greatly reducing sensor cost.
Description
Technical Field
The invention relates to the technical field of robots, in particular to a target detection method and a related device.
Background
During train operation, obstacles ahead are one of the main factors affecting running safety; how to detect such obstacles has therefore received wide attention.
At present, an image recognition sensing system or a radar sensing system carried by the train is generally used to sense the surrounding environment and detect obstacles ahead. However, a vision-based image recognition sensing system has low robustness when the train is under weak illumination, so it cannot meet all-weather detection requirements; and for radar-based detection, a high-line-count (multi-beam) lidar is costly, while a low-line-count lidar has resolution and refresh rate too low to meet the requirements.
Disclosure of Invention
The application provides a target detection method and a target detection device. The method addresses two problems of existing approaches: an image-recognition-based sensing system cannot meet all-weather detection requirements, and a radar-based sensing system is costly.
In order to solve the above technical problem, one technical solution adopted by the application is to provide an object detection method. The method comprises the following steps: acquiring image data and point cloud data; extracting features from the image data and the point cloud data respectively to obtain first feature data corresponding to the image data and second feature data corresponding to the point cloud data; fusing the first feature data and the second feature data to obtain third feature data; and obtaining a target detection result from the third feature data.
Extracting features from the image data and the point cloud data respectively to obtain the first feature data corresponding to the image data and the second feature data corresponding to the point cloud data comprises: inputting the image data and the point cloud data into a trained network model, extracting features from the image data with a first feature extraction module in the network model to obtain the first feature data, and extracting features from the point cloud data with a second feature extraction module in the network model to obtain the second feature data. Fusing the first feature data and the second feature data to obtain the third feature data comprises: fusing the first feature data and the second feature data with a feature fusion module in the network model to obtain the third feature data.
The first feature extraction module and/or the second feature extraction module adopt a ResNet structure.
The first feature extraction module and/or the second feature extraction module comprise a plurality of convolutional layers, each convolutional layer being followed by a connected BN (Batch Normalization) layer.
After fusing the first feature data and the second feature data with the feature fusion module in the network model to obtain the third feature data, the method further comprises: creating a fusion loss function; fitting the third feature data with the fusion loss function to obtain a loss value; and constraining the loss value so as to correct the third feature data.
The method further comprises: constructing the network model; and inputting a preset data set into the network model and training the network model multiple times so as to correct the parameters of the network model.
After acquiring the image data and the point cloud data, the method further comprises: performing a coordinate transformation on the point cloud data so that the coordinate systems of the point cloud data and the image data are consistent.
Performing the coordinate transformation on the point cloud data comprises: converting the world coordinate system of the point cloud data into the camera coordinate system; converting the camera coordinate system into the image coordinate system; and translating the image coordinate system so that the coordinate system of the point cloud data is consistent with that of the image data.
After the coordinate transformation of the point cloud data, the method further comprises: filtering out data outside the common field of view of the point cloud data and the image data.
After acquiring the image data and the point cloud data, the method further comprises: filtering out the ground point cloud in the point cloud data, the ground point cloud being the set of points whose distance to the ground is smaller than a preset distance.
Filtering out the ground point cloud in the point cloud data comprises: obtaining a ground model; determining the set of points in the point cloud data whose distance to the ground model is smaller than the preset distance as the ground point cloud; and filtering the ground point cloud out of the point cloud data.
Obtaining the ground model comprises: in each iteration, randomly selecting 3 points from the point cloud data to establish a plane model; determining inliers and outliers according to the distance between each point in the point cloud data and the plane model; and when the number of inliers meets a preset requirement, establishing the ground model from those inliers.
In order to solve the above technical problem, another technical solution adopted by the present application is to provide an object detection device. The target detection device comprises a memory and a processor connected with each other; the memory is used for storing program instructions implementing the target detection method, and the processor is used for executing the program instructions stored by the memory.
In order to solve the above technical problem, yet another technical solution adopted by the present application is to provide a computer-readable storage medium. The computer-readable storage medium stores a program file executable by a processor to implement the target detection method described above.
According to the target detection method and related device, image data and point cloud data are acquired; features are extracted from each to obtain first feature data corresponding to the image data and second feature data corresponding to the point cloud data; the first and second feature data are fused to obtain third feature data; and a target detection result is obtained from the third feature data. Because the target is detected from image data and point cloud data jointly, detection can fall back on the lidar point cloud under weak illumination, so the method meets all-weather requirements; and because image data combined with a low-line-count lidar suffices under good illumination, the cost is greatly reduced compared with a scheme that uses a high-line-count lidar around the clock.
Drawings
In order to illustrate the technical solutions in the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a target detection method according to an embodiment of the present application;
Fig. 2 is a flowchart of a method for filtering ground point clouds in point cloud data according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a network architecture according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an object detection device according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any indication of the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. All directional indications (such as up, down, left, right, front, and rear … …) in the embodiments of the present application are only used to explain the relative positional relationship between the components, the movement, and the like in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indication is changed accordingly. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The present application will be described in detail with reference to the accompanying drawings and examples.
Referring to fig. 1, fig. 1 is a flowchart of a target detection method according to an embodiment of the present application. In this embodiment, a target detection method is provided, which is particularly applicable to detecting obstacles ahead of a moving vehicle, so as to reduce traffic accidents caused by them.
Specifically, the method comprises the following steps:
step S11: image data and point cloud data are acquired.
Specifically, image data of the environment under measurement can be acquired through a vision sensor, and point cloud data of the same environment can be acquired through a lidar; the environment under measurement can be the environment ahead of and/or around the vehicle during driving.
Each point of the laser point cloud consists of the three-dimensional coordinates (X, Y, Z) of a target point; together, the points describe the lidar's surroundings and the outline of the target in a three-dimensional coordinate system. The image data acquired by a vision sensor such as a color camera, by contrast, lies in a two-dimensional plane, where each pixel carries R, G, B components providing color information about objects in the camera's field of view. To make the two data sources dimensionally consistent, the coordinates of the point cloud data are transformed after step S11 so that the point cloud data and the image data share a coordinate system. Specifically, the lidar and the vision sensor may be calibrated to obtain the intrinsic and extrinsic parameters, and the point cloud coordinates are then transformed using them. Projecting the point cloud from the world coordinate system to the pixel coordinate system involves three transformations (a rigid transformation, a perspective projection, and a translation): the world coordinate system of the point cloud is first converted into the camera coordinate system, the camera coordinate system is then converted into the image coordinate system, and the image coordinate system is finally translated so that the coordinate system of the point cloud data is consistent with that of the image data.
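As a hedged illustration, the projection chain above (world coordinates, then camera coordinates, then image coordinates shifted to pixel coordinates) can be sketched as follows; the rotation `R`, translation `t`, and intrinsic matrix `K` are placeholder calibration values, not ones taken from the application:

```python
import numpy as np

def project_points(points_w, R, t, K):
    """Project Nx3 world-frame lidar points into pixel coordinates.

    R, t : extrinsic calibration (the world-to-camera rigid transformation)
    K    : 3x3 camera intrinsic matrix (perspective projection, with the
           principal point carrying the translation into pixel coordinates)
    """
    pts_cam = points_w @ R.T + t           # rigid transformation
    pts_img = pts_cam @ K.T                # perspective projection (homogeneous)
    uv = pts_img[:, :2] / pts_img[:, 2:3]  # divide by depth -> pixel coords
    return uv, pts_cam[:, 2]               # pixel coordinates and depth

# Toy calibration: identity extrinsics and an assumed intrinsic matrix.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
uv, depth = project_points(np.array([[0.0, 0.0, 10.0]]), R, t, K)
```

A point on the optical axis lands at the principal point of the assumed intrinsic matrix.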
To restrict both modalities to the same field of view, data outside the common field of view of the point cloud data and the image data can further be filtered out after the coordinate transformation, yielding a projection image of the point cloud on the RGB image.
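A minimal sketch of this common-field-of-view filter; the pixel coordinates, depths, and image size below are placeholder values for illustration:

```python
import numpy as np

def filter_to_fov(uv, depth, width=640, height=160):
    """Keep only points that project inside the image bounds and lie in
    front of the camera (positive depth)."""
    mask = (
        (depth > 0)
        & (uv[:, 0] >= 0) & (uv[:, 0] < width)
        & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    return uv[mask], depth[mask], mask

uv = np.array([[100.0, 50.0],    # inside the image
               [700.0, 50.0],    # outside the horizontal bounds
               [100.0, 50.0]])
depth = np.array([10.0, 10.0, -5.0])   # third point is behind the camera
uv_in, depth_in, mask = filter_to_fov(uv, depth)
```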
Specifically, in an embodiment, because each lidar frame yields a large number of points, processing the raw data directly is cumbersome. To simplify processing, the ground point cloud can be filtered out after the point cloud data is acquired, so that an intelligent vehicle, robot, or the like only needs to identify targets near its driving path; this narrows the effective scanning range of the lidar, reduces the number of points to process, and removes the influence of ground returns. The point cloud coordinates are then transformed. Here the ground point cloud is the set of points whose distance to the ground is smaller than a preset distance; the preset distance may be greater than 0.2 m, for example 0.3 m.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for filtering ground point clouds in point cloud data according to an embodiment of the present disclosure; specifically, the method for filtering the ground point cloud in the point cloud data can comprise the following steps:
step S21: and obtaining the ground model.
Specifically, a three-point RANSAC algorithm can be adopted to obtain the ground model. The RANSAC algorithm proceeds as follows: 1) randomly assume a small set of inliers as initial values and fit a model to them; the model fits the assumed inliers, and all unknown parameters can be computed from them; 2) test all other data against this model, and if a point fits the fitted model, treat it as an inlier and expand the inlier set; 3) if enough points are classified as inliers, the estimated model is considered reasonable; 4) re-estimate the model from all assumed inliers to update it; 5) finally, evaluate the model by estimating the error rate of the inliers with respect to the model.
In a specific embodiment, step S21 may specifically include randomly selecting 3 points from the point cloud data to establish a plane model in each iteration, and then determining inliers and outliers according to the distance between each point in the point cloud data and the plane model. Specifically, the plane parameters may be calculated using the plane model ax + by + cz + d = 0; assuming the plane is roughly perpendicular to the Z axis, the iteration count n and the fitting precision a are set: points at distance at most a from the fitted plane are regarded as inliers, and points farther than a as outliers. When the number of inliers meets the preset requirement, the ground model is established from those inliers.
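The per-iteration step (sample 3 points, fit ax + by + cz + d = 0, split the cloud into inliers and outliers by point-to-plane distance) can be sketched as below; the threshold, iteration count, and synthetic scene are placeholder values, not ones from the application:

```python
import numpy as np

def ransac_ground(points, n_iters=100, thresh=0.2, seed=0):
    """Three-point RANSAC plane fit; returns a boolean inlier mask."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p2 - p1, p3 - p1)       # plane normal (a, b, c)
        if np.linalg.norm(n) < 1e-9:         # degenerate (collinear) sample
            continue
        n = n / np.linalg.norm(n)
        dist = np.abs((points - p1) @ n)     # point-to-plane distance
        mask = dist < thresh                 # inliers vs outliers
        if mask.sum() > best_mask.sum():     # keep the best consensus set
            best_mask = mask
    return best_mask

# Synthetic scene: 50 near-flat ground points plus 5 elevated obstacle points.
rng = np.random.default_rng(1)
ground = np.column_stack([rng.uniform(-5, 5, (50, 2)), rng.normal(0, 0.02, 50)])
obstacle = np.column_stack([rng.uniform(-1, 1, (5, 2)), rng.uniform(1, 2, 5)])
pts = np.vstack([ground, obstacle])
mask = ransac_ground(pts)
non_ground = pts[~mask]   # the cloud with ground returns filtered out
```

The obstacle points lie well above the fitted plane, so they survive the ground filter.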
Step S22: determining the set of points in the point cloud data whose distance to the ground model is smaller than the preset distance as the ground point cloud.
Step S23: filtering the ground point cloud out of the point cloud data.
Specifically, step S23 may be implemented with any existing procedure for filtering ground points out of a point cloud, achieving the same or similar technical effects; it is not described further here.
Step S12: extracting features from the image data and the point cloud data respectively to obtain first feature data corresponding to the image data and second feature data corresponding to the point cloud data.
Specifically, the method further includes pre-training the network model before step S12 is executed. The learning rate and the batch size are extremely important parameters of the training process. In a specific implementation, pre-training comprises constructing the network model, inputting a preset data set into it, and training the model multiple times to correct its parameters, so that the parameters are reasonably selected and optimized. The preset data set may consist of multiple pre-collected samples, each containing image data and point cloud data, and the network parameters include at least the learning rate and the batch size. The learning rate directly influences convergence: if it is too large, the model struggles to converge and a globally optimal result is hard to obtain; if it is too small, convergence is slow and the training cost rises sharply. The batch size, the amount of data consumed in one training iteration, affects the generalization performance of the model; it is limited by the memory of the training platform and related to the size of the network's input window. Optimizing these parameters therefore gives the network model strong robustness and high accuracy. In practice the batch size is generally set to an integral multiple of 4, and the larger the batch size, the larger the initial learning rate should be, to preserve the model's effectiveness.
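A hedged sketch of the pre-training step on a toy regression task, showing where the learning rate and batch size enter; the model, data, and values are placeholders, not the application's network or data set:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Placeholder data set standing in for the pre-collected (image, point cloud) samples.
x = torch.randn(256, 4)
y = x @ torch.tensor([[1.0], [-2.0], [0.5], [3.0]])

batch_size = 32                      # commonly a multiple of 4
loader = DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=True)

model = nn.Linear(4, 1)              # stand-in for the fusion network
opt = torch.optim.SGD(model.parameters(), lr=0.05)   # learning rate
loss_fn = nn.MSELoss()

initial = loss_fn(model(x), y).item()
for epoch in range(20):              # "training the network multiple times"
    for xb, yb in loader:
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()
final = loss_fn(model(x), y).item()
```

Sweeping `lr` and `batch_size` over such a loop is the parameter-selection step the paragraph describes.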
Specifically, referring to fig. 3, fig. 3 is a schematic structural diagram of a network architecture according to an embodiment of the present application. In a specific embodiment, after the processing of step S11, the image data and the point cloud data are input into the trained network model and preprocessed so that both have the same format; specifically, the preprocessed image data and point cloud data may both have the format 160 × 640 × 3.
Then, feature extraction is performed on the image data by the first feature extraction module 31 in the network model to obtain the first feature data, and on the point cloud data by the second feature extraction module 32 to obtain the second feature data. Using separate modules 31 and 32 for the image data and the point cloud data keeps the two feature extraction processes from interfering with each other, effectively guaranteeing that valid features are extracted from each kind of data.
In an embodiment, to avoid vanishing or exploding gradients during training caused by an excessive number of convolutional layers, each convolutional layer may be followed by a BN (Batch Normalization) layer connected to it, ensuring stable training.
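A minimal sketch of the convolution-plus-BN pattern described above, with placeholder channel counts:

```python
import torch
from torch import nn

def conv_bn(in_ch, out_ch, stride=1):
    """3x3 convolution followed by the BN layer that stabilizes training."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1,
                  bias=False),        # bias is redundant before BN
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

block = conv_bn(3, 16)
out = block(torch.randn(2, 3, 160, 640))   # the 160 x 640 x 3 input format
```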
Specifically, in view of the limited computing performance of an autonomous driving platform, the first feature extraction module 31 and the second feature extraction module 32 may adopt a lightweight convolutional network as the feature extraction network. In particular, the first and/or second feature extraction module may use a ResNet structure, which effectively increases network depth and strengthens the feature extraction capability, so that more implicit features can be extracted.
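As a hedged illustration of the ResNet structure mentioned here, a basic residual block (an identity shortcut around two conv-BN layers) might look like the following; the channel counts and input size are placeholders:

```python
import torch
from torch import nn

class BasicBlock(nn.Module):
    """Residual block: output = ReLU(F(x) + x). The identity shortcut lets
    gradients bypass the convolutions, enabling deeper networks."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.body(x) + x)   # identity shortcut

block = BasicBlock(16)
out = block(torch.randn(1, 16, 40, 160))
```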
Step S13: and fusing the first characteristic data and the second characteristic data to obtain third characteristic data.
Specifically, the first feature data and the second feature data are fused by using a feature fusion module 33 in the network model to obtain third feature data.
Specifically, in an embodiment, to obtain a better feature fusion result, a fusion loss function is created after step S13; the third feature data is then fitted with the fusion loss function to obtain a loss value, and the loss value is constrained to correct the third feature data. The fusion loss function may specifically be an MSE (mean square error) loss function; constraining this loss helps the two kinds of feature data fuse together better.
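The fusion step with an MSE constraint can be sketched as below; concatenation as the fusion operation and the particular constraint target are assumptions for illustration, not details stated by the application:

```python
import torch
from torch import nn

image_feat = torch.randn(2, 64, 20, 80)   # first feature data (image branch)
cloud_feat = torch.randn(2, 64, 20, 80)   # second feature data (point cloud branch)

# Assumed fusion: channel-wise concatenation followed by a 1x1 convolution.
fuse = nn.Conv2d(128, 64, kernel_size=1)
fused = fuse(torch.cat([image_feat, cloud_feat], dim=1))  # third feature data

# MSE fusion loss: penalize the fused features for drifting from both
# branches, one plausible way of "constraining the loss value".
mse = nn.MSELoss()
loss = mse(fused, image_feat) + mse(fused, cloud_feat)
```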
It will be appreciated that the network model has two inputs and one output: the inputs are the RGB image data from the camera and the XYZ point cloud data encoded as an image, and the output is the third feature data.
step S14: and obtaining a target detection result according to the third characteristic data.
Specifically, the third feature data is processed to locate targets such as people or vehicles in the environment under measurement, giving the target detection result.
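A hedged sketch of such post-processing as a small detection head (per-location classification and box regression on the fused features); the head design and class count are assumptions, as the application does not specify them:

```python
import torch
from torch import nn

num_classes = 3    # e.g. background, person, vehicle (assumed)
head = nn.Conv2d(64, num_classes + 4, kernel_size=1)  # class scores + box offsets

fused = torch.randn(1, 64, 20, 80)         # third feature data
pred = head(fused)
cls_scores, boxes = pred[:, :num_classes], pred[:, num_classes:]
labels = cls_scores.argmax(dim=1)          # per-location class decision
```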
It should be noted that the training and test data used in the present application are the laser point clouds and left color camera data of the KITTI target detection data set.
Specifically, a network model performing feature-level fusion of laser point cloud data and visual image data is designed on the basis of a deep convolutional neural network, so that the trained model has strong robustness and high accuracy. Under good lighting, the type and distance of an obstacle can be judged from the color image data and the depth image acquired by the RGB camera; under special conditions such as low illumination, rain, or fog, the lidar can be used for real-time obstacle avoidance. The method can therefore detect targets such as people and vehicles on the road in extreme weather or under large illumination changes, effectively avoiding accidents such as traffic collisions.
In the target detection method provided by this embodiment, image data and point cloud data are acquired; features are extracted from each to obtain first feature data corresponding to the image data and second feature data corresponding to the point cloud data; the first and second feature data are fused to obtain third feature data; and a target detection result is obtained from the third feature data. Because the target is detected from image data and point cloud data jointly, detection can fall back on the lidar point cloud under weak illumination, so the method meets all-weather requirements; and because image data combined with a low-line-count lidar suffices under good illumination, the cost is greatly reduced compared with a scheme that uses a high-line-count lidar around the clock.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a target detection device according to an embodiment of the present application; in the present embodiment, an object detection apparatus 500 is provided, the object detection apparatus 500 includes a memory 501 and a processor 502 connected to each other; wherein, the memory 501 is used for storing program instructions for implementing the object detection method according to the above embodiment; the processor 502 is operable to execute program instructions stored by the memory 501.
The processor 502 may also be referred to as a Central Processing Unit (CPU). The processor 502 may be an integrated circuit chip having signal processing capabilities. The processor 502 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 501 may be a memory module, a TF card, or the like, and can store all information in the object detection device 500, including the input raw data, computer programs, intermediate results, and final results; it stores and retrieves information at locations specified by the controller. With the memory 501, the object detection device 500 has a storage capability that ensures normal operation. By usage, the memory in the device 500 can be divided into main memory (internal memory) and auxiliary memory (external memory). External memory is usually a magnetic medium, an optical disc, or the like, and can hold information for long periods. Internal memory refers to the storage components on the main board, which hold the data and programs currently being executed; it stores them only temporarily, and the contents are lost when power is turned off or interrupted.
The object detection apparatus 500 further includes other components, which are the same in structure and function as those of existing object detection devices and are not described here again.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application. In this embodiment, a computer-readable storage medium is provided, which stores a program file 600 that can be executed by a processor to implement the object detection method of the above embodiments.
The program file 600 may be stored in the computer-readable storage medium in the form of a software product, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, as well as terminal devices such as computers, servers, mobile phones, and tablets.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed.
The above embodiments are merely examples and do not limit the scope of the present disclosure. All equivalent structural or process modifications made using the contents of the specification and drawings of the present disclosure, whether applied directly or indirectly in other related technical fields, fall within the scope of the present disclosure.
Claims (14)
1. A method of object detection, the method comprising:
acquiring image data and point cloud data;
respectively extracting the features of the image data and the point cloud data to obtain first feature data corresponding to the image data and second feature data corresponding to the point cloud data;
fusing the first feature data and the second feature data to obtain third feature data;
and obtaining a target detection result according to the third feature data.
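The four claimed steps can be sketched end to end. Everything below is illustrative: `extract_features`, `fuse`, and `detect` are hypothetical stand-ins for the network modules described in the later claims, not the patented implementation.

```python
import numpy as np

def extract_features(data):
    # Hypothetical per-modality feature extractor: a column-wise mean
    # stands in for a learned network branch.
    return data.mean(axis=0)

def fuse(first, second):
    # Hypothetical fusion: concatenate the two feature vectors.
    return np.concatenate([first, second])

def detect(features):
    # Hypothetical detection head: a scalar score from the fused features.
    return float(np.linalg.norm(features))

# Step 1: acquire image data and point cloud data (random stand-ins).
image_data = np.random.rand(100, 16)   # e.g. flattened image patches
point_cloud = np.random.rand(200, 16)  # e.g. voxelized point features

# Step 2: extract first and second feature data separately.
first_feature = extract_features(image_data)
second_feature = extract_features(point_cloud)

# Step 3: fuse into third feature data.
third_feature = fuse(first_feature, second_feature)

# Step 4: obtain a detection result from the fused features.
result = detect(third_feature)
```

The sketch only fixes the data flow of claim 1; the real extractors and fusion module are specified by claims 2-5.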
2. The method of claim 1,
the extracting features of the image data and the point cloud data respectively to obtain first feature data corresponding to the image data and second feature data corresponding to the point cloud data comprises:
inputting the image data and the point cloud data into a trained network model, so as to perform feature extraction on the image data by using a first feature extraction module in the network model to obtain first feature data corresponding to the image data, and perform feature extraction on the point cloud data by using a second feature extraction module in the network model to obtain second feature data corresponding to the point cloud data;
the fusing the first feature data and the second feature data to obtain third feature data comprises:
and fusing the first feature data and the second feature data by using a feature fusion module in the network model to obtain the third feature data.
3. The method of claim 2,
the first feature extraction module and/or the second feature extraction module adopts a ResNet structure.
4. The method of claim 2,
the first feature extraction module and/or the second feature extraction module comprises a plurality of convolutional layers, each convolutional layer being followed by a batch normalization (BN) layer connected to it.
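Claim 4's conv-then-BN pattern can be shown numerically with a toy 1-D convolution; `conv1d_valid` and `batch_norm` below are illustrative helpers, not the claimed modules, and the kernel values are arbitrary.

```python
import numpy as np

def conv1d_valid(x, kernel):
    # Minimal 1-D "valid" convolution standing in for a convolutional layer.
    n, k = len(x), len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(n - k + 1)])

def batch_norm(x, eps=1e-5):
    # Batch normalization: zero mean, unit variance per channel, which
    # stabilizes training of the stacked convolutional layers.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.rand(32)                 # toy input signal
kernel = np.array([0.25, 0.5, 0.25])   # arbitrary smoothing kernel
conv_out = conv1d_valid(x, kernel)
# Each convolutional layer is followed by a BN layer (claim 4).
bn_out = batch_norm(conv_out.reshape(-1, 1)).ravel()
```

After BN, the activations have approximately zero mean and unit variance regardless of the convolution's output scale.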
5. The method of claim 2,
after the fusing, by the feature fusion module in the network model, the first feature data and the second feature data to obtain the third feature data, the method further comprises:
creating a fusion loss function;
fitting the third feature data with the fusion loss function to obtain a loss value;
and constraining the loss value to correct the third feature data.
6. The method of claim 2,
the method further comprises the following steps:
constructing a network model;
and inputting a preset data set into the network model, and training the network model for multiple times so as to correct the parameters of the network model.
7. The method of claim 1,
after the acquiring the image data and the point cloud data, the method further comprises:
and performing coordinate transformation on the point cloud data to enable the coordinate systems of the point cloud data and the image data to be consistent.
8. The method of claim 7,
the coordinate transformation of the point cloud data comprises:
converting a world coordinate system of the point cloud data into a camera coordinate system;
converting the camera coordinate system into an image coordinate system;
and translating the image coordinate system to enable the coordinate system of the point cloud data to be consistent with the coordinate system of the image data.
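The three conversions of claim 8 correspond to a standard pinhole-camera projection. In the sketch below the rotation `R`, translation `t`, and intrinsic matrix `K` are made-up placeholder values; the principal-point terms `(cx, cy)` inside `K` play the role of the claimed translation into the image coordinate system.

```python
import numpy as np

# Hypothetical extrinsics (world -> camera) and intrinsics (camera -> image).
R = np.eye(3)                       # rotation (identity for illustration)
t = np.array([0.0, 0.0, 0.0])       # translation
K = np.array([[500.0, 0.0, 320.0],  # fx,  0, cx
              [0.0, 500.0, 240.0],  #  0, fy, cy
              [0.0, 0.0, 1.0]])

def world_to_image(points_w):
    # Step 1: world coordinate system -> camera coordinate system.
    points_c = points_w @ R.T + t
    # Step 2: camera coordinate system -> image coordinate system via
    # perspective projection; K already contains the (cx, cy) shift.
    proj = points_c @ K.T
    uv = proj[:, :2] / proj[:, 2:3]
    return uv, points_c[:, 2]       # pixel coordinates and depth

pts = np.array([[0.0, 0.0, 10.0], [1.0, 1.0, 10.0]])
uv, depth = world_to_image(pts)
# uv -> [[320. 240.], [370. 290.]]
```

A point on the optical axis lands on the principal point `(320, 240)`, confirming the translation step.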
9. The method of claim 7,
after the coordinate transformation is performed on the point cloud data, the method further comprises the following steps:
and filtering out data outside the common visual field of the point cloud data and the image data.
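Assuming points have already been projected into the image plane as in claim 8, the claim-9 filtering reduces to a bounds-and-depth mask; the 640x480 image size below is an arbitrary assumption.

```python
import numpy as np

def in_common_view(uv, depth, width=640, height=480):
    # Keep only projected points that land inside the image bounds and
    # lie in front of the camera (positive depth), i.e. within the
    # common field of view of the lidar and the camera.
    u, v = uv[:, 0], uv[:, 1]
    return (depth > 0) & (u >= 0) & (u < width) & (v >= 0) & (v < height)

uv = np.array([[320.0, 240.0],   # inside the image
               [-5.0, 100.0],    # left of the image
               [700.0, 10.0]])   # right of the image
depth = np.array([10.0, 8.0, 12.0])
mask = in_common_view(uv, depth)
# mask -> [True, False, False]
```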
10. The method of claim 1,
after the acquiring the image data and the point cloud data, the method further comprises:
and filtering out a ground point cloud from the point cloud data, wherein the ground point cloud is a set of points whose distance to the ground is smaller than a preset distance.
11. The method of claim 10,
the filtering of ground point clouds in the point cloud data comprises:
obtaining a ground model;
determining a set of points in the point cloud data whose distance to the ground model is smaller than the preset distance as the ground point cloud;
filtering the ground point cloud in the point cloud data.
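Given a ground model expressed as a plane `(a, b, c, d)` with a unit normal, the distance test and filtering of claims 10-11 might look like the following sketch; the 0.2 preset distance is an assumed value.

```python
import numpy as np

def filter_ground(points, plane, preset_distance=0.2):
    # plane = (a, b, c, d) with a*x + b*y + c*z + d = 0 and (a, b, c) a
    # unit normal. Points within preset_distance of the plane are treated
    # as the ground point cloud and removed from the point cloud data.
    a, b, c, d = plane
    dist = np.abs(points @ np.array([a, b, c]) + d)
    return points[dist >= preset_distance]

pts = np.array([[0.0, 0.0, 0.05],   # near the ground plane z = 0
                [0.0, 0.0, 1.50],   # an obstacle point
                [1.0, 2.0, 0.10]])  # near the ground plane
non_ground = filter_ground(pts, (0.0, 0.0, 1.0, 0.0))
# only the obstacle point at height 1.5 survives
```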
12. The method of claim 11,
the obtaining of the ground model comprises:
randomly selecting 3 points from the point cloud data in each iteration to establish a plane model;
determining an inner point and an outer point according to the distance between each point in the point cloud data and the plane model;
and when the number of the inner points meets the preset requirement, establishing a ground model by using the inner points meeting the preset requirement.
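Claim 12 describes a RANSAC-style plane fit. Below is a minimal sketch under assumed parameters (the iteration count, inlier threshold, and the "preset requirement" on the inlier count are all placeholders); for brevity it returns the best candidate plane directly rather than re-estimating it from the qualifying inliers as the claim specifies.

```python
import numpy as np

def ransac_plane(points, iterations=100, threshold=0.2, min_inliers=50, seed=0):
    # Each iteration: randomly select 3 points, establish a plane model,
    # then classify every point as inlier or outlier by its distance to
    # the plane. The best plane is kept once its inlier count meets the
    # preset requirement.
    rng = np.random.default_rng(seed)
    best_plane, best_count = None, 0
    for _ in range(iterations):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:            # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ sample[0]
        dist = np.abs(points @ normal + d)
        count = int((dist < threshold).sum())
        if count > best_count:
            best_plane, best_count = np.append(normal, d), count
    return best_plane if best_count >= min_inliers else None

# Mostly-planar synthetic cloud (ground near z = 0) plus a few outliers.
rng = np.random.default_rng(1)
ground = np.column_stack([rng.uniform(-5, 5, 200),
                          rng.uniform(-5, 5, 200),
                          rng.normal(0.0, 0.02, 200)])
outliers = rng.uniform(-5, 5, (20, 3))
plane = ransac_plane(np.vstack([ground, outliers]))
```

On this synthetic cloud the recovered normal is close to the vertical axis, as expected for a ground plane.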
13. An object detection device, characterized in that the object detection device comprises a memory and a processor connected with each other; wherein the memory is for storing program instructions for implementing the object detection method of any one of claims 1-12; the processor is configured to execute the program instructions stored by the memory.
14. A computer-readable storage medium, characterized in that a program file is stored, which is executable by a processor to implement the object detection method according to any one of claims 1-12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011455234.7A CN112529011A (en) | 2020-12-10 | 2020-12-10 | Target detection method and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112529011A true CN112529011A (en) | 2021-03-19 |
Family
ID=74998977
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011455234.7A Pending CN112529011A (en) | 2020-12-10 | 2020-12-10 | Target detection method and related device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112529011A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765894A (en) * | 2019-09-30 | 2020-02-07 | 杭州飞步科技有限公司 | Target detection method, device, equipment and computer readable storage medium |
US20200082207A1 (en) * | 2018-09-07 | 2020-03-12 | Baidu Online Network Technology (Beijing) Co., Ltd. | Object detection method and apparatus for object detection |
CN111340797A (en) * | 2020-03-10 | 2020-06-26 | 山东大学 | Laser radar and binocular camera data fusion detection method and system |
CN111950426A (en) * | 2020-08-06 | 2020-11-17 | 东软睿驰汽车技术(沈阳)有限公司 | Target detection method and device and delivery vehicle |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115703234A (en) * | 2021-08-03 | 2023-02-17 | 北京小米移动软件有限公司 | Robot control method, robot control device, robot, and storage medium |
CN115703234B (en) * | 2021-08-03 | 2024-01-30 | 北京小米移动软件有限公司 | Robot control method, device, robot and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112396650B (en) | Target ranging system and method based on fusion of image and laser radar | |
CN110414418B (en) | Road detection method for multi-scale fusion of image-laser radar image data | |
CN111797657A (en) | Vehicle peripheral obstacle detection method, device, storage medium, and electronic apparatus | |
CN115393680B (en) | 3D target detection method and system for multi-mode information space-time fusion in foggy weather scene | |
US20210350705A1 (en) | Deep-learning-based driving assistance system and method thereof | |
CN114495064A (en) | Monocular depth estimation-based vehicle surrounding obstacle early warning method | |
CN116229408A (en) | Target identification method for fusing image information and laser radar point cloud information | |
CN112683228A (en) | Monocular camera ranging method and device | |
CN114998856B (en) | 3D target detection method, device, equipment and medium for multi-camera image | |
CN114325634A (en) | Method for extracting passable area in high-robustness field environment based on laser radar | |
CN112927309A (en) | Vehicle-mounted camera calibration method and device, vehicle-mounted camera and storage medium | |
CN115372990A (en) | High-precision semantic map building method and device and unmanned vehicle | |
CN115457358A (en) | Image and point cloud fusion processing method and device and unmanned vehicle | |
CN115410167A (en) | Target detection and semantic segmentation method, device, equipment and storage medium | |
CN117111055A (en) | Vehicle state sensing method based on thunder fusion | |
CN114972941A (en) | Decision fusion method and device for three-dimensional detection of shielded vehicle and electronic equipment | |
CN112654998B (en) | Lane line detection method and device | |
KR20170106823A (en) | Image processing device identifying object of interest based on partial depth map | |
CN112529011A (en) | Target detection method and related device | |
CN113255779A (en) | Multi-source perception data fusion identification method and system and computer readable storage medium | |
CN112733678A (en) | Ranging method, ranging device, computer equipment and storage medium | |
EP3428876A1 (en) | Image processing device, apparatus control system, imaging device, image processing method, and program | |
CN114648639B (en) | Target vehicle detection method, system and device | |
CN115147809B (en) | Obstacle detection method, device, equipment and storage medium | |
CN115327572A (en) | Method for detecting obstacle in front of vehicle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||