CN113128348A - Laser radar target detection method and system fusing semantic information - Google Patents

Laser radar target detection method and system fusing semantic information

Info

Publication number
CN113128348A
CN113128348A
Authority
CN
China
Prior art keywords
point cloud
image
cloud data
frame
semantic segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110317542.1A
Other languages
Chinese (zh)
Other versions
CN113128348B (en)
Inventor
李燕
陈超
齐飞
王晓甜
石光明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202110317542.1A priority Critical patent/CN113128348B/en
Publication of CN113128348A publication Critical patent/CN113128348A/en
Application granted granted Critical
Publication of CN113128348B publication Critical patent/CN113128348B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06F 18/253 Fusion techniques of extracted features
    • G06T 7/10 Segmentation; Edge detection
    • G06V 10/56 Extraction of image or video features relating to colour
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a laser radar target detection method and a system fusing semantic information, wherein the method comprises the following steps: performing semantic segmentation processing on the image frame under each timestamp to obtain a semantic segmentation score; adding image RGB characteristics under the corresponding camera coordinates in each frame of point cloud data; projecting the point cloud data with the image RGB features added to the output of a segmentation network and appending the semantic segmentation score to the point cloud data; and carrying out target classification and 3D frame regression based on graph convolution on the point cloud data added with the semantic segmentation scores and the image RGB features to obtain a target position frame and a target category. This solves the technical problem in the prior art that the features of a target are not detected accurately enough, so that the target detection of vehicles and pedestrians is neither sufficiently accurate nor efficient.

Description

Laser radar target detection method and system fusing semantic information
Technical Field
The invention relates to the field of computer vision, and in particular to a laser radar target detection method and system fusing semantic information.
Background
Environment perception technology is of great significance in fields such as intelligent transportation, intelligent wearable devices and smart cities. Acquiring and processing sensor information is the basis and technical premise of environment perception. Image data acquired by a camera has inherent depth ambiguity and is strongly affected by illumination and weather, but it can provide fine-grained texture and colour information; point cloud data acquired by a laser radar, on the other hand, provides very accurate spatial position information of a target, but its resolution and texture information are weak. To mitigate the poor detection performance caused by any single sensor, multi-sensor fusion methods are currently studied, as they can provide rich and accurate environmental information.
Existing multi-sensor fusion methods fall mainly into three types: feature-level fusion, decision-level fusion, and two-stage fusion that projects 2D target boxes onto the point cloud. Feature-level fusion, such as the MV3D network proposed by Xiaozhi Chen et al. and the AVOD network proposed by Jason Ku et al., mainly extracts image features and point cloud features in separate branches and then either directly concatenates them or performs multi-scale fusion at the feature level. The biggest disadvantage of this fusion mode is "feature blurring": on the one hand, one point of the point cloud corresponds to several pixels in the image view; on the other hand, the magnitudes of the features in the extracted image feature map and point cloud feature map differ greatly, so the small-magnitude information is poorly utilised in the feature map that actually takes effect. Decision-level fusion is a relatively simple fusion mode, exemplified by the CLOCs network proposed by Su Pang et al.: the features of the two modalities are not fused at the feature level or at the beginning; instead, each network is trained and performs inference separately to obtain proposals from the 2D and 3D detectors, the proposals of the two modalities are then encoded into sparse tensors, and two-dimensional convolution is applied to the non-empty elements for feature fusion. The advantage of decision-level fusion is that the network structures of the two modalities do not interfere with each other and can be trained and combined independently, but it also has a clear drawback: fusing at the decision level makes the least use of the raw sensor data and cannot exploit the complementary nature of multi-sensor data well. The two-stage method, represented by the F-PointNet structure proposed by Charles R. Qi et al., first obtains the image target detection result with a 2D detector and then projects it onto the 3D lidar data. However, this fusion mode depends excessively on the performance of the 2D detector, and after the two-dimensional box is projected onto the point cloud data, the sparsity of the point cloud makes it difficult to extract and identify features of the point set inside the projected viewing frustum.
However, in implementing the technical solution of the invention in the embodiments of the present application, the inventors of the present application found that the above technology has at least the following technical problem:
in the prior art, the features of a target are not detected accurately enough, so that the target detection of vehicles and pedestrians is neither sufficiently accurate nor efficient.
Disclosure of Invention
The embodiments of the present application provide a laser radar target detection method and system fusing semantic information, which solve the technical problem in the prior art that target features are not detected accurately enough, so that the target detection of vehicles and pedestrians is neither sufficiently accurate nor efficient; a visual-laser fusion target detection method based on image semantic segmentation and graph convolution features is thereby achieved, which notably improves the accuracy and efficiency of detecting vehicle and pedestrian targets on the road.
In view of the above problems, the present application provides a laser radar target detection method and system fusing semantic information.
In a first aspect, the present application provides a laser radar target detection method fusing semantic information, where the method includes: performing semantic segmentation processing on the image frame under each timestamp to obtain a semantic segmentation score; adding image RGB characteristics under the corresponding camera coordinates in each frame of point cloud data; projecting the point cloud data with the image RGB features added to the output of a segmentation network and appending the semantic segmentation score to the point cloud data; and carrying out target classification and 3D frame regression based on graph convolution on the point cloud data added with the semantic segmentation scores and the image RGB features to obtain a target position frame and a target category.
In a second aspect, the present application further provides a laser radar target detection system fusing semantic information, the system including: a first obtaining unit, configured to perform semantic segmentation processing on the image frame under each timestamp to obtain a semantic segmentation score; a first adding unit, configured to add image RGB characteristics under the corresponding camera coordinates in each frame of point cloud data; a first projection unit, configured to project the point cloud data to which the image RGB features are added into the output of a segmentation network and attach the semantic segmentation score to the point cloud data; and a second obtaining unit, configured to perform target classification and 3D frame regression based on graph convolution on the point cloud data to which the semantic segmentation scores and the image RGB features are added, to obtain a target position frame and a target category.
In a third aspect, the present invention provides a laser radar target detection system fusing semantic information, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of the first aspect when executing the program.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
the method has the advantages that the point cloud data are processed by adopting semantic segmentation and image convolution, the semantic segmentation adopts a coder-decoder structure, the high-level semantic information is obtained while the contour information is kept, the point cloud data characteristics are extracted through the image convolution structure, the state of the point is updated according to the relative coordinate coding of the adjacent points and the central point characteristics, the structural characteristics of the space point are well represented, the detection accuracy is improved, the method for extracting the visual laser fusion target detection based on the image semantic segmentation and the image convolution characteristics is further achieved, and the technical effects of accuracy and high efficiency of the road vehicle and pedestrian target detection are remarkably improved.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
Fig. 1 is a schematic flow chart of a laser radar target detection method fusing semantic information according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a laser radar target detection system fusing semantic information according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an exemplary electronic device according to an embodiment of the present application.
Description of reference numerals: a first obtaining unit 11, a first adding unit 12, a first projecting unit 13, a second obtaining unit 14, a bus 300, a receiver 301, a processor 302, a transmitter 303, a memory 304, and a bus interface 305.
Detailed Description
The embodiments of the present application provide a laser radar target detection method and system fusing semantic information, which solve the technical problem in the prior art that target features are not detected accurately enough, so that the target detection of vehicles and pedestrians is neither sufficiently accurate nor efficient; a visual-laser fusion target detection method based on image semantic segmentation and graph convolution features is thereby achieved, which notably improves the accuracy and efficiency of detecting vehicle and pedestrian targets on the road. Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are merely some, not all, embodiments of the present application, and it should be understood that the present application is not limited to the example embodiments described herein.
Summary of the application
Environment perception technology is of great significance in fields such as intelligent transportation, intelligent wearable devices and smart cities. Acquiring and processing sensor information is the basis and technical premise of environment perception. Image data acquired by a camera has inherent depth ambiguity and is strongly affected by illumination and weather, but can provide fine-grained texture and colour information; point cloud data acquired by a laser radar, on the other hand, provides very accurate spatial position information of a target, but its resolution and texture information are weak. To mitigate the poor detection performance caused by a single sensor, multi-sensor fusion methods are currently studied, as they can provide rich and accurate environmental information. However, the prior art has the technical problem that target features are not detected accurately enough, so that the target detection of vehicles and pedestrians is neither sufficiently accurate nor efficient.
In view of the above technical problems, the technical solution provided by the present application has the following general idea:
the embodiment of the application provides a laser radar target detection method fusing semantic information, wherein the method comprises the following steps: performing semantic segmentation processing on the image frame under each timestamp to obtain a semantic segmentation score; adding image RGB characteristics under the corresponding camera coordinates in each frame of point cloud data; projecting the point cloud data with the image RGB features added to the output of a segmentation network and appending the semantic segmentation score to the point cloud data; and carrying out target classification and 3D frame regression based on graph convolution on the point cloud data added with the semantic segmentation scores and the image RGB features to obtain a target position frame and a target category.
Having thus described the general principles of the present application, various non-limiting embodiments thereof will now be described in detail with reference to the accompanying drawings.
Example one
As shown in fig. 1, an embodiment of the present application provides a laser radar target detection method fusing semantic information, where the method includes:
step S100: performing semantic segmentation processing on the image frame under each timestamp to obtain a semantic segmentation score;
specifically, the semantic segmentation refers to a process of grouping/segmenting pixels of an image according to different semantic meanings expressed in the image, and is an image obtained by performing semantic segmentation algorithm processing on an actually captured image, and performing semantic segmentation processing based on a codec for each frame of image. Further, the image processing is performed for each captured frame. Firstly, an encoder is used for extracting sampling features of an image, then a decoder is used for carrying out upsampling resolution recovery processing on a feature map to obtain a final prediction feature map, and based on the segmentation prediction map, the class scores representing different classes of images in the prediction map, namely the semantic segmentation scores, are obtained.
Step S200: adding image RGB characteristics under the corresponding camera coordinates in each frame of point cloud data;
specifically, the point cloud data is a set of vectors in a three-dimensional coordinate system, the scanning data is recorded in the form of points, each point includes three-dimensional coordinates, some may include color information (RGB) or reflection Intensity information (Intensity), the semantic segmentation score after segmentation is added to the point cloud points, the method comprises the steps of obtaining point cloud data of each frame, further, adding RGB (red, green and blue) features of corresponding images in the point cloud data of each frame, projecting the point cloud data with the RGB attached to a semantic segmentation network for output, attaching semantic segmentation scores to each point, converting the position of the space point cloud to the position of a coordinate point of a camera coordinate according to a conversion matrix of a point cloud coordinate system and a camera coordinate system, loading the images of the frames corresponding to the point cloud, obtaining RGB channel data under each coordinate value, and then cascading the RGB data to point cloud feature dimensions.
Step S300: projecting the point cloud data with the image RGB features added to the output of a segmentation network and appending the semantic segmentation score to the point cloud data;
specifically, for each frame of image, after indexing each point image coordinate with category score output by the semantic segmentation network, the corresponding category is superimposed to each point of the point cloud which has been projected to the image in the corresponding frame.
Step S400: and carrying out target classification and 3D frame regression based on graph convolution on the point cloud data added with the semantic segmentation scores and the image RGB features to obtain a target position frame and a target category.
Specifically, the point cloud data to which the semantic segmentation scores and the image RGB features have been added is processed with graph-convolution-based target classification and 3D frame regression, i.e. point states are updated by a graph convolutional network to obtain the target position frame and target category information. The point cloud data is thus processed with both semantic segmentation and graph convolution: semantic segmentation adopts an encoder-decoder structure, obtaining high-level semantic information while retaining contour information; point cloud data features are extracted through a graph convolution structure, and the state of each point is updated from the relative-coordinate encoding of its neighbouring points and the centre-point features, which represents the structural characteristics of spatial points well and improves detection accuracy. A visual-laser fusion target detection method based on image semantic segmentation and graph convolution features is thereby achieved, which notably improves the accuracy and efficiency of detecting road vehicle and pedestrian targets.
Further, in step S100 of the embodiment of the present application, performing semantic segmentation processing on the image frame under each timestamp to obtain a semantic segmentation score further includes:
step S110: taking ResNet101 as a main network, and performing downsampling feature extraction on the image frames under each timestamp through an encoder;
step S120: performing resolution recovery processing of up-sampling on the image frame under each timestamp through a decoder to obtain a prediction characteristic map;
step S130: and obtaining the semantic segmentation score according to the prediction feature map.
Specifically, the encoder is first used to extract downsampled features from the image, with ResNet101 as the main network. The steps are as follows:
1) downsample the image 4 times for feature extraction, using 3 × 3 convolution kernels with stride 2, to obtain a feature map 1/16 the size of the original image;
2) apply a 1 × 1 convolution layer and three 3 × 3 atrous (hole) convolutions with rates (6, 12, 18) to the feature map, each with 256 output channels and followed by a BN layer;
3) perform global average pooling to obtain image-level features;
4) feed the result into a 1 × 1 convolution layer with 256 output channels and bilinearly interpolate it back to the original feature-map size;
5) concatenate the 4 obtained multi-scale features along the channel dimension and fuse them with a 1 × 1 convolution layer to obtain a new 256-channel feature;
Based on these new features the corresponding feature map is obtained, and the decoder then performs resolution recovery on it to obtain the final prediction feature map (a code sketch of this encoder-decoder is given after the list). The steps are as follows:
1) bilinearly interpolate the feature map obtained from the encoder into a 4× upsampled feature map;
2) reduce the channel number of the low-level features of corresponding size in the encoder with a 1 × 1 convolution layer;
3) concatenate the feature maps of the same resolution obtained in the previous two steps and further fuse the features with a 3 × 3 convolution layer;
4) bilinearly interpolate to obtain a segmentation prediction map of the same size as the original image. From this segmentation prediction map, the class scores representing the different classes in the image, i.e. the semantic segmentation scores, are obtained.
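For illustration only, the following PyTorch sketch arranges the encoder and decoder steps above in a DeepLabV3+-style layout; it is not the patented implementation. The 256-channel branches and the atrous rates (6, 12, 18) come from the text, whereas the module names, the 48-channel reduction of the low-level features and the concatenation of all five branches (including the pooled one) are assumptions filled in where the text is silent.

```python
# Illustrative DeepLabV3+-style encoder head and decoder (not the patented code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(in_ch, out_ch, k, dilation=1):
    pad = dilation if k == 3 else 0
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=pad, dilation=dilation, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class ASPP(nn.Module):
    """Encoder steps 2)-5): 1x1 conv, three 3x3 atrous convs (rates 6/12/18),
    image-level pooling, channel-wise concatenation and 1x1 fusion to 256 channels."""
    def __init__(self, in_ch, out_ch=256):
        super().__init__()
        self.branches = nn.ModuleList(
            [conv_bn_relu(in_ch, out_ch, 1)]
            + [conv_bn_relu(in_ch, out_ch, 3, dilation=r) for r in (6, 12, 18)]
        )
        # Image-level pooling branch (no BN so the sketch also works with batch size 1).
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, 1), nn.ReLU(inplace=True))
        self.fuse = conv_bn_relu(out_ch * 5, out_ch, 1)

    def forward(self, x):
        size = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.pool(x), size=size, mode="bilinear", align_corners=False)
        return self.fuse(torch.cat(feats + [pooled], dim=1))

class Decoder(nn.Module):
    """Decoder steps 1)-4): 4x upsampling, low-level channel reduction,
    concatenation, 3x3 fusion and final upsampling to per-pixel class scores."""
    def __init__(self, low_ch, num_classes, out_ch=256):
        super().__init__()
        self.reduce = conv_bn_relu(low_ch, 48, 1)            # assumed 48-channel reduction
        self.refine = conv_bn_relu(out_ch + 48, out_ch, 3)
        self.classify = nn.Conv2d(out_ch, num_classes, 1)

    def forward(self, aspp_out, low_level, image_size):
        x = F.interpolate(aspp_out, size=low_level.shape[-2:], mode="bilinear", align_corners=False)
        x = self.refine(torch.cat([x, self.reduce(low_level)], dim=1))
        # Final bilinear interpolation back to the original image size -> semantic scores.
        return F.interpolate(self.classify(x), size=image_size, mode="bilinear", align_corners=False)
```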
Further, in step S200 of the embodiment of the present application, adding the image RGB characteristics under the corresponding camera coordinates in each frame of point cloud data further includes:
step S210: for each frame of point cloud data, converting the spatial point cloud position to a coordinate point position under a camera coordinate system according to a conversion matrix from the point cloud coordinate system to the camera coordinate system;
step S220: screening points of which the Z-axis coordinate value is greater than 0.1 in all camera coordinate points to obtain a first index position set;
step S230: obtaining coordinate values under the image coordinate system through a conversion matrix from the camera coordinate system to the image coordinate system according to the first index position set;
step S240: loading an image frame corresponding to the point cloud data to obtain RGB channel data under each coordinate value;
step S250: cascading the RGB channel data to a point cloud feature dimension.
Further, in step S300 of the embodiment of the present application, projecting the point cloud data with the image RGB features added to the output of a segmentation network and appending the semantic segmentation score to the point cloud data further includes:
step S310: for the image frame under each timestamp, indexing the image coordinates of each point with the semantic segmentation score output by the segmentation network;
step S320: superimposing the corresponding category into the point cloud data in the respective frame that has been projected into the image coordinate system.
Specifically, for each frame of image, the image coordinates of each point are used to index the class scores output by the semantic segmentation network; that is, every image frame under every timestamp is processed through the semantic segmentation network, which yields index coordinates with which the score of every point image can be retrieved quickly. The coordinates of each frame of point cloud data are then converted: the coordinate transformation matrix between the point cloud coordinate system and the camera coordinate system is obtained and used to convert the spatial point cloud positions to coordinate point positions in the camera coordinate system; the points whose Z-axis coordinate value is greater than 0.1 are selected among the camera coordinate points to obtain the first index position set; from the first index position set, the coordinate values in the image coordinate system are obtained through the transformation matrix from the camera coordinate system to the image coordinate system; the image of the frame corresponding to the point cloud is loaded and the RGB channel data at each coordinate value is obtained; the RGB data is concatenated to the point cloud feature dimensions, i.e. the three colour channels are concatenated onto the intensity dimension, and the corresponding classes are superimposed onto each point of the point cloud that has been projected onto the image in the corresponding frame.
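The per-frame decoration just described can be sketched as follows in NumPy. The sketch assumes KITTI-style homogeneous transformation matrices (lidar-to-camera and camera-to-image) and a per-pixel score map from the segmentation network; the Z > 0.1 threshold comes from the text, while the function and argument names are illustrative assumptions rather than the patented code.

```python
# Illustrative per-frame decoration of lidar points with RGB and segmentation scores.
import numpy as np

def decorate_point_cloud(points, T_velo_to_cam, P_cam_to_img, image, seg_scores, z_min=0.1):
    """points        : (N, 4) x, y, z, intensity in the lidar frame
    T_velo_to_cam : (4, 4) homogeneous lidar-to-camera transform
    P_cam_to_img  : (3, 4) camera-to-image projection matrix
    image         : (H, W, 3) RGB image of the same frame
    seg_scores    : (H, W, C) per-pixel class scores from the segmentation network
    Returns points in front of the camera with RGB and scores concatenated: (M, 4 + 3 + C)."""
    xyz1 = np.hstack([points[:, :3], np.ones((len(points), 1))])   # homogeneous coordinates
    cam = (T_velo_to_cam @ xyz1.T).T                               # camera coordinate system

    keep = cam[:, 2] > z_min                                       # first index set: Z > 0.1
    cam, pts = cam[keep], points[keep]

    img_pts = (P_cam_to_img @ cam.T).T                             # project to the image plane
    uv = np.round(img_pts[:, :2] / img_pts[:, 2:3]).astype(int)    # pixel coordinates (u, v)

    h, w = image.shape[:2]
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    uv, pts = uv[inside], pts[inside]

    rgb = image[uv[:, 1], uv[:, 0]]          # RGB channel data at each coordinate value
    scores = seg_scores[uv[:, 1], uv[:, 0]]  # semantic segmentation scores at each point
    return np.hstack([pts, rgb, scores])     # concatenate along the point feature dimension
```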
Further, in step S400 of the embodiment of the present application, carrying out target classification and 3D frame regression based on graph convolution on the point cloud data to which the semantic segmentation scores and the image RGB features are added, to obtain a target position frame and a target category, further includes:
step S410: performing downsampling-based graph construction on the point cloud data;
step S420: constructing a graph neural network to update and iterate the characteristics of each central point, and improving the state of the central point through the states of adjacent points;
step S430: positioning the boundary box of each category of branch prediction, and if one vertex is in one boundary box, calculating a predicted value and the Huber loss of the group route; if a vertex is not in the bounding box or is a non-interesting class, its position penalty is set to 0.
Specifically, the process of graph construction mainly includes the following steps (a code sketch of this construction is given after the list):
1) reducing the density of the point cloud by downsampling, and selecting the central points with the farthest-point sampling method;
2) for each central point, finding the neighbouring points within a given cutoff distance by using a cell list;
3) extracting point and point-to-edge features within each graph with a multilayer perceptron, and aggregating the features through a Max function to serve as the initial state value of the central point.
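A minimal NumPy sketch of this graph construction is given below, for illustration only. The brute-force neighbour search stands in for the cell-list search of step 2), and the MLP of step 3) is abstracted as a user-supplied function; none of the names here come from the patent.

```python
# Illustrative graph construction: farthest-point sampling plus fixed-radius neighbours.
import numpy as np

def farthest_point_sampling(xyz, num_centres):
    """Step 1): pick centre points that are maximally spread out."""
    dist = np.full(len(xyz), np.inf)
    centres = [0]
    for _ in range(num_centres - 1):
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[centres[-1]], axis=1))
        centres.append(int(dist.argmax()))
    return np.asarray(centres)

def build_graph(xyz, features, num_centres, radius, point_mlp):
    """Steps 2)-3): connect each centre to neighbours within `radius` and
    max-aggregate MLP features of (neighbour attributes, relative coordinates)
    as the initial state of the centre point."""
    centre_idx = farthest_point_sampling(xyz, num_centres)
    edges, init_states = [], []
    for c in centre_idx:
        d = np.linalg.norm(xyz - xyz[c], axis=1)       # brute-force stand-in for a cell list
        nbr = np.flatnonzero(d < radius)
        edges.append((c, nbr))
        edge_feat = point_mlp(np.hstack([features[nbr], xyz[nbr] - xyz[c]]))
        init_states.append(edge_feat.max(axis=0))      # Max aggregation -> initial state value
    return centre_idx, edges, np.stack(init_states)
```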
The features of each central point are updated by constructing a graph neural network, and the state of the central point is refined with the states of its neighbouring points. The update formulas are:

$$\Delta x_i^t = h^t(s_i^t)$$

$$s_i^{t+1} = g^t\!\left(\rho\!\left(\{\,f^t(x_j - x_i + \Delta x_i^t,\; s_j^t) \mid (i,j) \in E\,\}\right),\; s_i^t\right) + s_i^t$$

A point cloud containing N points is defined as P = {p_1, ..., p_N}, where p_i = (x_i, s_i). x_i ∈ R^3 represents the spatial coordinates (X, Y, Z) of the original point cloud, and s_i ∈ R^k is a k-dimensional vector representing the attribute state of the original point. The functions f^t(·), g^t(·) and h^t(·) are all modelled by multilayer perceptrons (MLP); the ρ(·) function employs an edge-feature aggregation method based on an attention mechanism. The classification branch computes a multi-class probability distribution {\hat{p}_1^i, ..., \hat{p}_M^i} for each vertex, where M is the total number of target classes, including the background class; \hat{p}_c^i and y_c^i are respectively the prediction probability and the one-hot encoded class label of the i-th vertex; x_j is the three-dimensional point cloud coordinate of point j, and s_j^t is the feature of point j at layer t.
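The update formulas above can be realised, for example, by the following PyTorch-style sketch. The MLPs modelling f, g and h and the max aggregation (which stands in for the attention-based aggregation ρ mentioned in the text) are illustrative assumptions, not the patented network.

```python
# Illustrative single iteration of the vertex-state update (not the patented network).
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim))

class GraphUpdateLayer(nn.Module):
    """One iteration t: dx_i = h(s_i); e_ij = f(x_j - x_i + dx_i, s_j);
    s_i <- g(max_j e_ij) + s_i (max stands in for the attention-based rho)."""
    def __init__(self, state_dim):
        super().__init__()
        self.h = mlp(state_dim, 3)               # h^t: coordinate offset of the centre point
        self.f = mlp(state_dim + 3, state_dim)   # f^t: edge features from relative coords + s_j
        self.g = mlp(state_dim, state_dim)       # g^t: new state from aggregated edge features

    def forward(self, xyz, states, edge_index):
        i, j = edge_index                         # (2, E) pairs of (centre i, neighbour j)
        dx = self.h(states)
        rel = xyz[j] - xyz[i] + dx[i]             # relative coordinate encoding
        e = self.f(torch.cat([rel, states[j]], dim=-1))
        # Max-aggregate edge features per centre vertex (requires PyTorch >= 1.12).
        agg = torch.zeros_like(states).index_reduce_(0, i, e, "amax", include_self=False)
        return states + self.g(agg)               # residual update s_i^{t+1}
```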
The Huber loss between predicted values and the ground truth is computed through a loss function. The classification loss adopts the average cross-entropy loss; the localization branch predicts a bounding box for each class, and if a vertex is inside a bounding box, the Huber loss between the predicted value and the ground truth is calculated; if a vertex is not inside any box or belongs to an uninteresting class, its localization loss is set to 0. The specific formulas are as follows:

$$L_{cls} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_c^i \log \hat{p}_c^i$$

$$L_{loc} = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}(v_i \in b_{interest})\sum_{\delta \in \delta_{b_i}} l_{huber}\!\left(\delta - \delta^{gt}\right)$$

The target bounding box is denoted in a 7-degree-of-freedom format b = (x, y, z, l, h, w, θ), where (x, y, z) is the centre position of the box, (l, h, w) are its length, height and width, and θ is the yaw angle. With y_c^i the one-hot encoding of the true class label at point i and \hat{p}_c^i the class prediction probability of point i, the bounding box is encoded with respect to the vertex coordinates (x_v, y_v, z_v):

$$\delta_x = \frac{x - x_v}{l_m},\quad \delta_y = \frac{y - y_v}{h_m},\quad \delta_z = \frac{z - z_v}{w_m}$$

$$\delta_l = \log\frac{l}{l_m},\quad \delta_h = \log\frac{h}{h_m},\quad \delta_w = \log\frac{w}{w_m}$$

$$\delta_\theta = \frac{\theta - \theta_0}{\theta_m}$$

where l_m, h_m, w_m, θ_0 and θ_m are constant scale factors, v_i is the predicted three-dimensional coordinate of vertex i, b_interest is the ground-truth box region of the category to be located, δ_{b_i} is the 7-dimensional bounding box encoding of predicted vertex i, l_huber is the Huber loss function, and δ^{gt} is the 7-dimensional bounding box encoding of the true class label. In this example, (l_m, h_m, w_m) are set to the median bounding box size of the class to be trained, and θ ∈ [π/4, 3π/4], θ_0 = π/2, θ_m = π/2, to ensure that objects in front of the detection line of sight are within the detection range. The localization branch network uses an MLP to predict, for each class, the bounding box encoding δ_b = (δ_x, δ_y, δ_z, δ_l, δ_h, δ_w, δ_θ).
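As a worked illustration of the box encoding above, the NumPy snippet below computes the 7-dimensional encoding δ_b of a box relative to a vertex; the scale factors and the example values are hypothetical placeholders, not values taken from the patent.

```python
# Worked example of the 7-DoF bounding-box encoding (hypothetical values).
import numpy as np

def encode_box(box, vertex, l_m, h_m, w_m, theta_0=np.pi / 2, theta_m=np.pi / 2):
    """Encode a box (x, y, z, l, h, w, theta) relative to a vertex (x_v, y_v, z_v)."""
    x, y, z, l, h, w, theta = box
    x_v, y_v, z_v = vertex
    return np.array([
        (x - x_v) / l_m, (y - y_v) / h_m, (z - z_v) / w_m,   # scaled centre offsets
        np.log(l / l_m), np.log(h / h_m), np.log(w / w_m),   # log-scaled dimensions
        (theta - theta_0) / theta_m,                          # yaw relative to theta_0
    ])

# A car-sized ground-truth box encoded against a nearby vertex (illustrative numbers only).
delta_b = encode_box(box=(10.0, 1.5, 0.8, 3.9, 1.6, 1.6, np.pi / 3),
                     vertex=(9.0, 1.0, 0.5), l_m=3.9, h_m=1.5, w_m=1.6)
```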
To sum up, the laser radar target detection method and system fusing the semantic information provided by the embodiment of the application have the following technical effects:
the method has the advantages that the point cloud data are processed by adopting semantic segmentation and image convolution, the semantic segmentation adopts a coder-decoder structure, the high-level semantic information is obtained while the contour information is kept, the point cloud data characteristics are extracted through the image convolution structure, the state of the point is updated according to the relative coordinate coding of the adjacent points and the central point characteristics, the structural characteristics of the space point are well represented, the detection accuracy is improved, the method for extracting the visual laser fusion target detection based on the image semantic segmentation and the image convolution characteristics is further achieved, and the technical effects of accuracy and high efficiency of the road vehicle and pedestrian target detection are remarkably improved.
Example two
Based on the same inventive concept as the laser radar target detection method fusing semantic information in the foregoing embodiment, the present invention further provides a laser radar target detection system fusing semantic information. As shown in fig. 2, the system includes:
a first obtaining unit 11, where the first obtaining unit 11 is configured to perform semantic segmentation processing on the image frame under each timestamp to obtain a semantic segmentation score;
a first adding unit 12, wherein the first adding unit 12 is used for adding image RGB features under corresponding camera coordinates in each frame of point cloud data;
a first projection unit 13, the first projection unit 13 being configured to project the point cloud data to which the image RGB features are added into an output of a segmentation network, and to attach the semantic segmentation score to the point cloud data;
a second obtaining unit 14, where the second obtaining unit 14 is configured to perform graph convolution-based target classification and 3D frame regression on the point cloud data to which the semantic segmentation score and the image RGB features are added, so as to obtain a target position frame and a target category.
Further, the system further comprises:
a first extraction unit, configured to perform downsampling feature extraction on the image frames at each timestamp through an encoder with the ResNet101 as a main network;
a third obtaining unit, configured to perform resolution recovery processing of upsampling on the image frame under each timestamp through a decoder, so as to obtain a prediction feature map;
a fourth obtaining unit, configured to obtain the semantic segmentation score according to the predicted feature map.
Further, the system further comprises:
the first conversion unit is used for converting the spatial point cloud position into a coordinate point position under a camera coordinate system according to a conversion matrix from a point cloud coordinate system to a camera coordinate system for each frame of point cloud data;
a fifth obtaining unit, configured to screen points, of which Z-axis coordinate values are greater than 0.1, from among the camera coordinate points, to obtain a first index position set;
a sixth obtaining unit configured to obtain coordinate values in an image coordinate system through a conversion matrix from a camera coordinate system to the image coordinate system according to the first index position set;
a seventh obtaining unit, configured to load an image frame corresponding to the point cloud data, and obtain RGB channel data under each coordinate value;
a first cascading unit to cascade the RGB channel data to a point cloud feature dimension.
Further, the system further comprises:
the first indexing unit is used for indexing image coordinates of each point with the semantic segmentation scores output by the segmentation network for the image frames under each timestamp;
a first superimposing unit for superimposing the corresponding category into the point cloud data that has been projected into the image coordinate system in the corresponding frame.
Further, the system further comprises:
a first construction unit for downsampling-based graph construction of the point cloud data;
a first improvement unit, configured to construct a graph neural network that iteratively updates the features of each central point, improving the state of the central point through the states of its adjacent points;
a first prediction unit, configured to predict a bounding box for each class with the localization branch; if a vertex is in a bounding box, the Huber loss between the predicted value and the ground truth is calculated; if a vertex is not in the bounding box or belongs to an uninteresting class, its localization loss is set to 0.
Various changes and specific examples of the laser radar target detection method with fusion of semantic information in the first embodiment of fig. 1 are also applicable to the laser radar target detection system with fusion of semantic information in the present embodiment, and through the foregoing detailed description of the laser radar target detection method with fusion of semantic information, those skilled in the art can clearly know the implementation method of the laser radar target detection system with fusion of semantic information in the present embodiment, so for the sake of brevity of the description, detailed description is not repeated here.
Exemplary electronic device
The electronic device of the embodiment of the present application is described below with reference to fig. 3.
Fig. 3 illustrates a schematic structural diagram of an electronic device according to an embodiment of the present application.
Based on the inventive concept of the laser radar target detection method with fusion of semantic information in the foregoing embodiments, the present invention further provides a laser radar target detection system with fusion of semantic information, on which a computer program is stored, and when the computer program is executed by a processor, the steps of any one of the foregoing laser radar target detection methods with fusion of semantic information are implemented.
In fig. 3, a bus architecture is represented by bus 300. Bus 300 may include any number of interconnected buses and bridges and links together various circuits including one or more processors, represented by processor 302, and memory, represented by memory 304. The bus 300 may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art and therefore will not be described further herein. A bus interface 305 provides an interface between the bus 300 and the receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e. a transceiver, providing a means for communicating with various other systems over a transmission medium.
The processor 302 is responsible for managing the bus 300 and general processing, and the memory 304 may be used for storing data used by the processor 302 in performing operations.
The embodiment of the invention provides a laser radar target detection method fusing semantic information, which comprises the following steps: performing semantic segmentation processing on the image frame under each timestamp to obtain a semantic segmentation score; adding image RGB characteristics under the corresponding camera coordinates in each frame of point cloud data; projecting the point cloud data with the image RGB features added to the output of a segmentation network and appending the semantic segmentation score to the point cloud data; and carrying out target classification and 3D frame regression based on graph convolution on the point cloud data added with the semantic segmentation scores and the image RGB features to obtain a target position frame and a target category. The method solves the technical problem in the prior art that target features are not detected accurately enough, so that the target detection of vehicles and pedestrians is neither sufficiently accurate nor efficient; it thereby achieves a visual-laser fusion target detection method based on image semantic segmentation and graph convolution features and notably improves the accuracy and efficiency of detecting vehicle and pedestrian targets on the road.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create a system for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. A laser radar target detection method fusing semantic information, wherein the method comprises the following steps:
performing semantic segmentation processing on the image frame under each timestamp to obtain a semantic segmentation score;
adding image RGB characteristics under the corresponding camera coordinates in each frame of point cloud data;
projecting the point cloud data with the image RGB features added to the output of a segmentation network and appending the semantic segmentation score to the point cloud data;
and carrying out target classification and 3D frame regression based on graph convolution on the point cloud data added with the semantic segmentation scores and the image RGB features to obtain a target position frame and a target category.
2. The method as claimed in claim 1, wherein the semantic segmentation processing is performed on the image frame at each timestamp to obtain a semantic segmentation score, including:
taking ResNet101 as a main network, and performing downsampling feature extraction on the image frames under each timestamp through an encoder;
performing resolution recovery processing of up-sampling on the image frame under each timestamp through a decoder to obtain a prediction characteristic map;
and obtaining the semantic segmentation score according to the prediction feature map.
3. The method of claim 1, wherein the adding of image RGB features in respective camera coordinates to each frame of point cloud data comprises:
for each frame of point cloud data, converting the spatial point cloud position to a coordinate point position under a camera coordinate system according to a conversion matrix from the point cloud coordinate system to the camera coordinate system;
screening points of which the Z-axis coordinate value is greater than 0.1 in all camera coordinate points to obtain a first index position set;
obtaining coordinate values under the image coordinate system through a conversion matrix from the camera coordinate system to the image coordinate system according to the first index position set;
loading an image frame corresponding to the point cloud data to obtain RGB channel data under each coordinate value;
cascading the RGB channel data to a point cloud feature dimension.
4. The method of claim 3, wherein the projecting the point cloud data with the image RGB features added to the point cloud data into an output of a segmentation network and appending the semantic segmentation score to the point cloud data comprises:
for the image frame under each timestamp, indexing the image coordinates of each point with the semantic segmentation score output by the segmentation network;
superimposing the corresponding category into the point cloud data in the respective frame that has been projected into the image coordinate system.
5. The method of claim 1, wherein the performing graph convolution-based target classification and 3D frame regression on the point cloud data with the additional semantic segmentation scores and the image RGB features to obtain a target location frame and a target class comprises:
performing downsampling-based graph construction on the point cloud data;
constructing a graph neural network to update and iterate the characteristics of each central point, and improving the state of the central point through the states of adjacent points;
the localization branch predicting a bounding box for each category, and if a vertex is in a bounding box, calculating the Huber loss between the predicted value and the ground truth; if a vertex is not in the bounding box or is a non-interesting class, its localization loss is set to 0.
6. The method of claim 5, wherein the constructed graph neural network iteratively updates the features of each central point and refines the state of the central point with the states of its neighbouring points by the formulas:

$$\Delta x_i^t = h^t(s_i^t)$$

$$s_i^{t+1} = g^t\!\left(\rho\!\left(\{\,f^t(x_j - x_i + \Delta x_i^t,\; s_j^t) \mid (i,j) \in E\,\}\right),\; s_i^t\right) + s_i^t$$

wherein a point cloud containing N points is defined as P = {p_1, ..., p_N}, where p_i = (x_i, s_i), x_i ∈ R^3 represents the spatial coordinates (X, Y, Z) of the original point cloud, s_i ∈ R^k is a k-dimensional vector representing the attribute state of the original point, the functions f^t(·), g^t(·) and h^t(·) are all modelled by multilayer perceptrons (MLP), M represents the total number of target classes, x_j is the three-dimensional point cloud coordinate of point j, and s_j^t is the feature of point j at layer t.
7. The method of claim 5, wherein the localization branch predicts a bounding box for each class, and the Huber loss between the predicted value and the ground truth is computed if a vertex is in a bounding box; if a vertex is not in the bounding box or is a non-interesting class, its localization loss is set to 0, according to the formulas:

$$L_{cls} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_c^i \log \hat{p}_c^i$$

$$L_{loc} = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}(v_i \in b_{interest})\sum_{\delta \in \delta_{b_i}} l_{huber}\!\left(\delta - \delta^{gt}\right)$$

wherein y_c^i is the one-hot encoded true class label of the i-th vertex, \hat{p}_c^i is its class prediction probability, b denotes the 7-degree-of-freedom box format, v_i is the predicted three-dimensional coordinate of vertex i, b_interest is the ground-truth box region of the category to be located, δ_{b_i} is the 7-dimensional bounding box encoding of predicted vertex i, l_huber is the Huber loss function, and δ^{gt} is the 7-dimensional bounding box encoding of the true class label.
8. A lidar target detection system that fuses semantic information, wherein the system comprises:
the first obtaining unit is used for performing semantic segmentation processing on the image frame under each timestamp to obtain a semantic segmentation score;
the first adding unit is used for adding image RGB characteristics under corresponding camera coordinates in each frame of point cloud data;
a first projection unit for projecting the point cloud data to which the image RGB features are added into an output of a segmentation network and attaching the semantic segmentation score to the point cloud data;
and the second obtaining unit is used for carrying out target classification and 3D frame regression based on graph convolution on the point cloud data added with the semantic segmentation scores and the image RGB features to obtain a target position frame and a target category.
9. A lidar target detection system incorporating semantic information, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any of claims 1-7 when executing the program.
CN202110317542.1A 2021-03-25 2021-03-25 Laser radar target detection method and system integrating semantic information Active CN113128348B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110317542.1A CN113128348B (en) 2021-03-25 2021-03-25 Laser radar target detection method and system integrating semantic information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110317542.1A CN113128348B (en) 2021-03-25 2021-03-25 Laser radar target detection method and system integrating semantic information

Publications (2)

Publication Number Publication Date
CN113128348A true CN113128348A (en) 2021-07-16
CN113128348B CN113128348B (en) 2023-11-24

Family

ID=76773893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110317542.1A Active CN113128348B (en) 2021-03-25 2021-03-25 Laser radar target detection method and system integrating semantic information

Country Status (1)

Country Link
CN (1) CN113128348B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658257A (en) * 2021-08-17 2021-11-16 广州文远知行科技有限公司 Unmanned equipment positioning method, device, equipment and storage medium
CN113705631A (en) * 2021-08-10 2021-11-26 重庆邮电大学 3D point cloud target detection method based on graph convolution
CN113963044A (en) * 2021-09-30 2022-01-21 北京工业大学 RGBD camera-based intelligent loading method and system for cargo box
CN113984037A (en) * 2021-09-30 2022-01-28 电子科技大学长三角研究院(湖州) Semantic map construction method based on target candidate box in any direction
CN114140765A (en) * 2021-11-12 2022-03-04 北京航空航天大学 Obstacle sensing method and device and storage medium
CN114359902A (en) * 2021-12-03 2022-04-15 武汉大学 Three-dimensional point cloud semantic segmentation method based on multi-scale feature fusion
CN114429631A (en) * 2022-01-27 2022-05-03 北京百度网讯科技有限公司 Three-dimensional object detection method, device, equipment and storage medium
CN114445802A (en) * 2022-01-29 2022-05-06 北京百度网讯科技有限公司 Point cloud processing method and device and vehicle
CN114998890A (en) * 2022-05-27 2022-09-02 长春大学 Three-dimensional point cloud target detection algorithm based on graph neural network
CN115272493A (en) * 2022-09-20 2022-11-01 之江实验室 Abnormal target detection method and device based on continuous time sequence point cloud superposition
CN116265862A (en) * 2021-12-16 2023-06-20 动态Ad有限责任公司 Vehicle, system and method for a vehicle, and storage medium
CN117058380A (en) * 2023-08-15 2023-11-14 北京学图灵教育科技有限公司 Multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention
CN117333676A (en) * 2023-12-01 2024-01-02 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Point cloud feature extraction method and point cloud visual detection method based on graph expression
CN117994504A (en) * 2024-04-03 2024-05-07 国网江苏省电力有限公司常州供电分公司 Target detection method and target detection device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948661A (en) * 2019-02-27 2019-06-28 江苏大学 A kind of 3D vehicle checking method based on Multi-sensor Fusion
CN111027401A (en) * 2019-11-15 2020-04-17 电子科技大学 End-to-end target detection method with integration of camera and laser radar
CN111583337A (en) * 2020-04-25 2020-08-25 华南理工大学 Omnibearing obstacle detection method based on multi-sensor fusion
CN111709343A (en) * 2020-06-09 2020-09-25 广州文远知行科技有限公司 Point cloud detection method and device, computer equipment and storage medium
US10929694B1 (en) * 2020-01-22 2021-02-23 Tsinghua University Lane detection method and system based on vision and lidar multi-level fusion

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948661A (en) * 2019-02-27 2019-06-28 江苏大学 A kind of 3D vehicle checking method based on Multi-sensor Fusion
CN111027401A (en) * 2019-11-15 2020-04-17 电子科技大学 End-to-end target detection method with integration of camera and laser radar
US10929694B1 (en) * 2020-01-22 2021-02-23 Tsinghua University Lane detection method and system based on vision and lidar multi-level fusion
CN111583337A (en) * 2020-04-25 2020-08-25 华南理工大学 Omnibearing obstacle detection method based on multi-sensor fusion
CN111709343A (en) * 2020-06-09 2020-09-25 广州文远知行科技有限公司 Point cloud detection method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
谢波; 赵亚男; 高利; 高峰: "Small-target semantic segmentation enhancement method based on laser radar point clouds" (基于激光雷达点云的小目标语义分割增强方法), 激光杂志 (Laser Journal), no. 04

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705631B (en) * 2021-08-10 2024-01-23 大庆瑞昂环保科技有限公司 3D point cloud target detection method based on graph convolution
CN113705631A (en) * 2021-08-10 2021-11-26 重庆邮电大学 3D point cloud target detection method based on graph convolution
CN113658257A (en) * 2021-08-17 2021-11-16 广州文远知行科技有限公司 Unmanned equipment positioning method, device, equipment and storage medium
CN113658257B (en) * 2021-08-17 2022-05-27 广州文远知行科技有限公司 Unmanned equipment positioning method, device, equipment and storage medium
CN113984037B (en) * 2021-09-30 2023-09-12 电子科技大学长三角研究院(湖州) Semantic map construction method based on target candidate frame in any direction
CN113963044A (en) * 2021-09-30 2022-01-21 北京工业大学 RGBD camera-based intelligent loading method and system for cargo box
CN113963044B (en) * 2021-09-30 2024-04-30 北京工业大学 Cargo box intelligent loading method and system based on RGBD camera
CN113984037A (en) * 2021-09-30 2022-01-28 电子科技大学长三角研究院(湖州) Semantic map construction method based on target candidate box in any direction
CN114140765B (en) * 2021-11-12 2022-06-24 北京航空航天大学 Obstacle sensing method and device and storage medium
CN114140765A (en) * 2021-11-12 2022-03-04 北京航空航天大学 Obstacle sensing method and device and storage medium
CN114359902A (en) * 2021-12-03 2022-04-15 武汉大学 Three-dimensional point cloud semantic segmentation method based on multi-scale feature fusion
CN114359902B (en) * 2021-12-03 2024-04-26 武汉大学 Three-dimensional point cloud semantic segmentation method based on multi-scale feature fusion
CN116265862A (en) * 2021-12-16 2023-06-20 动态Ad有限责任公司 Vehicle, system and method for a vehicle, and storage medium
CN114429631A (en) * 2022-01-27 2022-05-03 北京百度网讯科技有限公司 Three-dimensional object detection method, device, equipment and storage medium
CN114429631B (en) * 2022-01-27 2023-11-14 北京百度网讯科技有限公司 Three-dimensional object detection method, device, equipment and storage medium
CN114445802A (en) * 2022-01-29 2022-05-06 北京百度网讯科技有限公司 Point cloud processing method and device and vehicle
CN114998890A (en) * 2022-05-27 2022-09-02 长春大学 Three-dimensional point cloud target detection algorithm based on graph neural network
CN114998890B (en) * 2022-05-27 2023-03-10 长春大学 Three-dimensional point cloud target detection algorithm based on graph neural network
CN115272493B (en) * 2022-09-20 2022-12-27 之江实验室 Abnormal target detection method and device based on continuous time sequence point cloud superposition
CN115272493A (en) * 2022-09-20 2022-11-01 之江实验室 Abnormal target detection method and device based on continuous time sequence point cloud superposition
CN117058380B (en) * 2023-08-15 2024-03-26 北京学图灵教育科技有限公司 Multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention
CN117058380A (en) * 2023-08-15 2023-11-14 北京学图灵教育科技有限公司 Multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention
CN117333676A (en) * 2023-12-01 2024-01-02 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Point cloud feature extraction method and point cloud visual detection method based on graph expression
CN117333676B (en) * 2023-12-01 2024-04-02 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Point cloud feature extraction method and point cloud visual detection method based on graph expression
CN117994504A (en) * 2024-04-03 2024-05-07 国网江苏省电力有限公司常州供电分公司 Target detection method and target detection device

Also Published As

Publication number Publication date
CN113128348B (en) 2023-11-24

Similar Documents

Publication Publication Date Title
CN113128348B (en) Laser radar target detection method and system integrating semantic information
Zamanakos et al. A comprehensive survey of LIDAR-based 3D object detection methods with deep learning for autonomous driving
US10078790B2 (en) Systems for generating parking maps and methods thereof
Wen et al. Deep learning-based perception systems for autonomous driving: A comprehensive survey
US20120263346A1 (en) Video-based detection of multiple object types under varying poses
Liang et al. A survey of 3D object detection
CN114821507A (en) Multi-sensor fusion vehicle-road cooperative sensing method for automatic driving
Xu et al. HA U-Net: Improved model for building extraction from high resolution remote sensing imagery
CN112270694B (en) Method for detecting urban environment dynamic target based on laser radar scanning pattern
CN116129233A (en) Automatic driving scene panoramic segmentation method based on multi-mode fusion perception
Park et al. Drivable dirt road region identification using image and point cloud semantic segmentation fusion
Ngo et al. Cooperative perception with V2V communication for autonomous vehicles
CN116486368A (en) Multi-mode fusion three-dimensional target robust detection method based on automatic driving scene
CN112613392A (en) Lane line detection method, device and system based on semantic segmentation and storage medium
CN115965970A (en) Method and system for realizing bird's-eye view semantic segmentation based on implicit set prediction
CN115100741A (en) Point cloud pedestrian distance risk detection method, system, equipment and medium
Chen et al. Multitarget vehicle tracking and motion state estimation using a novel driving environment perception system of intelligent vehicles
CN114118247A (en) Anchor-frame-free 3D target detection method based on multi-sensor fusion
Gomez-Donoso et al. Three-dimensional reconstruction using SFM for actual pedestrian classification
Huang et al. Overview of LiDAR point cloud target detection methods based on deep learning
AU2023203583A1 (en) Method for training neural network model and method for generating image
CN116664851A (en) Automatic driving data extraction method based on artificial intelligence
CN114820931B (en) Virtual reality-based CIM (common information model) visual real-time imaging method for smart city
CN116453205A (en) Method, device and system for identifying stay behavior of commercial vehicle
Yu et al. YOLOv5-Based Dense Small Target Detection Algorithm for Aerial Images Using DIOU-NMS.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant