CN114140765A - Obstacle sensing method and device and storage medium - Google Patents

Obstacle sensing method and device and storage medium

Info

Publication number
CN114140765A
CN114140765A (application CN202111338928.7A)
Authority
CN
China
Prior art keywords
point cloud
semantic
picture
information
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111338928.7A
Other languages
Chinese (zh)
Other versions
CN114140765B (en)
Inventor
Wu Xinkai
Xu Shaoqing
Wang Pengcheng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202111338928.7A priority Critical patent/CN114140765B/en
Publication of CN114140765A publication Critical patent/CN114140765A/en
Application granted granted Critical
Publication of CN114140765B publication Critical patent/CN114140765B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 17/00 Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S 17/88 Lidar systems specially adapted for specific applications
    • G01S 17/93 Lidar systems specially adapted for specific applications for anti-collision purposes
    • G01S 17/931 Lidar systems specially adapted for specific applications for anti-collision purposes of land vehicles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10032 Satellite or aerial image; Remote sensing
    • G06T 2207/10044 Radar image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning

Abstract

The application discloses an obstacle sensing method, an obstacle sensing device, and a storage medium, which are used for reducing the false detection rate and the missed detection rate of obstacles and improving detection precision. The obstacle sensing method disclosed by the application comprises the following steps: acquiring an original point cloud and a camera picture at the same moment; acquiring calibrated internal and external parameters for projection conversion; performing semantic segmentation on the original point cloud to obtain a second point cloud; performing semantic segmentation on the camera picture to obtain a second picture; projecting the original point cloud onto the second picture according to the internal and external parameters to obtain a third point cloud, wherein each point in the third point cloud comprises second semantic category information corresponding to the second picture; voxelizing the second semantic category information in the third point cloud, the first semantic category information in the second point cloud, and the feature information of the original point cloud, and inputting the voxelized result into an adaptive attention mechanism network for learning to obtain weighted semantic information; and detecting the obstacle target according to the weighted semantic information. The application also provides an obstacle sensing device and a storage medium.

Description

Obstacle sensing method and device and storage medium
Technical Field
The present application relates to the field of automatic driving, and in particular, to a method and an apparatus for sensing an obstacle, and a storage medium.
Background
With the continuous development of automatic driving technology, various sensors have become important components of automatic driving systems. The environment sensing part of an automatic driving system usually needs to acquire a large amount of surrounding information to ensure that the autonomous vehicle correctly understands the environment around the vehicle body and makes corresponding decisions. However, sensing with a single sensor has limitations: on one hand, a single sensing device may have detection blind areas due to the limitation of its installation position; on the other hand, each sensor has its own characteristic defects.
Therefore, obstacle sensing performed with a single sensor suffers from low identification precision.
Disclosure of Invention
In view of the foregoing technical problems, embodiments of the present application provide an obstacle sensing method, an apparatus, and a storage medium, so as to improve the accuracy of obstacle sensing.
In a first aspect, an obstacle sensing method provided in an embodiment of the present application includes:
acquiring an original point cloud and a camera picture at the same time;
acquiring calibrated internal and external parameters for projection conversion;
performing semantic segmentation on the original point cloud to obtain a second point cloud;
performing semantic segmentation on the camera picture to obtain a second picture;
according to the internal parameters and the external parameters, projecting the original point cloud to the second picture to obtain a third point cloud, wherein each point in the third point cloud comprises second semantic category information corresponding to the second picture;
voxelizing the second semantic category information in the third point cloud, the first semantic category information in the second point cloud, and the feature information of the original point cloud, and inputting the voxelized result into an adaptive attention mechanism network for learning to obtain weighted semantic information;
detecting an obstacle target according to the weighted semantic information;
wherein the second point cloud and the third point cloud include obstacle category information.
Preferably, the learning in the adaptive attention mechanism network includes:
learning the local features in the adaptive attention mechanism network to obtain the learned local features V_i;
learning the global feature in the adaptive attention mechanism network to obtain the learned global feature V_global;
concatenating the learned global feature V_global to each local feature V_i to obtain the enhanced feature V_gl.
Preferably, the detecting an obstacle target according to the weighted semantic information includes:
inputting the weighted semantic information into a target detector to detect the obstacle target.
Preferably, the acquiring the original point cloud and the camera picture at the same time includes:
performing software synchronization or hardware synchronization on the point cloud and the camera;
and obtaining the original point cloud and the camera picture at the same moment.
Preferably, the semantic segmentation of the original point cloud to obtain the second point cloud comprises:
and inputting the original point cloud into a point cloud semantic segmentation network to obtain a second point cloud.
Preferably, the semantic segmentation of the camera picture to obtain the second picture includes:
and inputting the camera picture into a picture semantic segmentation network to obtain a second picture.
Preferably, before voxelizing the second semantic category information in the third point cloud, the first semantic category information in the second point cloud, and the feature information of the original point cloud, the method further comprises the following step:
and converting the first semantic category information and the second semantic category information into a One-Hot coding format.
Preferably, the projecting the original point cloud to the second picture includes:
projection is performed according to the following formula:
P′=Proj(K,M,P),
wherein Proj denotes the projection matrix processing;
K is the internal parameter matrix of the camera;
M is the external parameter matrix from the camera to the laser radar;
P is the laser radar point cloud set;
P′ is the laser radar point cloud projected into the camera coordinate system.
The local features are learned in the adaptive attention mechanism network to obtain the learned local features V_i, which comprises the following steps:
learning of the local features is performed according to the following formula:
V_i = max_{i=1,2,…,N} { MLP_l(p_i) },
wherein V_i is the learned feature in the i-th voxel grid;
p_i is the i-th point in the spatial point cloud;
MLP_l is the multi-layer perceptron for local features;
max is the maximum pooling operation over all points in a voxel;
C_1 is the number of channels of the local feature map;
N is the number of voxel grids.
Preferably, the global feature is learned in the adaptive attention mechanism network to obtain the learned global feature V_global, which comprises the following steps:
the global feature is learned according to the following formula:
V_global = max_{i=1,2,…,N} { MLP_g(V_i) },
wherein MLP_g is the multi-layer perceptron for the global feature;
max is the maximum pooling operation over all voxels;
C_2 is the number of channels of the entire feature map;
N is the number of voxel grids;
V_i is the learned feature in the i-th voxel grid.
The obtaining of the weighted semantic information comprises:
obtaining the weighted semantic information according to the following formulas:
(The two weighting formulas appear only as equation images in the original publication and cannot be reproduced from the text.)
wherein P_{a,s} and P_{a,t} are the weighted semantic information;
P_2D is the second semantic information;
P_3D is the first semantic information;
MLP_att is a multi-layer perceptron;
σ is the Sigmoid activation function.
With the obstacle sensing method provided by the present invention, the laser radar sensor and the camera sensor are fused, so that the advantages of different sensors are exploited while their respective defects are compensated, and the sensing and identification precision of point cloud target detection is improved. Meanwhile, in this scheme, a deep learning network combining three-dimensional point cloud semantic segmentation information with two-dimensional picture semantic segmentation information, that is, semantic information from different sensors, is used to reduce the false detection rate and the missed detection rate of obstacles.
In a second aspect, an embodiment of the present application further provides an obstacle sensing device, including:
the picture acquisition module is configured to acquire a camera picture and to acquire the calibrated internal and external parameters for projection conversion;
a point cloud acquisition module configured to acquire an original point cloud;
the picture semantic segmentation module is configured for performing semantic segmentation on the camera picture to obtain a second picture;
the point cloud semantic segmentation module is configured for performing semantic segmentation on the original point cloud to obtain a second point cloud;
the picture semantic projection module is configured to project the original point cloud onto the second picture according to the internal parameters and the external parameters to obtain a third point cloud, wherein each point in the third point cloud comprises second semantic category information corresponding to the second picture;
the semantic fusion module is configured to voxelize the second semantic category information in the third point cloud, the first semantic category information in the second point cloud, and the feature information of the original point cloud, and to input the voxelized result into an adaptive attention mechanism network for learning to obtain weighted semantic information;
an obstacle sensing module configured to detect an obstacle target according to the weighted semantic information;
wherein the second point cloud and the second picture include obstacle category information.
In a third aspect, an embodiment of the present application further provides an obstacle sensing device, including: a memory, a processor, and a user interface;
the memory for storing a computer program;
the user interface is used for realizing interaction with a user;
the processor is used for reading the computer program in the memory, and when the processor executes the computer program, the obstacle sensing method provided by the invention is realized.
In a fourth aspect, an embodiment of the present invention further provides a processor-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the processor implements the obstacle sensing method provided by the present invention.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of obstacle sensing provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of an adaptive attention mechanism provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of an obstacle sensing device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of another obstacle sensing device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Some of the words that appear in the text are explained below:
1. The term "and/or" in the embodiments of the present invention describes an association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, both A and B exist, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
2. In the embodiments of the present application, the term "plurality" means two or more, and other terms are similar thereto.
3. One-Hot coding format, also known as one-bit effective coding, is often used in classification prediction and is usually presented in the form of a binary vector. The class to which an object belongs is first mapped to an integer value, and the integer value is then converted into a binary vector in which the dimension corresponding to the class has the value 1 and the remaining dimensions have the value 0.
4. Voxelization means that a three-dimensional point cloud is divided into grids of the same resolution (e.g., 0.75 m × 0.75 m), and each point is placed into a voxel grid according to its spatial position in the point cloud.
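For illustration, minimal Python sketches of the One-Hot encoding and the voxelization follow; the voxel size, point cloud origin, and class ids are assumed example values, not parameters fixed by this application.

    import numpy as np

    def one_hot(class_ids, num_classes):
        # Map integer class ids to One-Hot row vectors (definition 3 above)
        encoded = np.zeros((len(class_ids), num_classes), dtype=np.float32)
        encoded[np.arange(len(class_ids)), class_ids] = 1.0
        return encoded

    def voxelize(points, voxel_size=0.75, origin=(-50.0, -50.0, -3.0)):
        # Assign each point (x, y, z, ...) to a voxel cell of equal resolution
        # (definition 4 above); origin is an assumed lower bound of the range
        coords = np.floor((points[:, :3] - np.array(origin)) / voxel_size).astype(np.int64)
        voxels = {}
        for idx, cell in enumerate(map(tuple, coords)):
            voxels.setdefault(cell, []).append(idx)  # group point indices per voxel
        return voxels

    # e.g. three points labelled with classes 0, 1 and 2 out of m = 4 classes
    print(one_hot([0, 1, 2], 4))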
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the display sequence of the embodiment of the present application only represents the sequence of the embodiment, and does not represent the merits of the technical solutions provided by the embodiments.
Example one
Referring to fig. 1, a schematic diagram of an obstacle sensing method provided in an embodiment of the present application is shown in fig. 1, where the method includes steps S101 to S107:
s101, acquiring an original point cloud and a camera picture at the same moment;
in the embodiment of the invention, the original point cloud is a three-dimensional point cloud and can be obtained through a laser radar. The camera pictures are acquired by the cameras, and if a plurality of cameras are installed, the camera pictures of the plurality of cameras are acquired simultaneously. The acquisition time of the original point cloud is the same as that of the camera picture, namely the original point cloud and the camera picture at the same moment are acquired.
As a preferred example, in this step, the original point cloud and the camera picture may not be acquired at exactly the same moment; instead, the difference between the acquisition time of the original point cloud and that of the camera picture lies within a predetermined time difference range, for example, 0.001 second.
As a preferred example, in order to acquire the original point cloud and the camera picture at the same moment, software synchronization or hardware synchronization may be performed on the point cloud and the camera. Software synchronization refers to providing the same time source for different sensors, with each sensor timestamping its own recorded data; hardware synchronization refers to using a hardware trigger to make different sensors record time directly through physical signals, such as PPS time service.
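A Python sketch of software synchronization under these assumptions: both sensors stamp their data from a shared time source, and a pair is accepted when the timestamps differ by no more than the 0.001-second tolerance mentioned above.

    def match_nearest(cloud_stamps, picture_stamps, max_dt=0.001):
        # Pair each point cloud timestamp with the nearest picture timestamp
        pairs = []
        for tc in cloud_stamps:
            tp = min(picture_stamps, key=lambda t: abs(t - tc))
            if abs(tp - tc) <= max_dt:  # accept only within the tolerance
                pairs.append((tc, tp))
        return pairs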
S102, obtaining calibrated internal parameters and external parameters of projection conversion;
in this step, calibration internal parameters and external parameters of projection conversion corresponding to the camera and the laser radar are obtained.
In the embodiment of the invention, the internal parameters are calibrated internal parameters of the camera; the external parameters are external parameters of the camera and the laser radar; the internal parameters and the external parameters are used for projection conversion of the point cloud.
Preferably, the internal parameters include, but are not limited to: distortion coefficients, focal length, pixel size, and the like; the external parameters include, but are not limited to: rotation and translation matrices, and the like.
It should be noted that, in the embodiment of the present invention, the internal parameters and the external parameters are calibrated and stored in advance.
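For illustration, the calibrated parameters can be held as matrices, as in the Python sketch below; the numeric values are placeholders, not calibration results from this application.

    import numpy as np

    fx, fy, cx, cy = 721.5, 721.5, 609.6, 172.9  # assumed focal lengths and principal point
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])              # internal parameter matrix of the camera

    R = np.eye(3)                                # assumed rotation between lidar and camera
    t = np.array([[0.2], [0.0], [-0.1]])         # assumed translation (metres)
    M = np.hstack([R, t])                        # 3x4 external parameter matrix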
S103, performing semantic segmentation on the original point cloud to obtain a second point cloud;
as a preferred example, the original point cloud is input into a point cloud semantic segmentation network.
In the step, a frame of point cloud is transmitted to a point cloud semantic segmentation network, and a second point cloud containing fine-grained semantic information is obtained, namely the second point cloud comprises the first semantic information.
It should be noted that the fine-grained semantic information means that the category information of each point is clear and is not interfered by other external conditions, for example, is not influenced by internal parameters and external parameters.
S104, performing semantic segmentation on the camera picture to obtain a second picture;
in this step, the camera picture is input into a picture semantic segmentation network to obtain a second picture.
It should be noted that, as a preferable example, in the above steps S103 and S104, the semantic segmentation network may be a Cylinder3D network or the like.
S105, projecting the original point cloud to the second picture according to the internal parameters and the external parameters to obtain a third point cloud, wherein each point in the third point cloud comprises second semantic category information corresponding to the second picture;
in this step, the projection method of the original point cloud to the second picture may be:
projection is performed according to the following formula:
P′=Proj(K,M,P),
wherein Proj denotes the projection matrix processing;
K is the internal parameter matrix of the camera;
M is the external parameter matrix from the camera to the laser radar;
P is the laser radar point cloud set;
P′ is the laser radar point cloud projected into the camera coordinate system.
For example, given the internal parameter matrix K and the external parameter matrix M, the original point cloud P is projected onto the second picture according to the above formula, obtaining the third point cloud P′.
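A minimal Python sketch of the projection P′ = Proj(K, M, P), reusing the K and M matrices sketched earlier; each projected point can then read the second semantic category at its pixel in the second picture.

    import numpy as np

    def project_points(K, M, P):
        # P: (N, 3) lidar points; returns (u, v) pixels and a positive-depth mask
        P_h = np.hstack([P, np.ones((P.shape[0], 1))])  # homogeneous coordinates (N, 4)
        cam = (M @ P_h.T).T                             # points in camera coordinates (N, 3)
        in_front = cam[:, 2] > 0                        # only points in front of the camera
        pix = (K @ cam.T).T
        pix = pix[:, :2] / pix[:, 2:3]                  # perspective divide to pixel coordinates
        return pix, in_front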
After the above steps S101 to S105, each point in the point cloud carries two kinds of semantic information, that is, the first semantic information from the original point cloud and the second semantic information from the point cloud projected onto the picture. It should be noted that, as a preferred example, the obtained first semantic information and second semantic information may also be converted into the One-Hot format.
S106, voxelizing the second semantic category information in the third point cloud, the first semantic category information in the second point cloud, and the feature information of the original point cloud, and inputting the voxelized result into an adaptive attention mechanism network for learning to obtain weighted semantic information;
In this step, the learning in the adaptive attention mechanism network includes:
learning the local features in the adaptive attention mechanism network to obtain the learned local features V_i;
learning the global feature in the adaptive attention mechanism network to obtain the learned global feature V_global;
concatenating the learned global feature V_global to each local feature V_i to obtain the enhanced feature V_gl.
As a preferred example, the local features are learned in the attention mechanism network to obtain the learned local features V_i, which comprises the following steps:
learning of the local features is performed according to the following formula:
V_i = max_{i=1,2,…,N} { MLP_l(p_i) },
wherein V_i is the learned feature in the i-th voxel grid;
p_i is the i-th point in the spatial point cloud;
MLP_l is the multi-layer perceptron for local features;
max is the maximum pooling operation over all points in a voxel;
C_1 is the number of channels of the local feature map;
N is the number of voxel grids.
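A minimal PyTorch sketch of this per-voxel computation, assuming in_dim is the per-point feature length and c1 corresponds to C_1:

    import torch
    import torch.nn as nn

    class LocalFeatureEncoder(nn.Module):
        # V_i = max over the points in voxel i of MLP_l(p_i)
        def __init__(self, in_dim, c1):
            super().__init__()
            self.mlp_l = nn.Sequential(nn.Linear(in_dim, c1), nn.ReLU())

        def forward(self, voxel_points):
            # voxel_points: (N, T, in_dim), N voxels holding T points each
            feats = self.mlp_l(voxel_points)  # (N, T, C1) per-point features
            v_i, _ = feats.max(dim=1)         # max pooling over the points in a voxel
            return v_i                        # (N, C1) local features V_i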
The global feature is learned in the adaptive attention mechanism network to obtain the learned global feature V_global, which comprises the following steps:
the global feature is learned according to the following formula:
V_global = max_{i=1,2,…,N} { MLP_g(V_i) },
wherein MLP_g is the multi-layer perceptron for the global feature;
max is the maximum pooling operation over all voxels;
C_2 is the number of channels of the entire feature map;
N is the number of voxel grids;
V_i is the learned feature in the i-th voxel grid.
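Continuing the sketch, the global feature and the enhanced feature V_gl can be formed as follows; c2 corresponds to C_2 and is an assumed size:

    import torch
    import torch.nn as nn

    class GlobalFeatureEncoder(nn.Module):
        # V_global = max over all voxels of MLP_g(V_i), concatenated back to each V_i
        def __init__(self, c1, c2):
            super().__init__()
            self.mlp_g = nn.Sequential(nn.Linear(c1, c2), nn.ReLU())

        def forward(self, v_local):
            g = self.mlp_g(v_local)                       # (N, C2)
            v_global, _ = g.max(dim=0, keepdim=True)      # (1, C2) pooled over all voxels
            v_global = v_global.expand(v_local.size(0), -1)
            return torch.cat([v_local, v_global], dim=1)  # (N, C1 + C2) enhanced V_gl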
As a preferred example, the weighted semantic information is obtained according to the following formulas:
(The two weighting formulas appear only as equation images in the original publication and cannot be reproduced from the text.)
wherein P_{a,s} and P_{a,t} are the weighted semantic information;
P_2D is the second semantic information;
P_3D is the first semantic information;
MLP_att is a multi-layer perceptron;
σ is the Sigmoid activation function.
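Since the weighting formulas are available only as images, the following PyTorch sketch shows one plausible form consistent with the listed symbols: a shared multi-layer perceptron MLP_att followed by a Sigmoid gate. This form is an assumption, not necessarily the exact formula of this application.

    import torch
    import torch.nn as nn

    class SemanticAttention(nn.Module):
        # Assumed form: weights = sigma(MLP_att([P_2D, P_3D])), applied to both inputs
        def __init__(self, dim):
            super().__init__()
            self.mlp_att = nn.Linear(2 * dim, dim)
            self.sigma = nn.Sigmoid()

        def forward(self, p_2d, p_3d):
            # p_2d, p_3d: (N, dim) One-Hot semantic information per voxel
            weights = self.sigma(self.mlp_att(torch.cat([p_2d, p_3d], dim=1)))
            return weights * p_2d, weights * p_3d  # P_{a,s} and P_{a,t} (assumed)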
As a preferred example, the process of this step is shown in FIG. 2. In fig. 2, the input point cloud is the original point cloud, the 2D semantic information is the data obtained by converting the second semantic information into the One-Hot format, and the 3D semantic information is the data obtained by converting the first semantic information into the One-Hot format. The processing proceeds as follows:
After the 2D and 3D semantic information converted into the One-Hot format is spliced to the original point cloud feature information, if the number of categories to be predicted is m, each point contains 2m dimensions of semantic category information from the 3D and 2D segmentation. This semantic information is then spliced with the original data of the point cloud, such as the XYZ coordinates, into an N × (2m + 3)-dimensional feature vector. The feature vector is voxelized and input into the adaptive attention mechanism network combining local and global features, which yields the weighted category feature of each voxel grid; finally, the weighted category features are input into the target detection network. A sketch of assembling this feature vector is given below.
S107, detecting an obstacle target according to the weighted semantic information;
in this step, the weighted semantic information is input to a target detector to detect an obstacle target.
With the obstacle sensing method provided by the present invention, the laser radar sensor and the camera sensor are fused, so that the advantages of different sensors are exploited while their respective defects are compensated, and the sensing and identification precision of point cloud target detection is improved. Meanwhile, in this scheme, a deep learning network combining three-dimensional point cloud semantic segmentation information with two-dimensional picture semantic segmentation information, that is, semantic information from different sensors, is used to reduce the false detection rate and the missed detection rate of obstacles.
Example two
Based on the same inventive concept, an embodiment of the present invention further provides an obstacle sensing device, as shown in fig. 3, the device includes:
a picture obtaining module 303, configured to obtain a camera picture and to obtain the calibrated internal and external parameters for projection conversion;
a point cloud obtaining module 301 configured to obtain an original point cloud;
a picture semantic segmentation module 304, configured to perform semantic segmentation on the camera picture to obtain a second picture;
a point cloud semantic segmentation module 302 configured to perform semantic segmentation on the original point cloud to obtain a second point cloud;
a picture semantic projection module 305, configured to project the original point cloud onto the second picture according to the internal parameters and the external parameters to obtain a third point cloud, where each point in the third point cloud includes second semantic category information corresponding to the second picture;
a semantic fusion module 306, configured to voxelize the second semantic category information in the third point cloud, the first semantic category information in the second point cloud, and the feature information of the original point cloud, and to input the voxelized result into an adaptive attention mechanism network for learning to obtain weighted semantic information;
an obstacle sensing module 307 configured to detect an obstacle target according to the weighted semantic information;
wherein the second point cloud and the second picture include obstacle category information.
As a preferred example, the picture acquiring module 303 is further configured to:
and acquiring a camera picture at the same time as the original point cloud.
Specifically, software synchronization or hardware synchronization may be performed on the point cloud and the camera, and then a camera picture at the same time as the original point cloud is obtained.
As a preferred example, the picture semantic segmentation module 304 is further configured to:
and inputting the camera picture into a picture semantic segmentation network to obtain a second picture.
As a preferred example, the point cloud semantic segmentation module 302 is further configured to:
and inputting the original point cloud into a point cloud semantic segmentation network to obtain a second point cloud.
As a preferred example, the picture semantic projection module 305 is further configured to perform the projection of the original point cloud to the second picture according to the following manner:
projection is performed according to the following formula:
P′=Proj(K,M,P),
wherein Proj denotes the projection matrix processing;
K is the internal parameter matrix of the camera;
M is the external parameter matrix from the camera to the laser radar;
P is the laser radar point cloud set;
P′ is the laser radar point cloud projected into the camera coordinate system.
As a preferred example, the learning in the adaptive attention mechanism network includes:
learning the local features in the adaptive attention mechanism network to obtain the learned local features V_i;
learning the global feature in the adaptive attention mechanism network to obtain the learned global feature V_global;
concatenating the learned global feature V_global to each local feature V_i to obtain the enhanced feature V_gl.
The global feature is learned in the adaptive attention mechanism network to obtain the learned global feature V_global, which comprises the following steps:
the global feature is learned according to the following formula:
V_global = max_{i=1,2,…,N} { MLP_g(V_i) },
wherein MLP_g is the multi-layer perceptron for the global feature;
max is the maximum pooling operation over all voxels;
C_2 is the number of channels of the entire feature map;
N is the number of voxel grids;
V_i is the learned feature in the i-th voxel grid.
As a preferred example, the semantic fusion module 306 is further configured to obtain the weighted semantic information according to the following formulas:
(The two weighting formulas appear only as equation images in the original publication and cannot be reproduced from the text.)
wherein P_{a,s} and P_{a,t} are the weighted semantic information;
P_2D is the second semantic information;
P_3D is the first semantic information;
MLP_att is a multi-layer perceptron;
σ is the Sigmoid activation function.
It should be noted that the apparatus provided in the second embodiment and the method provided in the first embodiment belong to the same inventive concept, solve the same technical problem, and achieve the same technical effect, and the apparatus provided in the second embodiment can implement all the methods of the first embodiment, and the same parts are not described again.
EXAMPLE III
Based on the same inventive concept, an embodiment of the present invention further provides an obstacle sensing device, as shown in fig. 4, the device includes:
a memory 402, a processor 401, and a user interface 403;
the memory 402 for storing a computer program;
the user interface 403 is used for realizing interaction with a user;
the processor 401 is configured to read the computer program in the memory 402, and when the processor 401 executes the computer program, the processor implements:
acquiring an original point cloud and a camera picture at the same time;
acquiring calibrated internal and external parameters for projection conversion;
performing semantic segmentation on the original point cloud to obtain a second point cloud;
performing semantic segmentation on the camera picture to obtain a second picture;
according to the internal parameters and the external parameters, projecting the original point cloud to the second picture to obtain a third point cloud, wherein each point in the third point cloud comprises second semantic category information corresponding to the second picture;
voxelizing the second semantic category information in the third point cloud, the first semantic category information in the second point cloud, and the feature information of the original point cloud, and inputting the voxelized result into an adaptive attention mechanism network for learning to obtain weighted semantic information;
detecting an obstacle target according to the weighted semantic information;
wherein the second point cloud and the third point cloud include obstacle category information.
In fig. 4, the bus architecture may include any number of interconnected buses and bridges, specifically linking together one or more processors, represented by processor 401, and various circuits of memory, represented by memory 402. The bus architecture may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further herein. The bus interface provides an interface. The processor 401 is responsible for managing the bus architecture and general processing, and the memory 402 may store data used by the processor 401 in performing operations.
The processor 401 may be a CPU, ASIC, FPGA or CPLD, and the processor 401 may also employ a multi-core architecture.
The processor 401, when executing the computer program stored in the memory 402, implements any of the obstacle sensing methods of the first embodiment.
It should be noted that the apparatus provided in the third embodiment and the method provided in the first embodiment belong to the same inventive concept, solve the same technical problem, and achieve the same technical effect, and the apparatus provided in the third embodiment can implement all the methods of the first embodiment, and the same parts are not described again.
The present application also proposes a processor-readable storage medium. The processor-readable storage medium stores a computer program, and the processor implements any obstacle sensing method in the first embodiment when executing the computer program.
It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (14)

1. An obstacle sensing method, comprising:
acquiring an original point cloud and a camera picture at the same time;
acquiring calibrated internal and external parameters for projection conversion;
performing semantic segmentation on the original point cloud to obtain a second point cloud;
performing semantic segmentation on the camera picture to obtain a second picture;
according to the internal parameters and the external parameters, projecting the original point cloud to the second picture to obtain a third point cloud, wherein each point in the third point cloud comprises second semantic category information corresponding to the second picture;
voxelizing the second semantic category information in the third point cloud, the first semantic category information in the second point cloud, and the feature information of the original point cloud, and inputting the voxelized result into an adaptive attention mechanism network for learning to obtain weighted semantic information;
detecting an obstacle target according to the weighted semantic information;
wherein the second point cloud and the third point cloud include obstacle category information.
2. The method of claim 1, wherein the learning in the adaptive attention mechanism network comprises:
learning the local features in the adaptive attention mechanism network to obtain the learned local features V_i;
learning the global feature in the adaptive attention mechanism network to obtain the learned global feature V_global;
concatenating the learned global feature V_global to each local feature V_i to obtain the enhanced feature V_gl.
3. The method of claim 2, wherein detecting an obstacle target based on the weighted semantic information comprises:
and inputting the weighted semantic information into a target detector to detect the obstacle target.
4. The method of claim 1, wherein the obtaining the original point cloud and the camera picture at the same time comprises:
performing software synchronization or hardware synchronization on the point cloud and the camera;
and obtaining the original point cloud and the camera picture at the same moment.
5. The method of claim 1, wherein semantically segmenting the original point cloud to obtain a second point cloud comprises:
and inputting the original point cloud into a point cloud semantic segmentation network to obtain a second point cloud.
6. The method of claim 1, wherein the semantically segmenting the camera picture to obtain a second picture comprises:
and inputting the camera picture into a picture semantic segmentation network to obtain a second picture.
7. The method of claim 1, wherein before voxelizing the second semantic category information in the third point cloud, the first semantic category information in the second point cloud, and the feature information of the original point cloud, the method further comprises:
and converting the first semantic category information and the second semantic category information into a One-Hot coding format.
8. The method of claim 1, wherein said projecting the original point cloud to the second picture comprises:
projection is performed according to the following formula:
P′=Proj(K,M,P),
wherein Proj denotes the projection matrix processing;
K is the internal parameter matrix of the camera;
M is the external parameter matrix from the camera to the laser radar;
P is the laser radar point cloud set;
P′ is the laser radar point cloud projected into the camera coordinate system.
9. The method of claim 2, wherein learning the local features in the adaptive attention mechanism network to obtain the learned local features V_i comprises:
learning of the local features is performed according to the following formula:
V_i = max_{i=1,2,…,N} { MLP_l(p_i) },
wherein V_i is the learned feature in the i-th voxel grid;
p_i is the i-th point in the spatial point cloud;
MLP_l is the multi-layer perceptron for local features;
max is the maximum pooling operation over all points in a voxel;
C_1 is the number of channels of the local feature map;
N is the number of voxel grids.
10. The method of claim 2, wherein learning the global feature in the adaptive attention mechanism network to obtain the learned global feature V_global comprises:
the global feature is learned according to the following formula:
V_global = max_{i=1,2,…,N} { MLP_g(V_i) },
wherein MLP_g is the multi-layer perceptron for the global feature;
max is the maximum pooling operation over all voxels;
C_2 is the number of channels of the entire feature map;
N is the number of voxel grids;
V_i is the learned feature in the i-th voxel grid.
11. The method of claim 2, wherein the obtaining the weighted semantic information comprises:
obtaining the weighted semantic information according to the following formulas:
(The two weighting formulas appear only as equation images in the original publication and cannot be reproduced from the text.)
wherein P_{a,s} and P_{a,t} are the weighted semantic information;
P_2D is the second semantic information;
P_3D is the first semantic information;
MLP_att is a multi-layer perceptron;
σ is the Sigmoid activation function.
12. An obstacle sensing device, comprising:
a picture acquisition module configured to acquire a camera picture and to acquire the calibrated internal and external parameters for projection conversion;
a point cloud acquisition module configured to acquire an original point cloud;
the picture semantic segmentation module is configured for performing semantic segmentation on the camera picture to obtain a second picture;
the point cloud semantic segmentation module is configured for performing semantic segmentation on the original point cloud to obtain a second point cloud;
the image semantic projection module is configured to perform projection from the original point cloud to the second image according to the internal parameters and the external parameters to obtain a third point cloud, wherein each point in the third point cloud comprises second semantic category information corresponding to the second image;
the semantic fusion module is configured to perform voxelization on second semantic category information in the third point cloud, first semantic category information in the second point cloud and feature information of the original point cloud, and input the voxelization information into an adaptive attention mechanism network for learning to obtain weighted semantic information;
an obstacle sensing module configured to detect an obstacle target according to the weighted semantic information;
wherein the second point cloud and the second picture include obstacle category information.
13. An obstacle sensing apparatus comprising a memory, a processor and a user interface;
the memory for storing a computer program;
the user interface is used for realizing interaction with a user;
the processor for reading a computer program in the memory, the processor, when executing the computer program, implementing the obstacle sensing method according to one of claims 1 to 11.
14. A processor-readable storage medium, characterized in that the processor-readable storage medium stores a computer program which, when executed by a processor, implements an obstacle sensing method according to one of claims 1 to 11.
CN202111338928.7A 2021-11-12 2021-11-12 Obstacle sensing method and device and storage medium Active CN114140765B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111338928.7A CN114140765B (en) 2021-11-12 2021-11-12 Obstacle sensing method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111338928.7A CN114140765B (en) 2021-11-12 2021-11-12 Obstacle sensing method and device and storage medium

Publications (2)

Publication Number Publication Date
CN114140765A true CN114140765A (en) 2022-03-04
CN114140765B CN114140765B (en) 2022-06-24

Family

ID=80393919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111338928.7A Active CN114140765B (en) 2021-11-12 2021-11-12 Obstacle sensing method and device and storage medium

Country Status (1)

Country Link
CN (1) CN114140765B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416586A (en) * 2022-12-19 2023-07-11 香港中文大学(深圳) Map element sensing method, terminal and storage medium based on RGB point cloud

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050015201A1 (en) * 2003-07-16 2005-01-20 Sarnoff Corporation Method and apparatus for detecting obstacles
US10609148B1 (en) * 2019-09-17 2020-03-31 Ha Q Tran Smart vehicle
CN111583337A (en) * 2020-04-25 2020-08-25 华南理工大学 Omnibearing obstacle detection method based on multi-sensor fusion
CN111709343A (en) * 2020-06-09 2020-09-25 广州文远知行科技有限公司 Point cloud detection method and device, computer equipment and storage medium
CN112101092A (en) * 2020-07-31 2020-12-18 北京智行者科技有限公司 Automatic driving environment sensing method and system
CN112419494A (en) * 2020-10-09 2021-02-26 腾讯科技(深圳)有限公司 Obstacle detection and marking method and device for automatic driving and storage medium
CN112560774A (en) * 2020-12-25 2021-03-26 广州文远知行科技有限公司 Obstacle position detection method, device, equipment and storage medium
CN113095172A (en) * 2021-03-29 2021-07-09 天津大学 Point cloud three-dimensional object detection method based on deep learning
CN113111887A (en) * 2021-04-26 2021-07-13 河海大学常州校区 Semantic segmentation method and system based on information fusion of camera and laser radar
CN113128348A (en) * 2021-03-25 2021-07-16 西安电子科技大学 Laser radar target detection method and system fusing semantic information

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050015201A1 (en) * 2003-07-16 2005-01-20 Sarnoff Corporation Method and apparatus for detecting obstacles
US10609148B1 (en) * 2019-09-17 2020-03-31 Ha Q Tran Smart vehicle
CN111583337A (en) * 2020-04-25 2020-08-25 华南理工大学 Omnibearing obstacle detection method based on multi-sensor fusion
CN111709343A (en) * 2020-06-09 2020-09-25 广州文远知行科技有限公司 Point cloud detection method and device, computer equipment and storage medium
CN112101092A (en) * 2020-07-31 2020-12-18 北京智行者科技有限公司 Automatic driving environment sensing method and system
CN112419494A (en) * 2020-10-09 2021-02-26 腾讯科技(深圳)有限公司 Obstacle detection and marking method and device for automatic driving and storage medium
CN112560774A (en) * 2020-12-25 2021-03-26 广州文远知行科技有限公司 Obstacle position detection method, device, equipment and storage medium
CN113128348A (en) * 2021-03-25 2021-07-16 西安电子科技大学 Laser radar target detection method and system fusing semantic information
CN113095172A (en) * 2021-03-29 2021-07-09 天津大学 Point cloud three-dimensional object detection method based on deep learning
CN113111887A (en) * 2021-04-26 2021-07-13 河海大学常州校区 Semantic segmentation method and system based on information fusion of camera and laser radar

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BAI L et al.: "Region-proposal Convolutional Network-driven Point Cloud Voxelization and Over-segmentation for 3D Object Detection", 2019 Chinese Control and Decision Conference (CCDC) *
MO J W et al.: "An obstacle-detecting algorithm based on image and 3D point cloud segmentation", 2014 International Conference on Artificial Intelligence and Industrial Application *
MO Jianwen et al.: "An obstacle detection algorithm combining image segmentation and point cloud segmentation", Computer Engineering and Design *
CHEN Junying et al.: "3D object detection with mutual-attention fusion of image and point cloud data", Optics and Precision Engineering *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416586A (en) * 2022-12-19 2023-07-11 香港中文大学(深圳) Map element sensing method, terminal and storage medium based on RGB point cloud
CN116416586B (en) * 2022-12-19 2024-04-02 香港中文大学(深圳) Map element sensing method, terminal and storage medium based on RGB point cloud

Also Published As

Publication number Publication date
CN114140765B (en) 2022-06-24

Similar Documents

Publication Publication Date Title
CN111201451B (en) Method and device for detecting object in scene based on laser data and radar data of scene
EP3516624B1 (en) A method and system for creating a virtual 3d model
CN112292711B (en) Associating LIDAR data and image data
EP3822910A1 (en) Depth image generation method and device
EP3620966A1 (en) Object detection method and apparatus for object detection
CN112991413A (en) Self-supervision depth estimation method and system
AU2017324923A1 (en) Predicting depth from image data using a statistical model
US20220108544A1 (en) Object detection apparatus, system and method
CN111985458B (en) Method for detecting multiple targets, electronic equipment and storage medium
dos Santos Rosa et al. Sparse-to-continuous: Enhancing monocular depth estimation using occupancy maps
CN112927279A (en) Image depth information generation method, device and storage medium
EP3965052A1 (en) Device and method of training a generative neural network
EP3953903A1 (en) Scale-aware monocular localization and mapping
CN114648758A (en) Object detection method and device, computer readable storage medium and unmanned vehicle
CN114140765B (en) Obstacle sensing method and device and storage medium
CN114998856B (en) 3D target detection method, device, equipment and medium for multi-camera image
US20220327730A1 (en) Method for training neural network, system for training neural network, and neural network
Cui et al. Dense depth-map estimation based on fusion of event camera and sparse LiDAR
CN112989877A (en) Method and device for labeling object in point cloud data
CN113439289A (en) Image processing for determining the thickness of an object
CN114997264A (en) Training data generation method, model training method, model detection method, device and electronic equipment
CN115546476A (en) Multi-object detection method and data platform based on multi-scale features
CN116188349A (en) Image processing method, device, electronic equipment and storage medium
CN113869440A (en) Image processing method, apparatus, device, medium, and program product
US20230342944A1 (en) System and Method for Motion Prediction in Autonomous Driving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Wu Xinkai; Xu Shaoqing; Wang Pengcheng
Inventor before: Wu Xinkai; Xu Shaoqing; Wang Pengcheng
GR01 Patent grant