CN112801059B - Graph convolution network system and 3D object detection method based on graph convolution network system


Info

Publication number
CN112801059B
CN112801059B (Application CN202110369721.XA)
Authority
CN
China
Prior art keywords
graph convolution
semantic
module
proposal
network system
Prior art date
Legal status
Active
Application number
CN202110369721.XA
Other languages
Chinese (zh)
Other versions
CN112801059A (en)
Inventor
杨光远
黄瑾
张凯
丁冬睿
Current Assignee
Lingxin Huizhi Shandong Intelligent Technology Co ltd
Original Assignee
Guangdong Zhongju Artificial Intelligence Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Zhongju Artificial Intelligence Technology Co ltd filed Critical Guangdong Zhongju Artificial Intelligence Technology Co ltd
Priority to CN202110369721.XA priority Critical patent/CN112801059B/en
Publication of CN112801059A publication Critical patent/CN112801059A/en
Application granted granted Critical
Publication of CN112801059B publication Critical patent/CN112801059B/en

Classifications

    • G06V 20/64 Scenes; scene-specific elements; type of objects; three-dimensional objects
    • G06F 18/214 Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/23213 Pattern recognition; non-hierarchical clustering techniques using statistics or function optimisation, with a fixed number of clusters, e.g. k-means clustering
    • G06N 3/045 Neural network architectures; combinations of networks
    • G06V 10/25 Image preprocessing; determination of region of interest [ROI] or a volume of interest [VOI]

Abstract

The invention discloses a graph convolution network system and a 3D object detection method based on the graph convolution network system. The system comprises: a shape semantic extraction module, used for modeling the geometric positions of the points in the point cloud features of an image; a multilayer perceptron, connected with the shape semantic extraction module and used for extracting multi-level semantic features with a multilayer graph convolution neural network and filtering the multi-level semantic features with an attention mechanism; a proposal generator, connected with the multilayer perceptron and used for summarizing the multi-level semantic features and generating a primary proposal by weighting; and a proposal inference module, connected with the proposal generator and used for predicting the 3D bounding boxes and semantic categories of objects in the image from the global semantic features and the primary proposal. The invention effectively improves the detection performance of the whole graph convolution network system, improves the precision of 3D object detection, and strengthens the interpretability of the deep network.

Description

Graph convolution network system and 3D object detection method based on graph convolution network system
Technical Field
The embodiment of the invention relates to the field of computer vision, and in particular to a graph convolution network system and a 3D object detection method based on the graph convolution network system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
With the development of science and technology, people urgently need to use computer resources to perceive and understand the world, so as to bring more convenience to daily life. Through organs such as the eyes, nose and ears, humans sense the world visually, olfactorily, aurally and in other ways, and visual information accounts for more than eighty percent of the information humans acquire from the outside. Just as the eyes are to the human body, the discipline of machine vision plays a very important role in the field of machine intelligence. Target detection is one of the traditional subjects of machine vision, and target detection in complex scenes in particular has always been a key research direction.
Object detection is a traditional task in the field of computer vision. Unlike image recognition, target detection not only needs to recognize the objects present in an image and give the corresponding categories, but also needs to locate each object with a bounding box. 2D object detection typically finds and classifies a variable number of objects in an RGB image and marks them on the image with 2D bounding boxes. Most current research focuses on 2D object prediction; by extending prediction to 3D, the size, position and orientation of objects in the world can be captured, which is useful in a variety of application scenarios including robotics, autonomous driving, robot vision, image retrieval and augmented reality.
The 3D object detection is a detection technique for outputting information such as object semantic type, length, width, height, and rotation angle in a three-dimensional space using information such as an RGB image, an RGB-D depth image, and a laser point cloud. Although 2D object detection is relatively mature and has been widely used in the industry, 3D object detection from 2D images remains a challenging problem due to the lack of data and diversity of the appearance and shape of objects in semantic categories.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a graph convolution network system and a 3D object detection method based on the graph convolution network system, which are used for processing a 3D point cloud through graph convolution of a multi-scale attention mechanism and improving the precision of target detection of an image in a three-dimensional space.
In a first aspect, an embodiment of the present invention provides a graph convolution network system, including:
the shape semantic extraction module is used for receiving point cloud features of an image and modeling the geometric positions of the points in the point cloud features to obtain global semantic features;
the multilayer perceptron is connected with the shape semantic extraction module and is used for extracting multilevel semantic features by utilizing a multilayer graph convolution neural network based on the global semantic features and filtering the multilevel semantic features by using an attention mechanism;
the proposal generator is connected with the multilayer perceptron and used for summarizing the filtered multi-level semantic features and generating at least one primary proposal by weighting;
a proposal inference module, coupled to the proposal generator, for predicting a 3D bounding box and semantic categories of objects in the image using global semantic features and the at least one primary proposal.
In one embodiment, the shape semantic extraction module comprises:
a Clustering by Fast search and find of Density Peaks (CFDP) module, configured to receive the point cloud features and cluster the feature points in the point cloud features by using the CFDP algorithm to obtain a plurality of cluster centers;
the k-nearest-neighbor module is connected with the CFDP module and used for constructing, according to the plurality of cluster centers, a plurality of local regions related to the geometric positions of the points by using k-nearest-neighbor relations;
and the attention aggregation module is used for adaptively aggregating the point characteristics of the clustering center and other points in the local area corresponding to the clustering center to obtain the global semantic characteristics.
In one embodiment, the attention aggregation module is to:
adaptively aggregating the cluster center and the other points in the local region corresponding to the cluster center to generate relative position information;
and constructing an aggregation method using an attention mechanism according to all points in the local area corresponding to the clustering center:
$$\tilde{f}_c = \sum_{j \in \mathcal{N}(c) \cup \{c\}} R(p_c, p_j) \cdot M(f_j) \qquad (1)$$
wherein $\tilde{f}_c$ represents the global semantic feature, $R(\cdot,\cdot)$ represents the modeling function of the relative geometric position, $M(\cdot)$ represents the point feature processing function, $f_c$ represents the point feature of the cluster center, $f_j$ represents a point feature in the local region corresponding to the cluster center, and $p_c$ and $p_j$ respectively represent the position information of $f_c$ and $f_j$.
In one embodiment, the multilayer perceptron comprises: the multilayer graph convolution neural network and a plurality of self-adaptive aggregation modules, wherein the first layer of the graph convolution neural network is connected with the shape semantic extraction module, and a self-adaptive aggregation module is connected between every two layers of the graph convolution neural network;
the multilayer graph convolution neural network is used for extracting the multilevel semantic features;
the self-adaptive aggregation module is used for filtering the semantic features output by the previous layer of the graph convolution neural network by using an attention mechanism, and inputting the filtered semantic features into the next layer of the graph convolution neural network.
In one embodiment, the attention mechanism is the aggregation method represented by formula (1);
the adaptive aggregation module is configured to: for an aggregation center point $f_c$, aggregate the other points $f_j$ in the local region corresponding to $f_c$ to update the feature of $f_c$.
In an embodiment, the proposal generator is connected to each layer in the multilayer graph convolution neural network, and the proposal generator is configured to:
convert the filtered multi-level semantic features into the same feature space using a voting module, wherein the voting module uses the following function:
$$[f'_i;\; p'_i] = [f_i;\; p_i] + \Phi([f_i;\; p_i]) \qquad (2)$$
wherein $\Phi$ represents the adaptive aggregation method designed for the multilayer perceptron, $[f_i;\; p_i]$ represents the semantic features and relative positions before adaptive aggregation, $[\Delta f_i;\; \Delta p_i] = \Phi([f_i;\; p_i])$ represents the resulting offsets of the semantic features and of the relative positions, and $[f'_i;\; p'_i]$ represents the semantic features and relative positions after adaptive aggregation;
and generate the at least one primary proposal from $[f'_i;\; p'_i]$ by using the VoteNet method.
In one embodiment, the proposal inference module is configured to:
integrate all local information using the formula $I = \mathcal{G}(P, F)$, wherein $P$ represents the relative positions of all local information, $F$ represents the at least one primary proposal, and $I$ represents the integrated information; the integrating operation comprises: integrating the feature information along the vertex direction and the channel direction, taking into account the relative positions among proposals, and a Hadamard inner product operation;
and predict the 3D bounding box and semantic category from $I$ by using the VoteNet method.
In a second aspect, an embodiment of the present invention further provides a 3D object detection method based on a graph convolution network system. The method comprises the following steps:
s10: acquiring a training data set, wherein the training data set comprises a plurality of training data, and each training data is a point cloud feature of an image; performing 3D bounding box labeling and semantic category labeling on each training data;
s20: constructing any graph convolution network system provided by the embodiment of the invention;
s30: training the graph convolution network system by using the training data set;
s40: the method comprises the steps of collecting point cloud characteristics of an image to be predicted, inputting the point cloud characteristics of the image to be predicted into a trained graph convolution network system, and obtaining a 3D boundary box and a semantic category of an object in the image to be predicted.
In one embodiment, in step S30, the objective optimization function used is:
$$L = L_{\mathrm{vote}} + \lambda_1 L_{\mathrm{obj}} + \lambda_2 L_{\mathrm{box}} + \lambda_3 L_{\mathrm{cls}}$$
wherein $L_{\mathrm{vote}}$ represents the difference between the votes obtained during the training process and the truth values, $L_{\mathrm{obj}}$ is used for calculating whether the aggregated voting results relate to an object, $L_{\mathrm{box}}$ represents the difference between the predicted 3D bounding box and the annotated 3D bounding box, $L_{\mathrm{cls}}$ represents the cross-entropy loss between the predicted and labeled classes, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are hyper-parameters.
In an embodiment, the method further comprises:
s50: and evaluating the performance of the 3D object detection method by using the average precision mean value, and evaluating the adaptability of the 3D object detection method for detecting various 3D objects by using the variation coefficient of the average precision.
In a third aspect, an embodiment of the present invention further provides a computer device. The device comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein when the processor executes the program, the 3D object detection method based on the graph convolution network system provided by the embodiment of the invention is realized.
In a fourth aspect, the embodiment of the present invention further provides a storage medium on which a computer-readable program is stored, wherein the program, when executed, implements any 3D object detection method based on the graph convolution network system provided by the embodiments of the present invention.
The invention has the beneficial effects that: a fast search clustering algorithm is used to obtain a better clustering effect, and attention aggregation is introduced, so that the graph convolution neural network has better input features; in the graph convolution neural network with the multilayer perceptron, multi-level geometric features with a higher degree of abstraction are obtained by using self-adaptive aggregation; and the multi-level semantics are fully utilized, global semantic information is introduced, and the 3D bounding box and semantic category are predicted. These operations effectively improve the final performance of the whole graph convolution network system, realize end-to-end 3D object detection based on a multi-scale attention mechanism, and make full use of the geometric correspondence between shape semantics and 3D point cloud features, which improves the precision of 3D object detection and strengthens the interpretability of the deep network.
Drawings
Fig. 1 is a schematic structural diagram of a graph convolution network system according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the workflow of an adaptive aggregation module according to an embodiment of the present invention.
Fig. 3 is a flowchart of a 3D object detection method based on a graph convolution network system according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples. The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
With the rapid development of 3D acquisition technology, 3D sensors are becoming more popular and inexpensive, including various types of 3D scanners, LiDAR and RGB-D cameras (e.g., Kinect, RealSense and the Apple depth camera). The 3D data acquired by these sensors can provide rich geometric, shape and scale information. 3D data can typically be represented in different formats, including depth images, point clouds, meshes and volumetric grids. The point cloud is a common representation, characterized by retaining the original geometric information in three-dimensional space without discretization. Point clouds are therefore the preferred representation for many scene-understanding applications, such as autonomous driving and robotics. Deep learning techniques dominate many areas of research, such as computer vision, speech recognition and natural language processing. However, deep learning on 3D point clouds still faces significant challenges: small dataset sizes, and the high dimensionality and unstructured nature of 3D point clouds. On this basis, this disclosure mainly analyzes deep learning methods for processing 3D point clouds.
As the application field of graph convolution networks continues to expand, researchers have begun exploring how graph convolution neural networks can directly model the points in 3D point clouds. On the one hand, the ability to model local structures is crucial to the success of 3D object detection architectures, but local shape information is not well interpreted. Because of the wide variety of objects to be studied, the feature distributions required to detect different objects are not necessarily the same; in other words, multiple levels of semantics may be required to identify different objects. On the other hand, when the model extracts edge features between a cluster center and its neighboring points, the local geometric relationships between these points are obtained, and shape information needs to be captured deeply from the geometric structures between points in the local region.
Example one
The embodiment provides a graph convolution network system for 3D object detection. The system comprises: the system comprises a shape semantic extraction module, a multilayer perceptron, a proposal generator and a proposal reasoning module.
The shape semantic extraction module is used for receiving point cloud features of an image and modeling the geometric positions of the points in the point cloud features to obtain global semantic features.
The multilayer perceptron is connected with the shape semantic extraction module and is used for extracting multilevel semantic features by utilizing a multilayer graph convolution neural network based on the global semantic features and filtering the multilevel semantic features by using an attention mechanism.
And the proposal generator is connected with the multilayer perceptron and used for summarizing the filtered multilevel semantic features and generating at least one primary proposal in a weighting manner.
A proposal inference module is coupled to the proposal generator for predicting a 3D bounding box and semantic categories of objects in the image using global semantic features and the at least one primary proposal.
In one embodiment, the shape semantic extraction module comprises: a CFDP module, a k-nearest neighbor module, and an attention aggregation module.
And the CFDP module is used for receiving the point cloud characteristics, clustering the characteristic points in the point cloud characteristics by using a CFDP algorithm and obtaining a plurality of clustering centers.
The k-nearest neighbor module is connected with the CFDP module and used for constructing a plurality of local areas related to the geometric positions of the points by using k-nearest neighbor relations according to the plurality of clustering centers.
And the attention aggregation module is used for adaptively aggregating point features of the clustering center and other points in the local area corresponding to the clustering center to obtain the global semantic features.
In one embodiment, the attention aggregation module is configured to: adaptively aggregate the cluster center and the other points in the local region corresponding to the cluster center to generate relative position information; and construct an aggregation method using an attention mechanism over all points in the local region corresponding to the cluster center:
$$\tilde{f}_c = \sum_{j \in \mathcal{N}(c) \cup \{c\}} R(p_c, p_j) \cdot M(f_j) \qquad (1)$$
wherein $\tilde{f}_c$ represents the global semantic feature, $R(\cdot,\cdot)$ represents the modeling function of the relative geometric position, $M(\cdot)$ represents the point feature processing function, $f_c$ represents the point feature of the cluster center, $f_j$ represents a point feature in the local region corresponding to the cluster center, and $p_c$ and $p_j$ respectively represent the position information of $f_c$ and $f_j$.
In one embodiment, the multilayer perceptron comprises: the multilayer graph convolution neural network and a plurality of self-adaptive aggregation modules, wherein the first layer of the graph convolution neural network is connected with the shape semantic extraction module, and a self-adaptive aggregation module is connected between every two layers of the graph convolution neural network.
The multilayer graph convolution neural network is used for extracting the multilevel semantic features.
The self-adaptive aggregation module is used for filtering the semantic features output by the previous layer of the graph convolution neural network by using an attention mechanism, and inputting the filtered semantic features into the next layer of the graph convolution neural network.
In one embodiment, the attention mechanism is the aggregation method represented by equation (1).
The adaptive aggregation module is configured to: for an aggregation center point $f_c$, aggregate the other points $f_j$ in the local region corresponding to $f_c$ to update the feature of $f_c$.
In an embodiment, the proposal generator is connected to each layer in the multilayer graph convolution neural network.
The proposal generator is configured to: convert the filtered multi-level semantic features into the same feature space using a voting module, wherein the voting module uses the following function:
$$[f'_i;\; p'_i] = [f_i;\; p_i] + \Phi([f_i;\; p_i]) \qquad (2)$$
wherein $\Phi$ represents the adaptive aggregation method designed for the multilayer perceptron, $[f_i;\; p_i]$ represents the semantic features and relative positions before adaptive aggregation, $[\Delta f_i;\; \Delta p_i] = \Phi([f_i;\; p_i])$ represents the resulting offsets of the semantic features and of the relative positions, and $[f'_i;\; p'_i]$ represents the semantic features and relative positions after adaptive aggregation;
and generate the at least one primary proposal from the fused multi-level semantic features $[f'_i;\; p'_i]$ by using the VoteNet method.
In one embodiment, the proposal inference module is configured to: integrate all local information using the formula $I = \mathcal{G}(P, F)$, wherein $P$ represents the relative positions of all local information, $F$ represents the at least one primary proposal, and $I$ represents the integrated information; the integrating operation comprises: integrating the feature information along the vertex direction and the channel direction, taking into account the relative positions among proposals, and a Hadamard inner product operation; and predict the 3D bounding box and semantic category from $I$ by using the VoteNet method.
Fig. 1 is a schematic structural diagram of a graph convolution network system according to an embodiment of the present invention, in which the main structures of the graph convolution network are shown, and the directions of the arrows indicate the directions of the main signal flows in the graph convolution network system. The specific structure and working principle of the graph convolution network will be described in detail with reference to Fig. 1.
After the image point cloud features are input into the graph convolution network system, the following processing flows are carried out in sequence.
(I) Shape semantic extraction: the geometric positions of the points in the point cloud are modeled, highlighting the importance of shape information in 3D object detection. The specific process is as follows:
1. and clustering the characteristic points in the point cloud by using a CFDP algorithm. And inputting the sample points and the minimum distance between the two clustering categories, carrying out sample normalization, and sequencing the sample points according to the density of the sample points to generate a density graph of the sample points. And finding outliers in the density map, wherein the outliers are the central points (called clustering centers) of the clustering classes. And judging the cluster type of each sample point according to the density from large to small, and then solving the maximum edge density value of each cluster type. And finally, judging the noise point according to the maximum edge density value, and outputting a clustering class label.
2. Local regions about the geometric positions of the points are constructed using k-nearest-neighbor relations. For each cluster center obtained by clustering, the k points closest to the current cluster center are taken as the local region corresponding to that cluster center. The same operation is performed on each cluster center in turn to obtain the local region corresponding to each cluster center.
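The local-region construction can then be sketched as follows, reusing the `points` and `centers` from the snippet above, with a hypothetical neighborhood size k:

```python
import numpy as np

def knn_local_regions(points, centers, k=16):
    """For each cluster center, take its k nearest points as its local region."""
    regions = {}
    for c in centers:
        d = np.linalg.norm(points - points[c], axis=1)
        regions[c] = np.argsort(d)[1:k + 1]  # skip index 0: the center itself
    return regions
```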
3. The features are aggregated by the attention aggregation module. Specifically, the point features of the cluster center and of the other points in its corresponding local region are adaptively aggregated, and relative position information is generated. An aggregation method using an attention mechanism is constructed over all points in the local region corresponding to the cluster center. The function used for the aggregation process is:
$$\tilde{f}_c = \sum_{j \in \mathcal{N}(c) \cup \{c\}} R(p_c, p_j) \cdot M(f_j) \qquad (1)$$
wherein $\tilde{f}_c$ represents the aggregated feature, $R(\cdot,\cdot)$ represents the modeling function of the relative geometric position, $M(\cdot)$ represents the point feature processing function, $f_c$ represents the point feature of the cluster center, $f_j$ represents the feature of a point in the local region corresponding to the cluster center, and $p_c$ and $p_j$ respectively represent the position information of $f_c$ and $f_j$.
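A minimal PyTorch sketch of the aggregation in formula (1) follows. The small MLPs standing in for the position-modeling function R and the feature-processing function M, and the softmax attention weighting, are assumptions made for illustration, since the patent names these functions only abstractly.

```python
import torch
import torch.nn as nn

class AttentionAggregation(nn.Module):
    """Aggregates R(p_c, p_j) * M(f_j) over a local region with attention weights."""
    def __init__(self, feat_dim, pos_dim=3):
        super().__init__()
        self.R = nn.Sequential(nn.Linear(2 * pos_dim, feat_dim), nn.ReLU(),
                               nn.Linear(feat_dim, feat_dim))  # relative-position model
        self.M = nn.Linear(feat_dim, feat_dim)                 # point-feature processing
        self.score = nn.Linear(feat_dim, 1)                    # learnable attention score

    def forward(self, p_c, f_j, p_j):
        # p_c: (B, 3) center positions; f_j: (B, k, D) region features; p_j: (B, k, 3)
        center = p_c.unsqueeze(1).expand_as(p_j)
        r = self.R(torch.cat([center, p_j - center], dim=-1))  # geometric term R(p_c, p_j)
        m = self.M(f_j)                                        # feature term M(f_j)
        w = torch.softmax(self.score(r * m), dim=1)            # attention over the region
        return (w * r * m).sum(dim=1)                          # aggregated feature per center
```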
(II) The neural-network-based multilayer perceptron utilizes the shape semantic extraction and generates multi-level semantics hierarchically, using an adaptive aggregation module after each graph convolution operation. The specific process is as follows:
and aggregating the characteristics of each sampling point and updating by using a Graph Convolutional neural Network (GCN) and a designed flow (I) method. By applying GCN to each point and local neighborhoods thereof to obtain a local geometric structure, stacking multilayer images with gradually enlarged neighborhoods to obtain features, the method can gradually enlarge the reception field of convolution and abstract the gradually enlarged local regions, thereby extracting the features in a layered manner and retaining the geometric structure of the points along the hierarchy.
Fig. 2 is a schematic diagram of an operation of an adaptive aggregation module according to an embodiment of the present invention. As shown in fig. 2, the work flow of the adaptive aggregation module is as follows.
For an aggregation center point $f_c$, the features of the other points in its corresponding local region are aggregated to update the feature of that point. In Fig. 2, p and q are used to represent the positions of $f_c$ and $f_j$, and $\vec{pq}$ (the vector from point p to point q) represents the geometry between these two points. $\vec{pq}$ can be decomposed into three orthogonal basis vectors. Based on this vector decomposition, the edge features between the two points can be projected onto three fixed orthogonal basis vectors, and direction-dependent weight matrices are applied to extract features along each direction, which are then weighted in proportion to the angles between the edge vector and the basis vectors. This vector decomposition method reduces the variance of the absolute coordinates of the point cloud, enables the model to learn edge features independently along each basic direction, and aggregates according to the geometric relationship between the edge vectors and the basis vectors, so that the model can model the geometric structure between points. Finally, the aggregated feature for the point is generated with weights produced by a learnable self-attention module.
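The vector-decomposition aggregation might be sketched as below. Projecting the edge vector onto three fixed orthogonal basis vectors and mixing per-direction weight matrices in proportion to the projections follows the idea described above; the concrete layers are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VectorDecompositionEdge(nn.Module):
    """Projects edge vectors pq onto three orthogonal basis vectors and weights a
    per-direction transform of the neighbor features by the projection magnitudes."""
    def __init__(self, feat_dim):
        super().__init__()
        # One learnable weight matrix per basis direction (x, y, z).
        self.dir_weights = nn.ModuleList(nn.Linear(feat_dim, feat_dim) for _ in range(3))

    def forward(self, p, q, f_q):
        # p: (B, 3) center positions; q: (B, k, 3) neighbor positions; f_q: (B, k, D)
        pq = q - p.unsqueeze(1)                                         # edge vectors p -> q
        alpha = torch.abs(pq) / (pq.norm(dim=-1, keepdim=True) + 1e-8)  # |cos| to each axis
        edge = torch.zeros_like(f_q)
        for d, W in enumerate(self.dir_weights):
            # Features extracted along each basic direction, weighted in proportion
            # to the geometric relation between the edge vector and that basis vector.
            edge = edge + alpha[..., d:d + 1] * W(f_q)
        return edge                                                     # (B, k, D) edge features
```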
Through the adaptive aggregation module, the multi-level semantic features extracted by the multilayer graph convolution neural network are filtered using an attention mechanism. It should be noted that this "filtering" is a weighted summation process, and may include the following steps: for the other points in the local region of the aggregation center, different weights are assigned to the point features of each point according to the correlation between each point and the aggregation center, with the assigned weight increasing as the correlation increases; according to the assigned weights, the point features of the other points in the local region of the aggregation center are weighted and summed; and the point feature of the aggregation center is updated with the weighted sum, i.e., the weighted point feature is used as the point feature of the aggregation center. For example, a point with a greater correlation to the aggregation center may be assigned a greater weight, while a point with a small correlation may be assigned a small weight, or even a weight of 0; the weighted summation of point features then achieves the effect of "filtering out" point features with little or no relevance.
(III) The proposal generator summarizes the multi-level semantics to generate a primary proposal. The specific process is as follows:
1. and (5) converting the filtered multilevel semantics obtained by the flow (II) into the same feature space by using a voting module. The function used by the voting module is:
$$[f'_i;\; p'_i] = [f_i;\; p_i] + \Phi([f_i;\; p_i]) \qquad (2)$$
wherein $\Phi$ represents the adaptive aggregation method designed and used in flow (II), $[f_i;\; p_i]$ represents the semantic features and relative positions before adaptive aggregation, $[\Delta f_i;\; \Delta p_i] = \Phi([f_i;\; p_i])$ represents the resulting offsets of the semantic features and of the relative positions, and $[f'_i;\; p'_i]$ represents the semantic features and relative positions after adaptive aggregation. By executing this improved voting module, multi-layer semantic information of different sizes is converted into semantic information of the same size.
2. The fused multi-level semantics $[f'_i;\; p'_i]$ are used to generate a proposal by the VoteNet method. The voting results are retained through Farthest Point Sampling (FPS), and the VoteNet method (in which the number of points is set to 256 by default) is adopted to fuse the multi-level semantic information and predict a bounding box and a category, which is called the primary proposal.
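A sketch of the voting step of formula (2): a shared MLP Φ predicts feature and position offsets that are added back to each point, so that all semantic levels land in the same feature space. The layer shapes are assumptions; only the residual-offset form comes from the formula.

```python
import torch
import torch.nn as nn

class VotingModule(nn.Module):
    """[f'; p'] = [f; p] + Phi([f; p]), where Phi predicts the offsets."""
    def __init__(self, feat_dim, pos_dim=3):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Linear(feat_dim + pos_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim + pos_dim))  # outputs [delta_f; delta_p]

    def forward(self, f, p):
        # f: (B, N, D) semantic features; p: (B, N, 3) relative positions
        offsets = self.phi(torch.cat([f, p], dim=-1))
        delta_f, delta_p = offsets.split([f.shape[-1], p.shape[-1]], dim=-1)
        return f + delta_f, p + delta_p               # votes in a shared feature space
```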
(IV) The proposal inference module predicts the 3D bounding box and semantic category using the global semantics and the primary proposal. The specific process is as follows:
and (3) combining the global semantic information generated by the flow (I) and the primary proposal generated by the flow (III) by using a VoteNet method to finally generate a 3D boundary box and a semantic category, which are called as a final proposal. The "global semantic information" herein refers to an aggregate feature having global semantics, and may also be referred to as a "global semantic feature".
First, all local information is integrated using the formula $I = \mathcal{G}(P, F)$, where $P$ represents the relative positions of all local information, $F$ represents all primary proposals, and $I$ represents the integrated information. The integration operation includes integration of feature information along the vertex direction and the channel direction, integration considering the relative positions between proposals, and a Hadamard inner product operation. Finally, the VoteNet method is used to predict the 3D bounding box and semantic category from the integrated information.
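One possible reading of the integration $I = \mathcal{G}(P, F)$, combining a Hadamard product with position embeddings and integration along the vertex and channel directions; the specific pooling and linear maps are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class ProposalIntegration(nn.Module):
    """I = G(P, F): fuse primary proposals F with their relative positions P."""
    def __init__(self, feat_dim, pos_dim=3):
        super().__init__()
        self.pos_embed = nn.Linear(pos_dim, feat_dim)    # lift relative positions
        self.channel_mix = nn.Linear(feat_dim, feat_dim)

    def forward(self, P, F):
        # P: (B, M, 3) relative positions of proposals; F: (B, M, D) proposal features
        fused = F * self.pos_embed(P)                    # Hadamard inner product
        vertex = fused.max(dim=1, keepdim=True).values   # integration along the vertex direction
        channel = self.channel_mix(fused)                # integration along the channel direction
        return channel + vertex                          # integrated information I
```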
In the embodiment of the application, each module can achieve the following beneficial effects:
(1) shape semantic extraction module
A. The points in the point cloud are clustered using the CFDP algorithm. The classical k-means clustering algorithm cannot detect non-spherical category distributions, and the DBSCAN algorithm must set a density threshold; the CFDP algorithm improves on these two classical methods by selecting the maximum density value of each region and selecting the cluster categories according to density.
B. The center point of each local region and the other points of the region are used, and the aggregation of point features is controlled by the attention mechanism. This differs from existing 3D object detection methods, which use a max-pooling operation and thus only a single piece of information; the invention makes full use of all the information through attention aggregation, increasing the accuracy of the prediction result while preserving the amount of information.
(2) Multilayer perceptron based on neural network
A. The invention does not adopt a U-shaped network structure that first downsamples and then upsamples, but only uses a hierarchical graph convolution neural network to generate the multi-level semantics. This guarantees the computation speed and avoids the accuracy loss caused by the noise introduced during upsampling.
B. After the graph convolution neural network, self-adaptive aggregation is applied to the point cloud. The vector decomposition method reduces the variance of the absolute coordinates of the point cloud, the model learns edge features along the basic directions, and aggregation is performed according to the geometric relationship between the edges and the basis vectors, so that the model can model the geometric structure between points. Self-adaptive aggregation acquires as much geometric information of the point cloud as possible, increases the information content of the center point, and obtains geometric features with a higher degree of abstraction.
(3) A proposal generator:
different from the prior method (such as VoteNet), which only applies one feature map to predict objects, because the invention generates multi-level semantics through the multi-layer perceptron, and converts the multi-level semantics into the same feature space through the voting module, the voting module fully utilizes the characteristic of large information quantity reserved by the multi-level semantics, and the accuracy rate of the result can be obviously improved.
(4) Proposal reasoning module
Through the structures in (1), (2) and (3), the local semantics of the multi-level structure are captured and fused, but the global semantics have not yet been used in object detection. The inference module of the invention therefore merges the global semantics through a new graph convolution neural network, operates on the primary proposal, and finally generates the predicted bounding box and semantic category for output. This module combines the local semantics with the global semantics to generate a more accurate bounding box.
In summary, in the graph convolution network system provided by the embodiment of the present invention, a fast search clustering algorithm is used to obtain a better clustering effect, and attention aggregation is introduced so that the graph convolution neural network has better input features; in the graph convolution neural network with the multilayer perceptron, multi-level geometric features with a higher degree of abstraction are obtained using self-adaptive aggregation; and the multi-level semantics are fully utilized, global semantic information is introduced, and the 3D bounding box and semantic category are predicted. These operations not only produce good results in their respective modules, but also effectively improve the final performance of the whole graph convolution network system. In addition, the invention realizes end-to-end 3D object detection based on a multi-scale attention mechanism. Compared with existing 3D object detection techniques that pay no attention to shape semantics, the graph convolution network system of the invention makes full use of the geometric correspondence between shape semantics and 3D point cloud features, which not only improves the precision of 3D object detection but also strengthens the interpretability of the deep network.
It should be noted that, in the foregoing embodiment, each included unit and each included module are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
Example two
The embodiment provides a 3D object detection method based on a graph convolution network system. The method is based on the graph convolution network system described in embodiment 1. Fig. 3 is a flowchart of a 3D object detection method based on a graph convolution network system according to an embodiment of the present invention. As shown in FIG. 3, the method includes steps S10-S40.
S10: acquiring a training data set, wherein the training data set comprises a plurality of training data, and each training data is a point cloud feature of an image; and performing 3D bounding box labeling and semantic category labeling on each training data.
S20: constructing the graph convolution network system of any implementation in Embodiment One.
S30: and training the graph convolution network system by using the training data set.
S40: collecting point cloud features of an image to be predicted, and inputting the point cloud features of the image to be predicted into the trained graph convolution network system to obtain the 3D bounding boxes and semantic categories of the objects in the image to be predicted.
In one embodiment, in step S30, the objective optimization function used is:
$$L = L_{\mathrm{vote}} + \lambda_1 L_{\mathrm{obj}} + \lambda_2 L_{\mathrm{box}} + \lambda_3 L_{\mathrm{cls}}$$
wherein $L_{\mathrm{vote}}$ represents the difference between the votes obtained during the training process and the truth values, $L_{\mathrm{obj}}$ is used for calculating whether the aggregated voting results relate to an object, $L_{\mathrm{box}}$ represents the difference between the predicted 3D bounding box and the annotated 3D bounding box, $L_{\mathrm{cls}}$ represents the cross-entropy loss between the predicted and labeled classes, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are hyper-parameters.
In an embodiment, the method further comprises:
s50: and evaluating the performance of the 3D object detection method by using the average precision mean value, and evaluating the adaptability of the 3D object detection method for detecting various 3D objects by using the variation coefficient of the average precision.
In the embodiment of the present invention, steps S40 and S50 represent a specific process for performing 3D object detection by using the graph convolution network system, and may include the following steps:
(1) image point cloud feature collection
In the point cloud feature extraction stage, the point cloud features are acquired with corresponding acquisition equipment according to actual application requirements.
(2) Shape semantic extraction module
In the shape semantic extraction stage, fast search clustering, the k-nearest-neighbor method and attention aggregation are used to obtain the aggregated features. For more technical details, see flow (I) in Embodiment One.
(3) Neural network point cloud feature extraction
In the neural network point cloud feature extraction stage, according to actual application requirements, a multilayer graph convolution neural network and an adaptive aggregation module can be used to preserve the geometric structure of the points along the hierarchy. For more details, see flow (II) in Embodiment One.
(4) Proposal generator
In the proposal generation stage, the multi-level semantics are summarized to generate a primary proposal. See flow (III) in Embodiment One for more technical details.
(5) Proposal reasoning module
In the proposal inference stage, the 3D bounding box and semantic category are predicted using the global semantics and the primary proposal. See flow (IV) in Embodiment One for more technical details.
(6) Graph convolution network method based on multi-scale attention mechanism
In the bounding box generation stage (the generation stage of the whole graph convolution network system), an optimization objective function is established from the object boundary information and the object category information:
$$L = L_{\mathrm{vote}} + \lambda_1 L_{\mathrm{obj}} + \lambda_2 L_{\mathrm{box}} + \lambda_3 L_{\mathrm{cls}}$$
The loss function contains 4 terms in total, which are respectively:
$L_{\mathrm{vote}}$ represents the vote loss (Vote loss): the difference between the votes generated by flow (III) of Embodiment One and the truth values (a vote regression loss including the L1 distance);
$L_{\mathrm{obj}}$ represents the object loss (Object loss): whether the aggregated voting result relates to an object (the distance between the object center of the 3D bounding box predicted in flow (IV) of Embodiment One and the center of the real object is calculated; the target is set to 0 if the distance is less than a threshold, and to 1 otherwise);
$L_{\mathrm{box}}$ represents the bounding box loss (Box loss): the difference (a regression loss) between the predicted 3D bounding box and the actual 3D bounding box;
$L_{\mathrm{cls}}$ represents the semantic classification loss (Semantic classification loss): the cross-entropy loss of the classes (whether the semantic classification predicted in flow (IV) of Embodiment One matches the true semantic classification).
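Under the four terms above, the total objective can be assembled as in the sketch below. The stand-ins chosen for each term (L1 vote regression, binary objectness, smooth-L1 box regression, cross-entropy classification) are assumptions consistent with the description, and the default weights follow the hyper-parameter values given later in this embodiment.

```python
import torch
import torch.nn.functional as F

def detection_loss(votes, vote_gt, obj_logits, obj_gt,
                   boxes, box_gt, cls_logits, cls_gt,
                   lambda1=0.5, lambda2=1.0, lambda3=0.1):
    """L = L_vote + lambda1 * L_obj + lambda2 * L_box + lambda3 * L_cls."""
    l_vote = F.l1_loss(votes, vote_gt)                    # vote regression (L1 distance)
    l_obj = F.binary_cross_entropy_with_logits(           # objectness: 0/1 targets set by
        obj_logits, obj_gt.float())                       # the center-distance threshold
    l_box = F.smooth_l1_loss(boxes, box_gt)               # bounding-box regression
    l_cls = F.cross_entropy(cls_logits, cls_gt)           # semantic classification
    return l_vote + lambda1 * l_obj + lambda2 * l_box + lambda3 * l_cls
```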
The condition for training termination is generally set as a number of iterations; common choices are 50, 100, etc.
It should be noted that only the object 3D bounding box and the semantic category need to be labeled in the training dataset. The center distance of the real object (used to calculate $L_{\mathrm{obj}}$) can be calculated from the 3D bounding box of the object. The actual 3D bounding box of the object (used to calculate $L_{\mathrm{box}}$) and the semantic classification of the object (used to calculate $L_{\mathrm{cls}}$) are both existing labeling information in the training dataset. The votes are obtained by network learning, and the truth values are obtained from the 3D point cloud: given an input point cloud containing N points with XYZ coordinates, the points are sampled, depth features are learned, and a subset of M points is output. This subset of points is regarded as seed points, which are fixed, and each seed independently generates a vote; since these votes are also fixed, they are called truth values for ease of understanding. The votes in $L_{\mathrm{vote}}$ are the votes obtained by network learning and, like the truth values, include 3D coordinates and high-dimensional feature vectors. $L_{\mathrm{vote}}$ is the difference between the votes learned by the network and the fixed truth values, both of which contain 3D coordinates and high-dimensional feature vectors.
$\lambda_1$, $\lambda_2$ and $\lambda_3$ are hyper-parameters. In this embodiment, $\lambda_1$ is set to 0.5, $\lambda_2$ is set to 1, and $\lambda_3$ is set to 0.1. The graph convolution network system can be trained on a GeForce RTX 2080Ti GPU; stochastic gradient descent with a preset initial learning rate is adopted for optimization during training, a batch size of 8 is used, and weight decay is applied over 120 rounds of iteration. This embodiment may also be written in PyTorch.
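The stated training configuration (stochastic gradient descent, batch size 8, 120 rounds with weight decay) might be set up as follows. The learning rate, momentum and weight-decay values are placeholders, since the patent's initial learning rate survives only as an unrecoverable formula image.

```python
import torch
import torch.nn as nn

def make_optimizer(model: nn.Module) -> torch.optim.Optimizer:
    # lr, momentum and weight decay are placeholder values, not the patent's.
    return torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4)

def train(model: nn.Module, loader, epochs: int = 120):
    """120 rounds of iteration; the loader is assumed to yield batches of size 8."""
    opt = make_optimizer(model)
    for _ in range(epochs):
        for batch in loader:
            loss = model(*batch)      # assumes the model returns its training loss
            opt.zero_grad()
            loss.backward()
            opt.step()
```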
(7) For the evaluation index of the method, the mean average precision (mAP), a general index for target detection work, can be selected to evaluate the performance of the compared frameworks. The coefficient of variation of the average precision (AP), denoted cvAP, can also be used to demonstrate the adaptability of the framework to detecting various 3D objects:
$$\mathrm{cvAP} = \frac{1}{\mathrm{mAP}} \sqrt{\frac{1}{C} \sum_{c=1}^{C} \left(\mathrm{AP}_c - \mathrm{mAP}\right)^2}$$
wherein $C$ represents the number of semantic categories of 3D objects and $\mathrm{AP}_c$ is the average precision of category $c$. The lower the cvAP, the better the performance of the framework.
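Computing mAP and cvAP from per-category AP values, following the coefficient-of-variation definition above:

```python
import numpy as np

def map_and_cvap(ap_per_class):
    """mAP is the mean of per-category APs; cvAP is their coefficient of variation."""
    ap = np.asarray(ap_per_class, dtype=float)
    m_ap = ap.mean()
    cv_ap = ap.std() / m_ap           # std over the C categories divided by the mean
    return m_ap, cv_ap

# Example: three categories with APs 0.62, 0.55 and 0.70
print(map_and_cvap([0.62, 0.55, 0.70]))   # (0.6233..., 0.0983...)
```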
In summary, in the 3D object detection method based on the graph convolution network system provided by the embodiment of the present invention, a fast search clustering algorithm is used to obtain a better clustering effect, and attention aggregation is introduced so that the graph convolution neural network has better input features; in the graph convolution neural network with the multilayer perceptron, multi-level geometric features with a higher degree of abstraction are obtained using self-adaptive aggregation; and the multi-level semantics are fully utilized, global semantic information is introduced, and the 3D bounding box and semantic category are predicted. Together, these operations effectively improve the detection performance of the entire 3D object detection method. In addition, the invention realizes end-to-end 3D object detection based on a multi-scale attention mechanism. Compared with existing 3D object detection techniques that pay no attention to shape semantics, the method of the invention makes full use of the geometric correspondence between shape semantics and 3D point cloud features, which not only improves the precision of 3D object detection but also strengthens the interpretability of the deep network.
The 3D object detection method of the embodiment of the present invention has the same technical principle and beneficial effects as the graph convolution network system of Embodiment One. For technical details not described in this embodiment, please refer to the graph convolution network system in Embodiment One.
EXAMPLE III
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 4, the apparatus includes a processor 410 and a memory 420. The number of the processors 410 may be one or more, and one processor 410 is taken as an example in fig. 4.
The memory 420, as a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules of the 3D object detection method based on the graph convolution network system in the embodiment of the present invention. The processor 410 implements the above 3D object detection method based on the graph convolution network system by executing the software programs, instructions and modules stored in the memory 420.
The memory 420 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 420 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410, which may be connected to the device/terminal/server via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Example four
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the 3D object detection method based on the graph convolution network system provided in any embodiment of the present invention.
Optionally, in this embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A graph convolution network system for 3D object detection, comprising:
the shape semantic extraction module is used for receiving point cloud features of an image and modeling the geometric positions of the points in the point cloud features to obtain global semantic features;
the multilayer perceptron is connected with the shape semantic extraction module and is used for extracting multilevel semantic features by utilizing a multilayer graph convolution neural network based on the global semantic features and filtering the multilevel semantic features by using an attention mechanism;
the proposal generator is connected with the multilayer perceptron and used for summarizing the filtered multilevel semantic features and generating at least one primary proposal by weighting;
a proposal inference module, coupled to the proposal generator, for predicting a 3D bounding box and semantic categories of objects in the image using the global semantic features and the at least one primary proposal;
wherein the shape semantic extraction module comprises:
the clustering by fast search and find of density peaks (CFDP) module is used for receiving the point cloud features and clustering the feature points in the point cloud features by using the CFDP algorithm to obtain a plurality of cluster centers;
the k-nearest-neighbor module is connected with the CFDP module and used for constructing, from the plurality of cluster centers, a plurality of local areas related to the geometric positions of the points by using k-nearest-neighbor relations;
the attention aggregation module is used for adaptively aggregating the point features of each cluster center and of the other points in the local area corresponding to that cluster center to obtain the global semantic features;
the attention aggregation module is for:
adaptively aggregating the cluster center and the other points in the local area corresponding to the cluster center to generate relative position information;
and constructing an attention-based aggregation method over all points in the local area corresponding to the cluster center:
$g = \sum_{f_j \in \mathcal{N}(f_i)} M(p_i, p_j) \cdot h(f_j)$ (1)

wherein $g$ represents the global semantic features, $M(\cdot,\cdot)$ represents a modeling function of the relative geometric position, $h(\cdot)$ represents a point feature processing function, $f_i$ represents the point feature of the cluster center, $f_j$ represents the point features in the local region corresponding to the cluster center, and $p_i$ and $p_j$ respectively represent the location information of $f_i$ and $f_j$.
2. The graph convolution network system of claim 1,
the multilayer perceptron comprises a multilayer graph convolution neural network and a plurality of adaptive aggregation modules, wherein the first graph convolution layer is connected with the shape semantic extraction module, and an adaptive aggregation module is connected between every two adjacent graph convolution layers;
the multilayer graph convolution neural network is used for extracting the multilevel semantic features;
the self-adaptive aggregation module is used for filtering the semantic features output by the previous layer of the graph convolution neural network by using an attention mechanism, and inputting the filtered semantic features into the next layer of the graph convolution neural network.
3. The graph convolution network system of claim 2,
the attention mechanism is the aggregation method represented by formula (1);
the adaptive aggregation module is to: for an aggregation center point $x_i$, aggregate the other points $x_j$ in the local area corresponding to $x_i$ to update the features of $x_i$.
4. The graph convolution network system of claim 3, wherein the proposal generator is connected to each layer in the multilayer graph convolution neural network, the proposal generator being configured to:
converting the filtered multi-level semantic features into the same feature space using a voting module, wherein the voting module uses the following function:
$(f_i', p_i') = (f_i + \Delta f_i,\ p_i + \Delta p_i), \quad (\Delta f_i, \Delta p_i) = A(f_i, p_i)$ (2)

wherein $A$ represents the adaptive aggregation method designed for the multilayer perceptron, $(f_i, p_i)$ represents the semantic features and relative positions before adaptive aggregation, $(\Delta f_i, \Delta p_i)$ represents the offsets of the semantic features and of the relative positions obtained through $A$, and $(f_i', p_i')$ represents the semantic features and relative positions after adaptive aggregation;

and generating the at least one primary proposal from $(f_i', p_i')$ using the VoteNet method.
5. The graph convolution network system of claim 4, wherein the proposal inference module is to:
using the formula $K = \phi(P, F)$ to integrate all local information, wherein $P$ represents the relative positions of all local information, $F$ represents the at least one primary proposal, and $K$ represents the integrated information; the integrating operation comprises: integrating the feature information along the vertex direction and the channel direction, taking into account the relative positions among proposals and a Hadamard (element-wise) product operation;

and predicting the 3D bounding box and semantic category from $K$ using the VoteNet method.
6. A 3D object detection method based on a graph convolution network system, characterized by comprising the following steps:
s10: acquiring a training data set, wherein the training data set comprises a plurality of training data, and each training data is a point cloud feature of an image; performing 3D bounding box labeling and semantic category labeling on each training data;
s20: constructing a graph convolution network system according to any one of claims 1 to 5;
s30: training the graph convolution network system by using the training data set;
s40: the method comprises the steps of collecting point cloud characteristics of an image to be predicted, inputting the point cloud characteristics of the image to be predicted into a trained graph convolution network system, and obtaining a 3D boundary box and a semantic category of an object in the image to be predicted.
7. The 3D object detection method according to claim 6, wherein in step S30, the objective optimization function used is:
$L = L_{vote} + \lambda_1 L_{obj} + \lambda_2 L_{box} + \lambda_3 L_{cls}$

wherein $L_{vote}$ represents the difference between the votes obtained during training and the truth values, $L_{obj}$ is used to calculate whether the aggregated voting results relate to an object, $L_{box}$ represents the difference between the predicted 3D bounding box and the annotated 3D bounding box, $L_{cls}$ represents the cross-entropy loss between the predicted classes and the labeled classes, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are hyper-parameters.
8. The 3D object detection method of claim 7, further comprising:
s50: and evaluating the performance of the 3D object detection method by using the average precision mean value, and evaluating the adaptability of the 3D object detection method for detecting various 3D objects by using the variation coefficient of the average precision.
CN202110369721.XA 2021-04-07 2021-04-07 Graph convolution network system and 3D object detection method based on graph convolution network system Active CN112801059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110369721.XA CN112801059B (en) 2021-04-07 2021-04-07 Graph convolution network system and 3D object detection method based on graph convolution network system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110369721.XA CN112801059B (en) 2021-04-07 2021-04-07 Graph convolution network system and 3D object detection method based on graph convolution network system

Publications (2)

Publication Number Publication Date
CN112801059A (en) 2021-05-14
CN112801059B (en) 2021-07-20

Family

ID=75816364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110369721.XA Active CN112801059B (en) 2021-04-07 2021-04-07 Graph convolution network system and 3D object detection method based on graph convolution network system

Country Status (1)

Country Link
CN (1) CN112801059B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449612B (en) * 2021-06-15 2022-06-07 Yanshan University Three-dimensional target point cloud identification method based on sub-flow sparse convolution
CN114359893A (en) * 2022-01-04 2022-04-15 JD Kunpeng (Jiangsu) Technology Co., Ltd. Object detection method, device, electronic equipment and storage medium
CN114882494B (en) * 2022-03-09 2023-05-23 Nanjing University of Aeronautics and Astronautics Three-dimensional point cloud feature extraction method based on multi-modal attention driving
CN114743123A (en) * 2022-04-29 2022-07-12 University of Electronic Science and Technology of China Scene understanding method based on implicit function three-dimensional representation and graph neural network
CN114998890B (en) * 2022-05-27 2023-03-10 Changchun University Three-dimensional point cloud target detection algorithm based on graph neural network
CN114814776B (en) * 2022-06-24 2022-10-14 Computational Aerodynamics Institute, China Aerodynamics Research and Development Center PD radar target detection method based on graph attention network and transfer learning

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10438371B2 (en) * 2017-09-22 2019-10-08 Zoox, Inc. Three-dimensional bounding box from two-dimensional image and point cloud data
CN109902293B (en) * 2019-01-30 2020-11-24 South China University of Technology Text classification method based on local and global mutual attention mechanism
CN111316286B (en) * 2019-03-27 2024-09-10 Shenzhen Zhuoyu Technology Co., Ltd. Track prediction method and device, storage medium, driving system and vehicle
CN111832358A (en) * 2019-04-19 2020-10-27 Beijing Jingdong 360 Degree E-commerce Co., Ltd. Point cloud semantic analysis method and device
CN112101066B (en) * 2019-06-17 2024-03-08 SenseTime Group Co., Ltd. Target detection method and device, intelligent driving method and device and storage medium
CN110245709B (en) * 2019-06-18 2021-09-03 Xidian University 3D point cloud data semantic segmentation method based on deep learning and self-attention
CN110516697B (en) * 2019-07-15 2021-08-31 Tsinghua University Evidence graph aggregation and reasoning based statement verification method and system
CN110956044A (en) * 2019-12-02 2020-04-03 Beiming Software Co., Ltd. Attention mechanism-based case input recognition and classification method for judicial scenes
CN111192270A (en) * 2020-01-03 2020-05-22 Sun Yat-sen University Point cloud semantic segmentation method based on point global context reasoning
CN111539949B (en) * 2020-05-12 2022-05-13 Hebei University of Technology Point cloud data-based lithium battery pole piece surface defect detection method
CN111798475B (en) * 2020-05-29 2024-03-22 Zhejiang University of Technology Indoor environment 3D semantic map construction method based on point cloud deep learning
CN111860138B (en) * 2020-06-09 2024-03-01 South-Central Minzu University Three-dimensional point cloud semantic segmentation method and system based on full fusion network
CN112488210A (en) * 2020-12-02 2021-03-12 Beijing University of Technology Three-dimensional point cloud automatic classification method based on graph convolution neural network

Also Published As

Publication number Publication date
CN112801059A (en) 2021-05-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Jiang Zhifang
Inventor after: Yang Guangyuan
Inventor after: Huang Jin
Inventor after: Zhang Kai
Inventor after: Ding Dongrui
Inventor before: Yang Guangyuan
Inventor before: Huang Jin
Inventor before: Zhang Kai
Inventor before: Ding Dongrui
TR01 Transfer of patent right

Effective date of registration: 20240207
Address after: Room 1609, 16th Floor, Building 2, Xinsheng Building, Northwest Corner of Xinluo Street and Yingxiu Road Intersection, Shunhua Road Street, Jinan Area, China (Shandong) Pilot Free Trade Zone, Jinan City, Shandong Province, 250014
Patentee after: Lingxin Huizhi (Shandong) Intelligent Technology Co.,Ltd.
Country or region after: China
Address before: Room 156-8, No.5 Lingbin Road, Dangan Town, Xiangzhou District, Zhuhai City, Guangdong Province 510000
Patentee before: Guangdong Zhongju Artificial Intelligence Technology Co.,Ltd.
Country or region before: China