CN111462324B - Online spatiotemporal semantic fusion method and system - Google Patents

Online spatiotemporal semantic fusion method and system

Info

Publication number
CN111462324B
CN111462324B (application CN202010418823.1A)
Authority
CN
China
Prior art keywords
voxel
semantic
data
fusion
network
Prior art date
Legal status
Active
Application number
CN202010418823.1A
Other languages
Chinese (zh)
Other versions
CN111462324A (en)
Inventor
于耀
骆润豪
周余
都思丹
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202010418823.1A priority Critical patent/CN111462324B/en
Publication of CN111462324A publication Critical patent/CN111462324A/en
Application granted granted Critical
Publication of CN111462324B publication Critical patent/CN111462324B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 Geographic models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/08 Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Abstract

The invention relates to an online spatiotemporal semantic fusion method and system. The method comprises the following steps: acquiring initial data of an object to be semantically fused and a 2D semantic segmentation network; with the initial data as input, determining the information of each data point in single-frame point cloud data using the 2D semantic segmentation network; transforming the single-frame point cloud data into a three-dimensional world coordinate system with the voxel as the basic unit, and building a three-dimensional voxel grid map with a dictionary data structure; generating a voxel data set from the voxels in the three-dimensional voxel grid map; acquiring an online spatiotemporal semantic fusion network; and, with the image feature vectors of the voxels in the voxel data set as input, determining a three-dimensional semantic fusion map of the object to be semantically fused using the online spatiotemporal semantic fusion network. By innovatively building the network around an attention mechanism, the method and system overcome the prior-art limitation of looking back over only a limited number of frames and failing to fully exploit historical information, and achieve efficient and complete semantic fusion.

Description

Online spatiotemporal semantic fusion method and system
Technical Field
The invention relates to the fields of computer vision and deep learning, and in particular to an online spatiotemporal semantic fusion method and system.
Background
As artificial intelligence has developed, intelligent robotics has drawn increasing attention, and obtaining accurate 3D structure and semantic information has always been of central importance in robotics.
Currently, in the field of semantic segmentation, 2D semantic segmentation processes single pictures directly, which is computationally cheap but exploits neither temporal nor spatial information. Other methods use 3D spatial structure information directly for semantic segmentation, but they often run into problems in practice. PointNet-related methods usually require well-registered, stitched point clouds as input, which entails a large amount of computation and precludes real-time operation. Still other methods convolve directly in 3D space, but because 3D space is sparse, 3D convolution generally cannot match the effectiveness of 2D convolution.
The field of robotics often works with time-sequenced 3D scans and image data. To obtain 3D structure and semantic information from such data while avoiding some of the drawbacks above, some methods project the per-pixel semantic probabilities produced by 2D methods into three-dimensional space and perform Bayesian probability fusion. Although these hidden Markov model (HMM) based methods achieve good real-time performance, they can usually only look back over a limited number of frames and cannot fully exploit historical information. Moreover, fusing probabilities directly discards a large amount of information.
Disclosure of Invention
The invention aims to provide an online spatiotemporal semantic fusion method and system that overcome the prior-art problems of tracing only a limited number of frames and failing to fully exploit historical information, while achieving efficient and complete semantic fusion.
In order to achieve the purpose, the invention provides the following scheme:
an online spatiotemporal semantic fusion method, comprising:
acquiring initial data of an object to be semantically fused and a 2D semantic segmentation network; the initial data comprises point cloud data and an RGB picture;
determining the information of each data point in single-frame point cloud data by using the 2D semantic segmentation network by taking the initial data as input; the information for each of the data points includes: coordinates of the data points in a world coordinate system, image semantic feature vectors corresponding to the data points and semantic label values of the data points;
transforming the single-frame point cloud data into a three-dimensional world coordinate system by taking the voxel as a basic unit, and establishing a three-dimensional voxel grid map by using a dictionary data structure; each voxel in the three-dimensional voxel grid map comprises an image characteristic vector and a label corresponding to the data point;
generating a voxel data set from the voxels in the three-dimensional voxel grid map;
acquiring an online spatiotemporal semantic fusion network;
determining a three-dimensional semantic fusion map of the object to be semantically fused by using the online space-time semantic fusion network by taking the image feature vector of the voxel in the voxel data set as input; the three-dimensional semantic fusion map comprises voxel type probability.
Preferably, the transforming the single-frame point cloud data into a three-dimensional world coordinate system by using the voxel as a basic unit and establishing a three-dimensional voxel grid map by using a dictionary data structure specifically include:
taking a key of each element in the dictionary data structure as a center coordinate of a voxel, taking a value of each element as a list, and storing data of data points of historical frames falling in the voxel;
after new single-frame point cloud data are obtained, indexing the dictionary data structure using as key the center coordinate of the voxel in which each data point falls, and obtaining an index result;
if the index result is that the center coordinate is indexed, adding data of the data point to the tail part of the list of the corresponding element;
if the index result is that the center coordinate cannot be indexed, creating a new element taking the voxel center coordinate as a key, and creating an empty list to add the data point into the dictionary data structure;
and returning to the step of indexing the dictionary data structure by taking the central coordinate of the voxel of each data point as a key after new single-frame point cloud data is obtained, and obtaining an index result until all frames of point cloud data are accumulated in the dictionary data structure, thereby obtaining the three-dimensional voxel grid map.
Preferably, the construction process of the online spatiotemporal semantic fusion network includes:
acquiring an observation self-adaptive semantic state updating network and a self-attention information fusion network; the input of the observation self-adaptive semantic state updating network is an image feature vector in the voxel data set, and the output is an observation self-adaptive semantic fusion state of each voxel in time; the input of the self-attention information fusion network is the observation self-adaptive semantic fusion state, and the output is a voxel type prediction vector;
and constructing the online spatiotemporal semantic fusion network according to the observation self-adaptive semantic state updating network and the self-attention information fusion network.
Preferably, the process for constructing the observation adaptive semantic state updating network includes:
acquiring the normal vector of each voxel in the voxel data set and the image semantic features corresponding to the historical frame point cloud data in the voxel data set, and taking the sensor pose expressed with the voxel center as the coordinate origin;
adopting a gated recurrent unit, with the normal vector and the sensor pose as input, to obtain an observation validity state:

a_i^t = GRU(concatenate(V_i^t, n_i^t), a_i^{t-1})

where a_i^t is the observation validity state, V_i^t is the sensor pose of the i-th voxel at time t, n_i^t is the normal vector of the i-th voxel at time t, and GRU denotes the gated recurrent unit;
according to the image semantic features and the observation validity state, determining with the gated recurrent unit the observation-adaptive semantic fusion state of each voxel over time:

F_i^t = GRU(concatenate(s_i^t, a_i^t), F_i^{t-1})

where F_i^t is the observation-adaptive semantic fusion state, GRU is the gated recurrent unit, concatenate is the concatenation operation, s_i^t is the image semantic feature, and a_i^t is the observation validity state.
Preferably, the process of constructing the self-attention information fusion network includes:
taking a cube centered on the current voxel as the search range, searching for the neighborhood voxels of the current voxel, and adding them to a neighborhood voxel list of the current voxel; the neighborhood voxel list contains the current voxel;
acquiring the observation self-adaptive semantic fusion state and the offset vector from the neighborhood voxel to the current voxel;
determining a normalized attention weight according to the observation adaptive semantic fusion state and the offset vector;
determining the semantic hidden layer output of the voxel after space-time fusion according to the normalized attention weight; and obtaining a semantic probability prediction vector through a full connection layer and a softmax layer after the hidden layer is output.
An online spatiotemporal semantic fusion system, comprising:
a first acquisition module, used for acquiring initial data of an object to be semantically fused and a 2D semantic segmentation network; the initial data comprises point cloud data and an RGB picture;
a data point information determining module, configured to determine information of each data point in the single-frame point cloud data by using the 2D semantic segmentation network with the initial data as input; the information for each of the data points includes: coordinates of the data points in a world coordinate system, image semantic feature vectors corresponding to the data points and semantic label values of the data points;
the three-dimensional voxel grid map building module is used for transforming the single-frame point cloud data into a three-dimensional world coordinate system by taking a voxel as a basic unit and building a three-dimensional voxel grid map by using a dictionary data structure; each voxel in the three-dimensional voxel grid map comprises an image characteristic vector and a label corresponding to the data point;
a voxel data set generating module for generating a voxel data set according to the voxels in the three-dimensional voxel grid map;
the second acquisition module is used for acquiring an online spatiotemporal semantic fusion network;
the semantic fusion module is used for determining a three-dimensional semantic fusion map of the object to be subjected to semantic fusion by using the online spatiotemporal semantic fusion network by taking the image feature vector of the voxel in the voxel data set as input so as to complete the semantic fusion of the object to be subjected to semantic fusion; the three-dimensional semantic fusion map comprises voxel type probability.
Preferably, the three-dimensional voxel grid map building module specifically includes:
a data point data storage unit, configured to store data of data points of a history frame falling in a voxel, where a key of each element in the dictionary data structure is used as a center coordinate of the voxel, and a value of each element is used as a list;
the index result determining unit is used for indexing the dictionary data structure by taking the central coordinate of the voxel of each data point as a key after acquiring new single-frame point cloud data to obtain an index result;
the data point data adding unit is used for adding data of data points to the tail part of the list of the corresponding elements when the index result is that the center coordinate is indexed;
a data point adding unit, configured to create a new element using the voxel center coordinate as a key if the index result is that the index does not reach the center coordinate, and create an empty list to add the data point to the dictionary data structure;
and the three-dimensional voxel grid map determining unit is used for returning to the step of obtaining new single-frame point cloud data, indexing the dictionary data structure by taking the central coordinate of the voxel of each data point as a key, and obtaining an index result until the point cloud data of all the frames are accumulated in the dictionary data structure, so as to obtain the three-dimensional voxel grid map.
Preferably, the system comprises an online spatiotemporal semantic fusion network construction module; the online space-time semantic fusion network construction module comprises:
the acquisition network unit is used for acquiring an observation self-adaptive semantic state updating network and a self-attention information fusion network; the input of the observation self-adaptive semantic state updating network is an image feature vector in the voxel data set, and the output is an observation self-adaptive semantic fusion state of each voxel in time; the input of the self-attention information fusion network is the observation self-adaptive semantic fusion state, and the output is a voxel type prediction vector;
and the online space-time semantic fusion network construction unit is used for constructing the online space-time semantic fusion network according to the observation self-adaptive semantic state updating network and the self-attention information fusion network.
Preferably, the online spatiotemporal semantic fusion network construction module further includes: an observation self-adaptive semantic state updating network construction unit; the observation adaptive semantic state updating network construction unit specifically comprises:
the first acquisition subunit is used for acquiring the normal vector of each voxel in the voxel data set and the image semantic features corresponding to the historical frame point cloud data in the voxel data set, and for taking the sensor pose expressed with the voxel center as the coordinate origin;
the observation validity state determining subunit is used for obtaining the observation validity state through a gated recurrent unit, using the normal vector and the sensor pose as input; the observation validity state is:

a_i^t = GRU(concatenate(V_i^t, n_i^t), a_i^{t-1})

where a_i^t is the observation validity state, V_i^t is the sensor pose of the i-th voxel at time t, n_i^t is the normal vector of the i-th voxel at time t, and GRU denotes the gated recurrent unit;
the observation-adaptive semantic fusion state determining subunit is used for determining the observation-adaptive semantic fusion state of each voxel over time with the gated recurrent unit, according to the image semantic features and the observation validity state; the observation-adaptive semantic fusion state is:

F_i^t = GRU(concatenate(s_i^t, a_i^t), F_i^{t-1})

where F_i^t is the observation-adaptive semantic fusion state, GRU is the gated recurrent unit, concatenate is the concatenation operation, s_i^t is the image semantic feature, and a_i^t is the observation validity state.
Preferably, the online spatiotemporal semantic fusion network construction module further includes: a self-attention information fusion network construction unit; the self-attention information fusion network construction unit specifically comprises:
a neighborhood voxel searching subunit, configured to search a neighborhood voxel of a current voxel by using a cube with the current voxel as a center as a search range, and add the neighborhood voxel to a neighborhood voxel list of the current voxel; the neighborhood voxel list contains the current voxel;
the second acquisition subunit is used for acquiring the observation self-adaptive semantic fusion state and the offset vector from the neighborhood voxel to the current voxel;
a normalized attention weight determination subunit, configured to determine a normalized attention weight according to the observation adaptive semantic fusion state and the offset vector;
a semantic hidden layer output determining subunit, configured to determine, according to the normalized attention weight, a semantic hidden layer output after the voxel is subjected to spatio-temporal fusion; and obtaining a semantic probability prediction vector through a full connection layer and a softmax layer after the hidden layer is output.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
according to the online space-time semantic fusion method and system provided by the invention, the high-dimensional semantic features are fused online in space and time by establishing the three-dimensional voxel structure of spatial hash and adopting the online space-time semantic fusion network, so that a better semantic map result is obtained.
In addition, in the online space-time semantic fusion method and system provided by the invention, a networking method based on an attention mechanism is innovatively used, so that the problem that only limited frame data can be traced and historical information cannot be fully utilized in the prior art can be solved, and the online space-time semantic fusion method and system have the characteristics of high efficiency and complete semantic fusion.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flow chart of an online spatiotemporal semantic fusion method provided by the present invention;
FIG. 2 is a diagram of a data transmission process in semantic fusion using the online spatiotemporal semantic fusion method provided by the present invention;
FIG. 3 is a schematic diagram of a specific network structure of an online spatiotemporal semantic fusion network according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an online spatiotemporal semantic fusion system provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide an online space-time semantic fusion method and system, which can solve the problems that only limited frame data can be traced and historical information cannot be fully utilized in the prior art, and have the characteristics of high efficiency and complete semantic fusion.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The invention provides an online space-time semantic fusion method which is an attention-based online space-time semantic fusion method. Fig. 1 is a flowchart of an online spatiotemporal semantic fusion method provided by the present invention, and as shown in fig. 1, the online spatiotemporal semantic fusion method includes:
step 100: and acquiring initial data of an object to be semantically fused and a 2D semantic segmentation network. The initial data includes point cloud data and an RGB picture. Wherein the obtained 2D semantic segmentation network is a trained 2D semantic segmentation network.
The invention aims to solve the problem of spatiotemporal semantic fusion when building a real-time semantic map from 3D scans (such as lidar) and RGB video sequence data. Such datasets typically contain point cloud data, RGB pictures with their semantic labels, and optionally sensor poses. The 2D semantic segmentation network is trained to extract the semantic feature information of each frame for the downstream semantic fusion network.
Step 101: and determining the information of each data point in the single-frame point cloud data by using the initial data as input and adopting a 2D semantic segmentation network. The information for each data point includes: coordinates of the data points in the world coordinate system, image semantic feature vectors corresponding to the data points and semantic label values of the data points.
The RGB image in the initial data is processed by the trained 2D semantic segmentation network, and the feature map preceding the network's prediction layer is upsampled to the original image resolution by bilinear interpolation, yielding a feature map at the same resolution as the original image, called the image semantic feature map.
The point cloud is projected into the camera's pixel coordinate system to obtain pixel coordinates, and bilinear interpolation of the image semantic feature map at those pixel coordinates yields the image semantic features of the point cloud. Nearest-neighbor interpolation of the picture labels at the pixel coordinates yields the semantic labels of the point cloud.
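For illustration, the following minimal Python sketch shows one way such a projection and bilinear sampling could be done; the pinhole intrinsics K, the camera-frame point array, and all names are assumptions, not the patent's implementation.

    import numpy as np

    def sample_point_features(points_cam, feat_map, K):
        # Sketch under an assumed pinhole model: project camera-frame 3D
        # points with intrinsics K, then bilinearly sample the image
        # semantic feature map (H x W x C) at the projected pixel coords.
        z = points_cam[:, 2]
        u = K[0, 0] * points_cam[:, 0] / z + K[0, 2]
        v = K[1, 1] * points_cam[:, 1] / z + K[1, 2]

        h, w, _ = feat_map.shape
        u0 = np.clip(np.floor(u).astype(int), 0, w - 2)
        v0 = np.clip(np.floor(v).astype(int), 0, h - 2)
        du, dv = u - u0, v - v0

        # Bilinear blend of the four surrounding feature vectors per point
        f = (feat_map[v0, u0].T * (1 - du) * (1 - dv)
             + feat_map[v0, u0 + 1].T * du * (1 - dv)
             + feat_map[v0 + 1, u0].T * (1 - du) * dv
             + feat_map[v0 + 1, u0 + 1].T * du * dv)
        return f.T  # (N, C): one image semantic feature per point

The nearest-neighbor label lookup is correspondingly simple: round the pixel coordinates and index the label map directly.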
Step 102: the single-frame point cloud data are transformed into a three-dimensional world coordinate system with the voxel as the basic unit, and a three-dimensional voxel grid map is built with a dictionary data structure. Each voxel in the three-dimensional voxel grid map contains the image feature vectors and labels of its data points, and the class that occurs most often among those labels is taken as the label of the voxel.
Step 102 specifically includes:
and taking the key of each element in the dictionary data structure as the center coordinate of the voxel, taking the value of each element as a list, and storing the data of the data points of the historical frame falling in the voxel.
And after new single-frame point cloud data is obtained, indexing the dictionary data structure by taking the central coordinate of the voxel of each data point as a key to obtain an indexing result.
And if the index result is that the center coordinate is indexed, adding the data of the data point to the tail part of the list of the corresponding element.
And if the index result is that the center coordinate cannot be indexed, creating a new element taking the voxel center coordinate as a key, and creating an empty list to add the data point into a dictionary data structure.
And returning to the step of indexing the dictionary data structure by taking the central coordinate of the voxel of each data point as a key after acquiring new single-frame point cloud data to obtain an indexing result, and obtaining the three-dimensional voxel grid map after the point cloud data of all frames are accumulated to the dictionary data structure.
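A minimal Python sketch of this accumulation follows; the voxel size and all names are assumptions, and a defaultdict collapses the "key found" and "key missing" branches above into one lookup:

    from collections import defaultdict
    import numpy as np

    VOXEL_SIZE = 0.1  # assumed voxel edge length in metres

    # Key: voxel centre coordinate; value: list of (feature, label) data
    # for the historical-frame points that fell in that voxel.
    voxel_map = defaultdict(list)

    def insert_frame(points_world, features, labels):
        # Accumulate one frame of world-frame points into the grid map.
        for p, f, l in zip(points_world, features, labels):
            centre = tuple((np.floor(p / VOXEL_SIZE) + 0.5) * VOXEL_SIZE)
            voxel_map[centre].append((f, l))  # append at the list tail

    def voxel_label(centre):
        # The most frequent point label becomes the voxel's label.
        labels = [l for _, l in voxel_map[centre]]
        return max(set(labels), key=labels.count)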
Step 103: a voxel data set is generated from voxels in the three-dimensional voxel grid map. Namely, storing the image semantic features and semantic labels of the point clouds of the historical frames of all the voxels so as to generate a spatio-temporal semantic fused voxel data set.
Step 104: and acquiring an online space-time semantic fusion network.
Step 105: and determining a three-dimensional semantic fusion map of an object to be semantically fused by using an online space-time semantic fusion network by taking the image feature vector of the voxel in the voxel data set as input. The three-dimensional semantic fusion map comprises voxel class probability.
The data flow of semantic fusion with the online spatiotemporal semantic fusion method provided by the invention is shown in FIG. 2. As FIG. 2 shows, the resulting three-dimensional semantic fusion map uses a distinct gray level for each voxel class probability (in practical applications, distinct colors may be used instead). The voxel classes are chosen to suit the application at hand; for example, for a three-dimensional semantic fusion map of an outdoor scene, the voxel classes may be: buildings, people, greenery, vehicles, and the like.
The specific structure of the online spatiotemporal semantic fusion network is shown in FIG. 3. It mainly comprises two parts: a temporal observation-adaptive semantic state updating network and a spatial self-attention information fusion network. The construction process comprises the following steps:
and acquiring an observation self-adaptive semantic state updating network and a self-attention information fusion network. The input of the observation self-adaptive semantic state updating network is an image feature vector in a voxel data set, and the output is an observation self-adaptive semantic fusion state of each voxel on time. The input of the self-attention information fusion network is an observation self-adaptive semantic fusion state, and the output is a voxel type prediction vector.
And establishing an online space-time semantic fusion network according to the observation self-adaptive semantic state updating network and the self-attention information fusion network.
The function of the temporal observation-adaptive semantic state updating network is to fuse the historical frame semantic feature information of each voxel. Because voxel observations change non-uniformly during data collection, a large amount of redundant or invalid observations easily arises. Inspired by the idea of attention, the invention evaluates the validity of each frame's observation with a small network, so that the state update concentrates on informative observations rather than redundant data. The concrete structure of the temporal observation-adaptive semantic state updating network is shown in part (a) of FIG. 3; it consists of two sub-networks:
1) Observation validity evaluation network. The present invention assumes that two main factors bear on the validity of an observation: the sensor pose expressed with the current voxel as the coordinate center, and the normal vector of the current voxel. Together, the normal and the position represent the validity of the observation from a geometric perspective. The observation validity evaluation network takes these two variable factors as input and uses a Gated Recurrent Unit (GRU) to obtain the observation validity state. Specifically:
The normal vector of each voxel in the voxel data set and the image semantic features corresponding to the historical frame point cloud data are acquired, and the sensor pose is expressed with the voxel center as the coordinate origin.
A gated recurrent unit (GRU) is adopted, taking the normal vector and the sensor pose as input, to obtain the observation validity state:

a_i^t = GRU(concatenate(V_i^t, n_i^t), a_i^{t-1})

where a_i^t is the observation validity state, V_i^t is the sensor pose of the i-th voxel at time t, n_i^t is the normal vector of the i-th voxel at time t, and GRU denotes the gated recurrent unit.
2) Semantic state updating network. The image semantic features and the observation validity state are concatenated, and the observation-adaptive semantic fusion state of each voxel over time is obtained online through the GRU:

F_i^t = GRU(concatenate(s_i^t, a_i^t), F_i^{t-1})

where F_i^t is the observation-adaptive semantic fusion state, GRU is the gated recurrent unit, concatenate is the concatenation operation, s_i^t is the image semantic feature, and a_i^t is the observation validity state.
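As a hedged PyTorch-style sketch of this two-GRU update (the dimensions, the 3-D pose encoding, and the use of GRUCell are assumptions rather than the patent's exact configuration):

    import torch
    import torch.nn as nn

    class ObservationAdaptiveUpdate(nn.Module):
        # Sketch: one GRU scores observation validity from (pose, normal);
        # a second GRU fuses image semantic features gated by that score.
        def __init__(self, feat_dim=64, state_dim=64, valid_dim=16):
            super().__init__()
            self.validity_gru = nn.GRUCell(6, valid_dim)  # assumed 3-D pose + 3-D normal
            self.fusion_gru = nn.GRUCell(feat_dim + valid_dim, state_dim)

        def forward(self, pose, normal, feat, a_prev, F_prev):
            # a_i^t = GRU(concatenate(V_i^t, n_i^t), a_i^{t-1})
            a = self.validity_gru(torch.cat([pose, normal], dim=-1), a_prev)
            # F_i^t = GRU(concatenate(s_i^t, a_i^t), F_i^{t-1})
            F = self.fusion_gru(torch.cat([feat, a], dim=-1), F_prev)
            return a, F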
Performing 3D convolution directly on sparse 3D data is inefficient, and the 3D structure may change over time, which makes fixed-grid networks hard to learn. The present invention therefore uses a self-attention mechanism to explicitly measure the correlation between the current voxel and its neighborhood voxels. The structure of the spatial self-attention information fusion network is shown in part (b) of FIG. 3 and includes two key parts:
1) Neighborhood search:
A cube centered on the current voxel is taken as the search range; the neighborhood voxels of the current voxel are found and added to the current voxel's neighborhood voxel list. The neighborhood voxel list contains the current voxel itself.
2) Fusion network based on spatial self-attention:
the present invention assumes that there are two main factors that determine the correlation between the current voxel and its vicinity. One is the hidden state stored in each voxel and the other is the offset vector of the neighborhood voxels to the current voxel. In consideration of the inconsistency of feature space between the offset vector and the hidden state in the voxel, the invention designs a lightweight encoder to realize space transformation and embed the offset vector. And the offset vectors are subjected to dimensionality enhancement and series connection through a layer of full connection layer, and then are embedded through a layer of full connection layer. The embedded output is split into two branches: one containing only the information of the target voxel to generate the query vector. The other one contains all neighborhood voxels and is input into SENET, generating a weighted feature dictionary of neighborhood voxels. The key uses the same output as the value. In the correlation calculation, the dot product is selected to reduce the computational complexity.
Therefore, after the observation-adaptive semantic fusion state and the offset vectors from the neighborhood voxels to the current voxel are obtained, the normalized attention weight is determined from them:

w_ij^t = softmax_j( f(F_i^t, d_ii^t)^T · S(f(F_j^t, d_ij^t)) )

where F_i^t is the observation-adaptive semantic fusion state, d_ij^t is the offset vector from the j-th neighborhood voxel to the current voxel i at time t, f(·) is the lightweight encoder, f(·)^T is its transpose, S(·) denotes SENet, and w_ij^t is the normalized attention weight of the j-th neighborhood voxel of the current voxel i at time t.
After the normalized attention weights are obtained, the semantic hidden-layer output of the voxel after spatiotemporal fusion is computed as

H_i^t = Σ_j w_ij^t · S(f(F_j^t, d_ij^t))

where H_i^t is the semantic hidden-layer output. Further, in FIG. 3, the SENet output S(f(F_j^t, d_ij^t)) serves as both the key and the value of each element.
The hidden-layer output then passes through a fully connected layer and a softmax layer to obtain the semantic probability prediction.
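Putting the pieces together, a hedged PyTorch sketch of the spatial fusion head follows; the layer sizes, the SE-style gate standing in for SENet, and all names are assumptions:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SpatialSelfAttentionFusion(nn.Module):
        # Sketch: offsets are lifted by one FC layer, concatenated with the
        # voxel states and embedded by a second FC layer; the target branch
        # yields the query, the neighbourhood branch passes an SE-style
        # gate to yield keys (= values); dot-product attention plus softmax
        # gives the fused hidden output, then FC + softmax predicts classes.
        def __init__(self, state_dim=64, off_dim=16, embed_dim=64, n_classes=20):
            super().__init__()
            self.lift = nn.Linear(3, off_dim)             # raise offset dimension
            self.embed = nn.Linear(state_dim + off_dim, embed_dim)
            self.se = nn.Sequential(                      # SE-style channel gate
                nn.Linear(embed_dim, embed_dim // 4), nn.ReLU(),
                nn.Linear(embed_dim // 4, embed_dim), nn.Sigmoid())
            self.head = nn.Linear(embed_dim, n_classes)

        def forward(self, states, offsets):
            # states: (K, state_dim) fusion states F_j^t of the K neighbours,
            # index 0 being the current voxel; offsets: (K, 3) vectors d_ij^t.
            e = self.embed(torch.cat([states, self.lift(offsets)], dim=-1))
            query = e[0]                        # target-voxel branch
            keys = e * self.se(e)               # gated branch (keys = values)
            w = F.softmax(keys @ query, dim=0)  # normalised weights w_ij^t
            hidden = (w.unsqueeze(-1) * keys).sum(dim=0)   # H_i^t
            return F.softmax(self.head(hidden), dim=-1)    # class probabilities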
In order to improve the accuracy of semantic fusion, the constructed online spatiotemporal semantic fusion network is trained and tested on the constructed voxel data set, with training and test data split 3:1. The loss function is the classical cross-entropy used in the semantic segmentation field, with an L2 regularization loss added to prevent overfitting. Training proceeds until the loss converges; mIoU (mean intersection over union) is the evaluation metric, and the model that performs best on the test set is selected as the final online spatiotemporal semantic fusion network.
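A brief Python sketch of this objective and metric (the regularization weight is an assumption):

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()
    l2_lambda = 1e-4  # assumed regularization weight

    def loss_fn(model, logits, targets):
        # Cross-entropy plus an L2 penalty on the weights against overfitting
        l2 = sum(p.pow(2).sum() for p in model.parameters())
        return criterion(logits, targets) + l2_lambda * l2

    def miou(pred, target, n_classes):
        # Mean intersection-over-union over the classes present
        ious = []
        for c in range(n_classes):
            inter = ((pred == c) & (target == c)).sum().item()
            union = ((pred == c) | (target == c)).sum().item()
            if union > 0:
                ious.append(inter / union)
        return sum(ious) / len(ious)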
The online spatiotemporal semantic fusion network can then be applied: during semantic mapping, the spatial information and RGB semantic features of the historical frames' point clouds are accumulated, and the online spatiotemporal semantic fusion method provided by the invention fuses the spatiotemporal semantic features of the point cloud, obtaining a better semantic fusion result online.
Corresponding to the online spatiotemporal semantic fusion method provided above, the invention also provides an online spatiotemporal semantic fusion system, as shown in FIG. 4. The system comprises: a first acquisition module 400, a data point information determination module 401, a three-dimensional voxel grid map creation module 402, a voxel data set generation module 403, a second acquisition module 404, and a semantic fusion module 405.
The first obtaining module 400 is configured to obtain initial data of an object to be semantically fused and a 2D semantic segmentation network. The initial data includes point cloud data and an RGB picture.
The data point information determining module 401 is configured to determine information of each data point in a single frame of point cloud data by using 2D semantic segmentation network with initial data as input. The information for each data point includes: coordinates of the data points in the world coordinate system, image semantic feature vectors corresponding to the data points and semantic label values of the data points.
The three-dimensional voxel grid map building module 402 is configured to transform the single-frame point cloud data into a three-dimensional world coordinate system with voxels as basic units, and build a three-dimensional voxel grid map using a dictionary data structure. Each voxel in the three-dimensional voxel grid map comprises an image feature vector and a label corresponding to the data point.
The voxel data set generating module 403 is used for generating a voxel data set from voxels in the three-dimensional voxel grid map.
The second obtaining module 404 is configured to obtain an online spatiotemporal semantic fusion network.
The semantic fusion module 405 is configured to determine a three-dimensional semantic fusion map of an object to be semantically fused by using an online spatiotemporal semantic fusion network, with the image feature vector of a voxel in the voxel data set as input, so as to complete semantic fusion of the object to be semantically fused. The three-dimensional semantic fusion map comprises voxel class probability.
As another embodiment of the present invention, the three-dimensional voxel grid map building module 402 specifically includes: the device comprises a data point data storage unit, an index result determination unit, a data point data adding unit, a data point adding unit and a three-dimensional voxel grid map determination unit.
The data point data storage unit is used for storing data of data points of historical frames falling in a voxel by taking a key of each element in the dictionary data structure as the center coordinate of the voxel and taking the value of each element as a list.
And the index result determining unit is used for indexing the dictionary data structure by taking the central coordinate of the voxel of each data point as a key after acquiring the new single-frame point cloud data to obtain an index result.
The data point data adding unit is used for adding the data of the data point to the tail part of the list of the corresponding element when the index result is that the index is to the center coordinate.
And the data point adding unit is used for creating a new element taking the voxel center coordinate as a key if the index result is that the center coordinate is not obtained, and creating an empty list to add the data point into the dictionary data structure.
And the three-dimensional voxel grid map determining unit is used for returning to the step of obtaining new single-frame point cloud data, indexing the dictionary data structure by taking the central coordinate of the voxel of each data point as a key to obtain an indexing result until the point cloud data of all frames are accumulated to the dictionary data structure, and obtaining the three-dimensional voxel grid map.
As another embodiment of the invention, the system comprises an online spatiotemporal semantic fusion network construction module. The online space-time semantic fusion network construction module comprises: and acquiring a network unit and an online spatiotemporal semantic fusion network construction unit.
The acquisition network unit is used for acquiring an observation self-adaptive semantic state updating network and a self-attention information fusion network. The input of the observation self-adaptive semantic state updating network is an image feature vector in a voxel data set, and the output is an observation self-adaptive semantic fusion state of each voxel in time. The input of the self-attention information fusion network is an observation self-adaptive semantic fusion state, and the output is a voxel type prediction vector.
The online space-time semantic fusion network construction unit is used for constructing an online space-time semantic fusion network according to the observation self-adaptive semantic state updating network and the self-attention information fusion network.
As another embodiment of the invention, the online spatiotemporal semantic fusion network construction module further comprises: and the observation self-adaptive semantic state updating network construction unit. The observation adaptive semantic state updating network construction unit specifically comprises: the device comprises a first acquisition subunit, an observation validity state determination subunit and an observation adaptive semantic fusion state determination subunit.
The first acquisition subunit is used for acquiring the normal vector of each voxel in the voxel data set and the image semantic features corresponding to the historical frame point cloud data in the voxel data set, and for taking the sensor pose expressed with the voxel center as the coordinate origin.
The observation validity state determining subunit is used for obtaining the observation validity state through a gated recurrent unit, taking the normal vector and the sensor pose as input. The observation validity state is:

a_i^t = GRU(concatenate(V_i^t, n_i^t), a_i^{t-1})

where a_i^t is the observation validity state, V_i^t is the sensor pose of the i-th voxel at time t, n_i^t is the normal vector of the i-th voxel at time t, and GRU denotes the gated recurrent unit.
The observation-adaptive semantic fusion state determining subunit is used for determining the observation-adaptive semantic fusion state of each voxel over time with the gated recurrent unit, according to the image semantic features and the observation validity state. The observation-adaptive semantic fusion state is:

F_i^t = GRU(concatenate(s_i^t, a_i^t), F_i^{t-1})

where F_i^t is the observation-adaptive semantic fusion state, GRU is the gated recurrent unit, concatenate is the concatenation operation, s_i^t is the image semantic feature, and a_i^t is the observation validity state.
As another embodiment of the present invention, the above online spatiotemporal semantic fusion network building module may further include: and the self-attention information fusion network construction unit. The self-attention information fusion network construction unit specifically comprises: the system comprises a neighborhood voxel searching subunit, a second acquiring subunit, a normalized attention weight determining subunit and a semantic hidden layer output determining subunit.
The neighborhood voxel searching subunit is used for searching a neighborhood voxel of the current voxel by taking a cube with the current voxel as a center as a searching range and adding the neighborhood voxel into a neighborhood voxel list of the current voxel. The neighborhood voxel list contains the current voxel.
The second obtaining subunit is used for obtaining an observation self-adaptive semantic fusion state and a shift vector from a neighborhood voxel to a current voxel.
The normalized attention weight determination subunit is used for determining a normalized attention weight according to the observation adaptive semantic fusion state and the offset vector.
And the semantic hidden layer output determining subunit is used for determining the semantic hidden layer output of the voxel after space-time fusion according to the normalized attention weight. And obtaining a semantic probability prediction vector through a full connection layer and a softmax layer after the hidden layer is output.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (2)

1. An online spatiotemporal semantic fusion method, comprising:
acquiring initial data of an object to be semantically fused and a 2D semantic segmentation network; the initial data comprises point cloud data and an RGB picture;
determining the information of each data point in single-frame point cloud data by using the 2D semantic segmentation network by taking the initial data as input; the information for each of the data points includes: coordinates of the data points in a world coordinate system, image semantic feature vectors corresponding to the data points and semantic label values of the data points;
transforming the single-frame point cloud data into a three-dimensional world coordinate system by taking the voxel as a basic unit, and establishing a three-dimensional voxel grid map by using a dictionary data structure; each voxel in the three-dimensional voxel grid map comprises an image characteristic vector and a label corresponding to the data point;
generating a voxel data set from the voxels in the three-dimensional voxel grid map;
acquiring an online spatiotemporal semantic fusion network;
determining a three-dimensional semantic fusion map of the object to be semantically fused by using the online space-time semantic fusion network by taking the image feature vector of the voxel in the voxel data set as input; the three-dimensional semantic fusion map comprises voxel type probability;
the method comprises the following steps of transforming single-frame point cloud data into a three-dimensional world coordinate system by taking a voxel as a basic unit, and establishing a three-dimensional voxel grid map by using a dictionary data structure, wherein the method specifically comprises the following steps:
taking a key of each element in the dictionary data structure as a center coordinate of a voxel, taking a value of each element as a list, and storing data of data points of historical frames falling in the voxel;
after new single-frame point cloud data are obtained, the dictionary data structure is indexed using as key the center coordinate of the voxel in which each data point falls, and an index result is obtained;
if the index result is that the center coordinate is indexed, adding data of the data point to the tail part of the list of the corresponding element;
if the index result is that the center coordinate cannot be indexed, creating a new element taking the voxel center coordinate as a key, and creating an empty list to add the data point into the dictionary data structure;
returning to the step of obtaining new single-frame point cloud data, indexing the dictionary data structure by taking the central coordinate of the voxel of each data point as a key to obtain an index result, and obtaining a three-dimensional voxel grid map after the point cloud data of all frames are accumulated to the dictionary data structure;
the construction process of the online spatiotemporal semantic fusion network comprises the following steps:
acquiring an observation self-adaptive semantic state updating network and a self-attention information fusion network; the input of the observation self-adaptive semantic state updating network is an image feature vector in the voxel data set, and the output is an observation self-adaptive semantic fusion state of each voxel in time; the input of the self-attention information fusion network is the observation self-adaptive semantic fusion state, and the output is a voxel type prediction vector;
constructing the online spatiotemporal semantic fusion network according to the observation adaptive semantic state updating network and the self-attention information fusion network;
the construction process of the observation self-adaptive semantic state updating network comprises the following steps:
acquiring the normal vector of each voxel in the voxel data set and the image semantic features corresponding to the historical frame point cloud data in the voxel data set, and taking the sensor pose expressed with the voxel center as the coordinate origin;
adopting a gated recurrent unit, with the normal vector and the sensor pose as input, to obtain an observation validity state:

a_i^t = GRU(concatenate(V_i^t, n_i^t), a_i^{t-1})

where a_i^t is the observation validity state, V_i^t is the sensor pose of the i-th voxel at time t, n_i^t is the normal vector of the i-th voxel at time t, and GRU denotes the gated recurrent unit;
according to the image semantic features and the observation validity state, determining with the gated recurrent unit the observation-adaptive semantic fusion state of each voxel over time:

F_i^t = GRU(concatenate(s_i^t, a_i^t), F_i^{t-1})

where F_i^t is the observation-adaptive semantic fusion state, GRU is the gated recurrent unit, concatenate is the concatenation operation, s_i^t is the image semantic feature, and a_i^t is the observation validity state;
the construction process of the self-attention information fusion network comprises the following steps:
taking a cube centered on the current voxel as the search range, searching for the neighborhood voxels of the current voxel, and adding them to a neighborhood voxel list of the current voxel; the neighborhood voxel list contains the current voxel;
acquiring the observation self-adaptive semantic fusion state and the offset vector from the neighborhood voxel to the current voxel;
determining a normalized attention weight according to the observation adaptive semantic fusion state and the offset vector;
determining the semantic hidden layer output of the voxel after space-time fusion according to the normalized attention weight; and obtaining a semantic probability prediction vector through a full connection layer and a softmax layer after the hidden layer is output.
2. An online spatiotemporal semantic fusion system, comprising:
a first acquisition module, used for acquiring initial data of an object to be semantically fused and a 2D semantic segmentation network; the initial data comprises point cloud data and an RGB picture;
a data point information determining module, configured to determine information of each data point in the single-frame point cloud data by using the 2D semantic segmentation network with the initial data as input; the information for each of the data points includes: coordinates of the data points in a world coordinate system, image semantic feature vectors corresponding to the data points and semantic label values of the data points;
the three-dimensional voxel grid map building module is used for transforming the single-frame point cloud data into a three-dimensional world coordinate system by taking a voxel as a basic unit and building a three-dimensional voxel grid map by using a dictionary data structure; each voxel in the three-dimensional voxel grid map comprises an image characteristic vector and a label corresponding to the data point;
a voxel data set generating module for generating a voxel data set according to the voxels in the three-dimensional voxel grid map;
the second acquisition module is used for acquiring an online spatiotemporal semantic fusion network;
the semantic fusion module is used for determining a three-dimensional semantic fusion map of the object to be subjected to semantic fusion by using the online spatiotemporal semantic fusion network by taking the image feature vector of the voxel in the voxel data set as input so as to complete the semantic fusion of the object to be subjected to semantic fusion; the three-dimensional semantic fusion map comprises voxel type probability;
the three-dimensional voxel grid map building module specifically comprises:
a data point data storage unit, configured to store data of data points of a history frame falling in a voxel, where a key of each element in the dictionary data structure is used as a center coordinate of the voxel, and a value of each element is used as a list;
the index result determining unit is used for indexing the dictionary data structure by taking the central coordinate of the voxel of each data point as a key after acquiring new single-frame point cloud data to obtain an index result;
the data point data adding unit is used for adding data of data points to the tail part of the list of the corresponding elements when the index result is that the center coordinate is indexed;
a data point adding unit, configured to create a new element using the voxel center coordinate as a key if the index result is that the index does not reach the center coordinate, and create an empty list to add the data point to the dictionary data structure;
the three-dimensional voxel grid map determining unit is used for returning to the step that after new single-frame point cloud data are obtained, the center coordinates of the voxels where each data point falls are used as keys to index the dictionary data structure, and index results are obtained until all frames of point cloud data are accumulated in the dictionary data structure, and then the three-dimensional voxel grid map is obtained;
the system comprises an online time-space semantic fusion network construction module; the online space-time semantic fusion network construction module comprises:
the acquisition network unit is used for acquiring an observation self-adaptive semantic state updating network and a self-attention information fusion network; the input of the observation self-adaptive semantic state updating network is an image feature vector in the voxel data set, and the output is an observation self-adaptive semantic fusion state of each voxel in time; the input of the self-attention information fusion network is the observation self-adaptive semantic fusion state, and the output is a voxel type prediction vector;
the online space-time semantic fusion network construction unit is used for constructing the online space-time semantic fusion network according to the observation self-adaptive semantic state updating network and the self-attention information fusion network;
The online spatiotemporal semantic fusion network construction module further comprises an observation-adaptive semantic state updating network construction unit, which specifically comprises:
a first acquisition subunit, configured to acquire the normal vector of each voxel in the voxel data set and the image semantic features corresponding to the historical-frame point cloud data in the voxel data set, and to take the center coordinate of the voxel as the sensor pose quantity;
an observation validity state determining subunit, configured to obtain the observation validity state from the normal vector and the sensor pose through a gated recurrent unit; the observation validity state is

$S_i^t = \mathrm{GRU}(p_i^t, n_i^t)$

where $S_i^t$ is the observation validity state, $p_i^t$ is the sensor pose of the $i$-th voxel at time $t$, $n_i^t$ is the normal vector of the $i$-th voxel at time $t$, and GRU denotes the gated recurrent unit;
an observation-adaptive semantic fusion state determining subunit, configured to determine the temporally observation-adaptive semantic fusion state of each voxel with the gated recurrent unit, according to the image semantic features and the observation validity state; the observation-adaptive semantic fusion state is

$F_i^t = \mathrm{GRU}\left(\mathrm{concatenate}(f_i^t, S_i^t)\right)$

where $F_i^t$ is the observation-adaptive semantic fusion state, GRU denotes the gated recurrent unit, concatenate denotes the concatenation operation, $f_i^t$ is the image semantic feature, and $S_i^t$ is the observation validity state;
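Read alongside the two formulas above, a minimal PyTorch sketch of the per-voxel recurrent update might look as follows; the layer sizes, class name, and method names are assumptions for illustration, not the patent's implementation.

```python
import torch
import torch.nn as nn

class ObservationAdaptiveUpdate(nn.Module):
    """One GRU turns the sensor pose and voxel normal into the observation
    validity state S; a second GRU folds the concatenation of the image
    semantic feature and S into the fusion state F."""

    def __init__(self, pose_dim=3, normal_dim=3, feat_dim=64, state_dim=32, fusion_dim=64):
        super().__init__()
        self.validity_gru = nn.GRUCell(pose_dim + normal_dim, state_dim)
        self.fusion_gru = nn.GRUCell(feat_dim + state_dim, fusion_dim)

    def step(self, pose, normal, feature, s_prev, f_prev):
        # S_i^t = GRU(p_i^t, n_i^t): pose and normal drive the validity state
        s = self.validity_gru(torch.cat([pose, normal], dim=-1), s_prev)
        # F_i^t = GRU(concatenate(f_i^t, S_i^t)): feature + validity drive the fusion state
        f = self.fusion_gru(torch.cat([feature, s], dim=-1), f_prev)
        return s, f

# Usage with a batch of 8 voxels observed at one time step.
net = ObservationAdaptiveUpdate()
pose, normal = torch.randn(8, 3), torch.randn(8, 3)
feature = torch.randn(8, 64)
s, f = net.step(pose, normal, feature, torch.zeros(8, 32), torch.zeros(8, 64))
```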
The online spatiotemporal semantic fusion network construction module further comprises a self-attention information fusion network construction unit, which specifically comprises:
a neighborhood voxel searching subunit, configured to search for the neighborhood voxels of the current voxel within a cube centered on the current voxel, and to add the neighborhood voxels to the current voxel's neighborhood voxel list; the neighborhood voxel list also contains the current voxel itself;
a second acquisition subunit, configured to acquire the observation-adaptive semantic fusion states and the offset vectors from the neighborhood voxels to the current voxel;
a normalized attention weight determining subunit, configured to determine the normalized attention weights from the observation-adaptive semantic fusion states and the offset vectors;
a semantic hidden layer output determining subunit, configured to determine, according to the normalized attention weights, the spatiotemporally fused semantic hidden layer output of the voxel; the hidden layer output is then passed through a fully connected layer and a softmax layer to obtain the semantic probability prediction vector.
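The following sketch, again under assumed layer sizes and names, shows one way to realize the neighborhood self-attention these subunits describe: scores are computed from each neighbor's fusion state concatenated with its offset vector, softmax-normalized, and the weighted sum is mapped to class probabilities by a fully connected layer and a softmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeighborhoodSelfAttention(nn.Module):
    """Attention over a voxel's cubic neighborhood: each neighbor's fusion
    state plus its offset vector yields an unnormalized score; the normalized
    weights produce the fused hidden output, then a classifier head gives
    per-class probabilities. Layer sizes are assumptions."""

    def __init__(self, fusion_dim=64, num_classes=20):
        super().__init__()
        self.score = nn.Linear(fusion_dim + 3, 1)        # state + 3-dim offset -> raw weight
        self.classifier = nn.Linear(fusion_dim, num_classes)

    def forward(self, neighbor_states, offsets):
        # neighbor_states: (K, fusion_dim) for the K voxels in the cube around
        # the current voxel (current voxel included, with zero offset);
        # offsets: (K, 3) neighbor-to-center offset vectors.
        logits = self.score(torch.cat([neighbor_states, offsets], dim=-1))  # (K, 1)
        weights = F.softmax(logits, dim=0)                                  # normalized attention
        hidden = (weights * neighbor_states).sum(dim=0)                     # fused hidden output
        return F.softmax(self.classifier(hidden), dim=-1)                   # class probabilities

# Usage: 27 neighbors from a 3x3x3 cube, 64-dim fusion states.
attn = NeighborhoodSelfAttention()
probs = attn(torch.randn(27, 64), torch.randn(27, 3))
```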
CN202010418823.1A 2020-05-18 2020-05-18 Online spatiotemporal semantic fusion method and system Active CN111462324B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010418823.1A CN111462324B (en) 2020-05-18 2020-05-18 Online spatiotemporal semantic fusion method and system

Publications (2)

Publication Number Publication Date
CN111462324A (en) 2020-07-28
CN111462324B (en) 2022-05-17

Family

ID=71682793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010418823.1A Active CN111462324B (en) 2020-05-18 2020-05-18 Online spatiotemporal semantic fusion method and system

Country Status (1)

Country Link
CN (1) CN111462324B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112819080B (en) * 2021-02-05 2022-09-02 四川大学 High-precision universal three-dimensional point cloud identification method
CN112927234A (en) * 2021-02-25 2021-06-08 中国工商银行股份有限公司 Point cloud semantic segmentation method and device, electronic equipment and readable storage medium
CN112837372A (en) * 2021-03-02 2021-05-25 浙江商汤科技开发有限公司 Data generation method and device, electronic equipment and storage medium
CN113516750B (en) * 2021-06-30 2022-09-27 同济大学 Three-dimensional point cloud map construction method and system, electronic equipment and storage medium
CN114638954B (en) * 2022-02-22 2024-04-19 深圳元戎启行科技有限公司 Training method of point cloud segmentation model, point cloud data segmentation method and related device
CN117132727B (en) * 2023-10-23 2024-02-06 光轮智能(北京)科技有限公司 Map data acquisition method, computer readable storage medium and electronic device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107730503B (en) * 2017-09-12 2020-05-26 北京航空航天大学 Image object component level semantic segmentation method and device embedded with three-dimensional features
US11004202B2 (en) * 2017-10-09 2021-05-11 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for semantic segmentation of 3D point clouds

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110243370A (en) * 2019-05-16 2019-09-17 西安理工大学 A kind of three-dimensional semantic map constructing method of the indoor environment based on deep learning
CN110245709A (en) * 2019-06-18 2019-09-17 西安电子科技大学 Based on deep learning and from the 3D point cloud data semantic dividing method of attention
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Human Pose Estimation using Deep Structure Guided Learning; Baole Ai, Yu Zhou, Yao Yu, Sidan Du; 2017 IEEE Winter Conference on Applications of Computer Vision; 2017-12-31; full text *
Fine-grained object classification and detection based on attention mechanism and knowledge distillation; Guan Wenjie; China Master's Theses Full-text Database, Information Science and Technology; 2019-07-15; pp. I138-1106 *

Similar Documents

Publication Publication Date Title
CN111462324B (en) Online spatiotemporal semantic fusion method and system
US20220222920A1 (en) Content processing method and apparatus, computer device, and storage medium
CN110135319B (en) Abnormal behavior detection method and system
CN109086683B (en) Human hand posture regression method and system based on point cloud semantic enhancement
CN110781262B (en) Semantic map construction method based on visual SLAM
CN111242844B (en) Image processing method, device, server and storage medium
CN113870422B (en) Point cloud reconstruction method, device, equipment and medium
CN112364931A (en) Low-sample target detection method based on meta-feature and weight adjustment and network model
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
CN112884742A (en) Multi-algorithm fusion-based multi-target real-time detection, identification and tracking method
CN116524419B (en) Video prediction method and system based on space-time decoupling and self-attention difference LSTM
CN111368733B (en) Three-dimensional hand posture estimation method based on label distribution learning, storage medium and terminal
CN116612468A (en) Three-dimensional target detection method based on multi-mode fusion and depth attention mechanism
CN113554039A (en) Method and system for generating optical flow graph of dynamic image based on multi-attention machine system
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN112668662B (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN112241802A (en) Interval prediction method for wind power
CN116519106B (en) Method, device, storage medium and equipment for determining weight of live pigs
CN116863241A (en) End-to-end semantic aerial view generation method, model and equipment based on computer vision under road scene
CN117197632A (en) Transformer-based electron microscope pollen image target detection method
CN116453025A (en) Volleyball match group behavior identification method integrating space-time information in frame-missing environment
CN115861384A (en) Optical flow estimation method and system based on generation of countermeasure and attention mechanism
Li et al. An enhanced squeezenet based network for real-time road-object segmentation
Lai et al. 3D semantic map construction system based on visual SLAM and CNNs
Ruan et al. A semantic octomap mapping method based on cbam-pspnet

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant