CN117274756A - Fusion method and device of two-dimensional image and point cloud based on multi-dimensional feature registration - Google Patents

Fusion method and device of two-dimensional image and point cloud based on multi-dimensional feature registration

Info

Publication number
CN117274756A
Authority
CN
China
Prior art keywords
point cloud
dimensional
dimensional image
feature
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311103838.9A
Other languages
Chinese (zh)
Inventor
郑文杰
杨祎
张峰达
李壮壮
辜超
朱文兵
林颖
刘萌
崔其会
李勇
乔木
任敬国
孙艺玮
吕俊涛
邢海文
李程启
李笋
李文博
顾朝亮
李龙龙
师伟
李�杰
朱庆东
张丕沛
伊峰
高志新
许伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Original Assignee
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd filed Critical Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Priority to CN202311103838.9A
Publication of CN117274756A
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fusion method of a two-dimensional image and a point cloud based on multi-dimensional feature registration, which comprises the following steps: preprocessing the two-dimensional image and the three-dimensional point cloud data; performing feature extraction with a convolutional neural network; fusing the features through convolutional and fully connected layers to obtain a shared feature representation; constructing a deep learning model comprising an encoder and a decoder, wherein the encoder mainly consists of convolution layers and pooling layers, the convolution layers contain convolution kernels of several different scales, a fully connected layer generates the output features, and the decoder predicts a semantic class for each pixel; training and optimizing the model to obtain a prediction model; and inputting the two-dimensional image of the object to be processed and the corresponding three-dimensional point cloud data into the prediction model to obtain fusion information of the object to be processed. The invention improves fusion precision and meets high-precision fusion requirements. The invention also provides a fusion device, which comprises a preprocessing module, a registration projection module, a feature extraction module, a feature fusion prediction module, and an output module.

Description

Fusion method and device of two-dimensional image and point cloud based on multi-dimensional feature registration
Technical Field
The invention relates to the technical field of point cloud fusion. More particularly, the invention relates to a fusion method and a fusion device for a two-dimensional image and a point cloud based on multi-dimensional feature registration.
Background
Although three-dimensional point cloud data carries real three-dimensional coordinates, it lacks real texture information, and without prior knowledge the human eye has a very limited ability to interpret a three-dimensional point cloud. A digital image has rich texture information and an imaging effect similar to what the human eye sees, matching human cognition of the real world, but its drawback is that two dimensions cannot directly represent the three-dimensional world. If the three-dimensional point cloud and the two-dimensional digital image can be fused, comprehensive information about the target surface can be obtained, including the real three-dimensional coordinates of the measured object's surface and rich texture information. The current fusion techniques are as follows:
1. the three-dimensional point cloud is aligned with the two-dimensional image: the three-dimensional point cloud and the two-dimensional image are typically represented by different coordinate systems, so they first need to be aligned into shared coordinates. The conversion of image coordinates into point cloud coordinates may be accomplished by camera calibration and perspective geometry techniques, such as using a camera projection model.
2. Three-dimensional point cloud projection: projecting the three-dimensional point cloud onto the two-dimensional image plane may be accomplished by projecting three-dimensional points in the three-dimensional point cloud onto corresponding two-dimensional image locations. The projection calculation may be performed using camera parameters and geometric relationships. After projection, attributes (such as color and texture) of the three-dimensional point cloud can be corresponding to the image pixels, so that fusion of the three-dimensional point cloud and the two-dimensional image is realized.
3. Feature extraction of a three-dimensional point cloud and a two-dimensional image: features can be extracted from the three-dimensional point cloud and the two-dimensional image respectively, and fusion can be performed by matching the features. For example, conventional feature extraction algorithms (e.g., SIFT, SURF, etc.) are used in the image to detect key points and descriptors, and corresponding projection techniques are then used to map these features onto the corresponding three-dimensional points in the point cloud, thereby establishing correspondence between the image and the point cloud.
4. Deep learning fusion of a three-dimensional point cloud and a two-dimensional image: deep learning methods have made great progress in both two-dimensional image processing and three-dimensional point cloud analysis. A deep learning model can process the two-dimensional image and the three-dimensional point cloud data simultaneously and thereby fuse them. For example, a neural network model may be designed that accepts two-dimensional images and three-dimensional point clouds as input and jointly learns the relationship between two-dimensional and three-dimensional features to solve various tasks, such as target detection and scene understanding.
5. Sensor fusion: in some cases, multiple sensors (e.g., cameras, lidars, etc.) may be used to acquire two-dimensional images and three-dimensional point cloud data simultaneously and fuse them, for example by using a sensor fusion algorithm such as an Extended Kalman Filter (EKF) or a Particle Filter (PF), to obtain more accurate and complete scene information.
However, the current two-dimensional image and three-dimensional point cloud fusion technology suffers from the following defects:
1. data inconsistency: the two-dimensional image and the three-dimensional point cloud are different in data representation modes, the two-dimensional image is represented in units of pixels, and the three-dimensional point cloud is represented in coordinates and attribute information of points. This creates data inconsistencies that require conversion and alignment of the data formats for efficient fusion;
2. data registration problem: fusing a two-dimensional image and a three-dimensional point cloud requires registration of the data, i.e., projecting the two-dimensional image into a three-dimensional space or mapping the three-dimensional point cloud back to the two-dimensional image plane. However, the registration process may introduce errors, resulting in inaccuracy in the fusion result. Solving the problem requires the adoption of an accurate sensor calibration and registration algorithm;
3. Data sparsity and noise: both two-dimensional images and three-dimensional point clouds suffer from data sparsity and noise. Two-dimensional images may have problems of occlusion, illumination variation, texture blurring, and the like, and three-dimensional point clouds may contain missing points and noise points. These problems affect the quality and accuracy of the fusion result and require data processing and noise filtering operations;
4. Computational complexity: the processing of two-dimensional images and three-dimensional point clouds has different computational complexity. Two-dimensional images are generally processed with convolutional-neural-network-based methods, whereas point clouds are typically handled with point-based or voxel-based methods whose computational characteristics differ. Therefore, effectively fusing the two-dimensional image and the three-dimensional point cloud requires solving the problem of mismatched computational complexity;
5. Cross-modal information fusion: fusing the two-dimensional image and the three-dimensional point cloud requires considering cross-modal information fusion, i.e., how to effectively fuse the color, texture, and shape information in the two-dimensional image with the geometric structure and attribute information in the three-dimensional point cloud.
In summary, each of the current fusion methods has limited precision when used alone and cannot meet the requirements of high-precision tasks.
Disclosure of Invention
It is an object of the present invention to solve at least the above problems and to provide at least the advantages to be described later.
To achieve these objects and other advantages and in accordance with the purpose of the invention, a fusion method of a two-dimensional image and a point cloud based on multi-dimensional feature registration is provided, comprising the steps of:
s1, acquiring a two-dimensional image of an object and corresponding three-dimensional point cloud data, and respectively preprocessing;
s2, registering: converting the preprocessed two-dimensional image and the three-dimensional point cloud data into the same coordinate system, and establishing spatial position correlation between the two-dimensional image and the three-dimensional point cloud;
s3, point cloud projection: projecting the registered three-dimensional point cloud data into a corresponding two-dimensional image space;
s4, feature extraction: respectively extracting features of the two-dimensional image and the three-dimensional point cloud data with the mutual spatial position association relation by adopting a convolutional neural network to obtain corresponding feature representation;
S5, fusing the feature representations obtained in step S4 through convolutional and fully connected layers to obtain a shared feature representation;
S6, constructing a deep learning model: the deep learning model comprises an encoder and a decoder. The encoder mainly consists of convolution layers and pooling layers. The convolution layers contain multiple convolution kernels of different scales, and each convolution kernel performs a convolution operation with the input data, preserving spatial structure information while extracting local features. The convolution kernels are the weights of the convolution operation, which slides a window over the input data to generate feature maps; a nonlinear activation function is then applied to the feature maps to introduce nonlinearity. The pooling layer downsamples the feature maps;
after multiple rounds of convolution and pooling, a fully connected layer performs matrix multiplication and a nonlinear transformation on the input shared feature representation and the weights to generate the output features;
the decoder mainly consists of transposed convolution (deconvolution) layers and convolution layers: the transposed convolution layers upsample the output features to increase resolution, and the convolution layers perform classification prediction, producing an output of the same size as the input image and predicting a semantic class for each pixel;
S7, model training and optimization: training the deep learning model constructed in step S6 with a labeled training data set, performing gradient descent optimization according to a loss function, and obtaining a prediction model after iterative training and verification;
s8, processing the two-dimensional image of the object to be processed and the corresponding three-dimensional point cloud data through the steps S1-S5, and inputting the processed two-dimensional image and the processed three-dimensional point cloud data into the prediction model in the step S7 to obtain fusion information of the object to be processed.
Preferably, the preprocessing in step S1 specifically comprises: preprocessing the two-dimensional image, including scaling to a target size, normalizing the pixel range of the two-dimensional image to 0-1, and cropping the two-dimensional image to obtain a region of interest;
and denoising the three-dimensional point cloud: selecting a filter type and setting filter parameters for filtering, then selecting a resampling method and setting resampling parameters to process the filtered three-dimensional point cloud data.
Preferably, an interpolation algorithm is used to handle differences between image pixels during scaling.
Preferably, the normalization method is to divide each pixel value by 255 so that the pixel range is mapped from 0-255 to between 0 and 1.
Preferably, the cropping criteria are defined in terms of the position of the image, the pixel values or the bounding box.
Preferably, the filter type includes one of an average filter, a median filter, and a gaussian filter.
Preferably, the resampling method comprises one of voxel gridding, nearest neighbor sampling, and surface-based sampling.
Preferably, the pooling layer adopts a maximum pooling method, and selects the maximum value of a certain area in the feature map as the feature after downsampling.
Preferably, the activation function is a softmax function, generating a probability distribution for each category.
The invention also provides a device for the fusion method of the two-dimensional image and the point cloud based on multi-dimensional feature registration, comprising:
the preprocessing module is used for acquiring a two-dimensional image of the object and corresponding three-dimensional point cloud data and respectively preprocessing the two-dimensional image and the corresponding three-dimensional point cloud data;
the registration projection module is used for converting the preprocessed two-dimensional image and the three-dimensional point cloud data into the same coordinate system and establishing spatial position association between the two-dimensional image and the three-dimensional point cloud; and the three-dimensional point cloud data after registration are projected into a corresponding two-dimensional image space;
the feature extraction module is used for respectively carrying out feature extraction on the two-dimensional image and the three-dimensional point cloud data with the mutual spatial position association relation by adopting a convolutional neural network to obtain corresponding feature representation;
the feature fusion prediction module is used for constructing a deep learning model; the deep learning model comprises an encoder and a decoder. The encoder mainly consists of convolution layers and pooling layers. The convolution layers contain multiple convolution kernels of different scales, and each convolution kernel performs a convolution operation with the input data, preserving spatial structure information while extracting local features. The convolution kernels are the weights of the convolution operation, which slides a window over the input data to generate feature maps; a nonlinear activation function is then applied to the feature maps to introduce nonlinearity. The pooling layer downsamples the feature maps;
after multiple rounds of convolution and pooling, a fully connected layer performs matrix multiplication and a nonlinear transformation on the input shared feature representation and the weights to generate the output features;
the decoder mainly consists of transposed convolution (deconvolution) layers and convolution layers: the transposed convolution layers upsample the output features to increase resolution, and the convolution layers perform classification prediction, producing an output of the same size as the input image and predicting a semantic class for each pixel;
the model training and optimization module is used for training the deep learning model constructed in step S6 with a labeled training data set, performing gradient descent optimization according to a loss function, and obtaining a prediction model after iterative training and verification;
the output module is used for processing the two-dimensional image of the object to be processed and the corresponding three-dimensional point cloud data through steps S1 to S5 and inputting the result into the prediction model of step S7 to obtain fusion information of the object to be processed.
The invention at least includes the following beneficial effects: based on deep learning over point clouds and images, a network fusion model is introduced and an end-to-end deep learning network is designed that takes the two-dimensional image and the three-dimensional point cloud data as input and combines them through network-layer fusion operations, so as to obtain an accurate mapping relationship between the visible-light picture and the three-dimensional point cloud and a more accurate and robust fusion result of the two-dimensional image and the three-dimensional point cloud.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
FIG. 1 is a block diagram of a fusion method of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings to enable those skilled in the art to practice the invention by referring to the description.
As shown in fig. 1, the invention provides a fusion method of a two-dimensional image and a point cloud based on multi-dimensional feature registration, which comprises the following steps:
S1, acquiring a two-dimensional image of an object and corresponding three-dimensional point cloud data, and preprocessing them respectively. The preprocessing specifically comprises: preprocessing the two-dimensional image, including scaling to a target size, normalizing the pixel range of the two-dimensional image to 0-1, and cropping the two-dimensional image to obtain a region of interest.
Scaling adjusts the image to a different size. The scaling factor may be set as desired: typically, a factor of less than 1 is used to shrink the image and a factor of greater than 1 to enlarge it, with the specific factor depending on the original size of the image and the target size. For example, if the original image is 500×500 pixels, it can be reduced to half the original size, i.e., 250×250 pixels. Similarly, to enlarge the image to twice the original size, it can be adjusted to 1000×1000 pixels. Interpolation algorithms (such as nearest-neighbor, bilinear, or bicubic interpolation) may be used to handle differences between pixels during scaling.
Normalization maps the pixel values of an image to a specific range (typically 0-1). The purpose is to eliminate differences in pixel values between images so that they have a similar scale and are easier to compare and process. One normalization method is to divide each pixel value by 255, mapping the pixel range from 0-255 to between 0 and 1.
Cropping refers to selecting a region of interest from an image according to a particular criterion, retaining it, and removing the other regions. The cropping criteria can be defined according to different requirements and applications, for example based on the location in the image, the pixel values, or a bounding box. In this way an image of a specific area can be obtained, facilitating subsequent processing or analysis.
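As an illustration of this preprocessing step, the following Python sketch (assuming the OpenCV and NumPy libraries; the target size and crop box are illustrative values, not parameters prescribed by the patent) performs scaling, normalization to 0-1, and cropping:

```python
# Minimal 2D-image preprocessing sketch: scaling, normalization, cropping.
import cv2
import numpy as np

def preprocess_image(path, target_size=(250, 250), crop_box=None):
    img = cv2.imread(path)                                    # H x W x 3, uint8 BGR
    # Scale to the target size; bilinear interpolation handles in-between pixels
    img = cv2.resize(img, target_size, interpolation=cv2.INTER_LINEAR)
    # Normalize the 0-255 pixel range to 0-1 by dividing by 255
    img = img.astype(np.float32) / 255.0
    # Crop a region of interest, here given as an (x, y, w, h) box
    if crop_box is not None:
        x, y, w, h = crop_box
        img = img[y:y + h, x:x + w]
    return img

# Example: shrink an image to 250x250 and keep a 100x100 region of interest
# roi = preprocess_image("substation.jpg", (250, 250), crop_box=(50, 50, 100, 100))
```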
The three-dimensional point cloud is denoised by selecting a filter type and setting filter parameters for filtering; a resampling method is then selected and resampling parameters are set to process the filtered three-dimensional point cloud data.
The general steps of filtering include defining a filter, i.e., selecting an appropriate filter type and parameters. Common filter types include mean filters, median filters, and Gaussian filters; the choice of filter depends on the specific application requirements and the data characteristics.
Three-dimensional point cloud resampling resamples the three-dimensional point cloud data to adjust the density or resolution of the point cloud or to remove unwanted noise points. New three-dimensional point cloud data is obtained after resampling.
Different algorithms and strategies can be adopted in the three-dimensional point cloud resampling process. Resampling methods include voxel gridding, nearest-neighbor sampling, surface-based sampling, and the like. The data output after sampling is still a three-dimensional point cloud, but its characteristics differ according to the resampling target and algorithm. For example, if resampling is used to adjust the density of the point cloud, the output cloud may contain fewer or more points, and its density may be spatially uniform or non-uniform. If resampling is used to remove noise, the output cloud may contain fewer noise points or have an adjusted shape that better represents the real object.
The resampled three-dimensional point cloud data can be used for further point cloud processing and applications such as target recognition, point cloud registration, and modeling. The output data is still a three-dimensional point cloud, but its structure and characteristics vary with the resampling method and the parameter settings used in the process.
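A minimal sketch of the filtering and resampling stage, assuming the Open3D library; statistical outlier removal stands in for the noise filter and voxel-grid downsampling for the resampling method, and all parameter values are illustrative:

```python
# Point-cloud denoising and resampling sketch (Open3D assumed).
import open3d as o3d

def preprocess_point_cloud(path, voxel_size=0.05, nb_neighbors=20, std_ratio=2.0):
    pcd = o3d.io.read_point_cloud(path)
    # Statistical outlier removal plays the role of the noise filter
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=nb_neighbors,
                                            std_ratio=std_ratio)
    # Voxel-grid resampling adjusts density and yields a spatially uniform cloud
    return pcd.voxel_down_sample(voxel_size=voxel_size)

# pcd = preprocess_point_cloud("substation.pcd")
```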
S2, registering: converting the preprocessed two-dimensional image and the three-dimensional point cloud data into the same coordinate system, and establishing spatial position correlation between the two-dimensional image and the three-dimensional point cloud;
S3, point cloud projection: projecting the registered three-dimensional point cloud data into the corresponding two-dimensional image space.
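A minimal sketch of this projection step using a pinhole camera model; the intrinsic matrix K and the extrinsic transform T are assumed to come from the calibration and registration of step S2:

```python
# Pinhole-projection sketch: map registered 3D points into image space.
import numpy as np

def project_points(points_xyz, K, T):
    """points_xyz: (N, 3) points in the point-cloud frame; K: 3x3 intrinsics;
    T: 4x4 extrinsic transform. Returns (M, 2) pixel coordinates."""
    homo = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])  # (N, 4)
    cam = (T @ homo.T).T[:, :3]            # transform into the camera frame
    cam = cam[cam[:, 2] > 0]               # keep only points in front of the camera
    uv = (K @ cam.T).T                     # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]          # perspective divide -> pixel (u, v)
```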
S4, feature extraction: extracting features separately from the two-dimensional image and the three-dimensional point cloud data that have the mutual spatial position association, using a convolutional neural network, to obtain corresponding feature representations.
the specific steps of feature extraction include:
data preparation: first, an original dataset needs to be prepared, which may be in the form of image, text, in, etc. Ensuring the quality and applicability of the data set.
Feature selection: before feature extraction, feature selection may be performed. Feature selection refers to selecting or screening out the most distinguishing and important features from the original dataset. This reduces the dimensionality of the features, improves computational efficiency, and avoids the influence of redundant or noisy features.
Feature extraction method: features are extracted separately from the two-dimensional image and the three-dimensional point cloud data that have the mutual spatial position association, using a convolutional neural network, to obtain corresponding feature representations; algorithms such as SIFT, SURF, or FAST may also be used to extract features. Feature representation: the extracted features need to be represented appropriately for subsequent processing and analysis; vectors are used to represent the features, ensuring their consistency and comparability.
Feature evaluation and selection: after the features are extracted, they are evaluated and selected to understand their quality and their contribution to the task. The importance of a feature is assessed using indicators such as relevance, information gain, and variance.
Feature preprocessing: necessary feature preprocessing operations are carried out according to the task requirements and the properties of the features. For example, normalization and dimensionality reduction can further optimize the feature representation and the processing effect.
The following types of features are extracted. Structural features: structural information for expressing and representing the data, such as geometric shape, topology and structure, and hierarchical relationships; in image processing, structural features may be edges, corner points, lines, and so on. Statistical features: features extracted from the statistical properties of the data, including mean, variance, standard deviation, maximum and minimum, histogram, and the like; statistical features are often used to describe the distribution and overall characteristics of the data. Visual features: features extracted from image and video data, including color features, texture features, shape features, SIFT (scale-invariant feature transform) descriptors, HOG (histogram of oriented gradients), and the like.
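For illustration, the following PyTorch sketch shows one possible pair of feature-extraction branches: a small CNN for the image and a PointNet-style shared MLP for the point cloud. The layer sizes are assumptions, not values fixed by the patent:

```python
# Sketch of the two feature-extraction branches in PyTorch.
import torch
import torch.nn as nn

class ImageBranch(nn.Module):
    """Small CNN producing a spatial feature map from the 2D image."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, out_dim, 3, padding=1), nn.ReLU(),
        )

    def forward(self, img):                 # img: (B, 3, H, W)
        return self.net(img)                # (B, out_dim, H/4, W/4)

class PointBranch(nn.Module):
    """PointNet-style shared MLP producing a global point-cloud feature."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, out_dim), nn.ReLU(),
        )

    def forward(self, pts):                 # pts: (B, N, 3)
        feats = self.mlp(pts)               # per-point features (B, N, out_dim)
        return feats.max(dim=1).values      # global feature via max pooling
```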
S5, fusing the feature representations from step S4 through convolutional and fully connected layers to obtain a shared feature representation. After the serial fusion connection, the following can be obtained: data/feature fusion results, enhanced feature representations, fused decision results, and fused information.
The fusion through convolutional and fully connected layers specifically comprises the following steps:
input: first, a data set of a two-dimensional image and a three-dimensional point cloud (the results of the feature extraction of the former two) is prepared. The image is represented as a matrix and the three-dimensional point cloud is represented as three-dimensional coordinate information and possibly additional attributes.
Feature extraction and preprocessing: different feature extraction and preprocessing methods are applied to the two-dimensional image and the three-dimensional point cloud. Features are extracted from the image with a convolutional neural network (CNN) and from the three-dimensional point cloud with a point-cloud-specific algorithm (e.g., PointNet++). This captures local and global information of the image and the three-dimensional point cloud and generates the corresponding feature representations.
Feature fusion: after feature extraction is completed, a feature fusion operation may be performed by concatenating or superimposing the features of the image and the three-dimensional point cloud. The specific operation is either to concatenate the image and point cloud features into a larger feature vector or to merge them into a shared feature representation through convolutional and fully connected layers; the latter is chosen here.
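A minimal PyTorch sketch of this fusion step, assuming the two branches have been reduced to global feature vectors (the image feature map is pooled here); the dimensions are illustrative:

```python
# Fusion sketch: concatenate the two feature representations and merge them
# through a fully connected layer into a shared representation.
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, img_dim=128, pt_dim=128, shared_dim=256):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(img_dim + pt_dim, shared_dim), nn.ReLU())

    def forward(self, img_feat_map, pt_feat):        # (B, C, H, W) and (B, pt_dim)
        img_feat = img_feat_map.mean(dim=(2, 3))      # global average pool -> (B, C)
        return self.fc(torch.cat([img_feat, pt_feat], dim=1))  # (B, shared_dim)
```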
S6, constructing a deep learning model: next, a deep learning model is constructed to learn the fused feature representations and their relationships. In the deep learning model, convolution kernels extract spatial and channel-level features, and fully connected layers map and combine the features. The convolution layer performs a convolution operation on the input through a sliding window to extract local features; the fully connected layer performs matrix multiplication and a nonlinear transformation on the input features and the weights to generate the output features.
The deep learning model comprises an encoder and a decoder. The encoder mainly consists of convolution layers and pooling layers. The convolution layers contain multiple convolution kernels of different scales, and each convolution kernel performs a convolution operation with the input data, preserving spatial structure information while extracting local features. The convolution kernels are the weights of the convolution operation, which slides a window over the input data to generate feature maps; a nonlinear activation function is then applied to the feature maps to introduce nonlinearity. The pooling layer downsamples the feature maps. Convolution kernels of different scales are introduced into the convolutional neural network to handle receptive fields of different sizes, for example convolution kernels of different sizes or dilated (atrous) convolutions with multiple dilation rates. This captures details and context information at multiple scales and fuses them together to improve the expressiveness of the features.
After multiple rounds of convolution and pooling, a fully connected layer performs matrix multiplication and a nonlinear transformation on the input shared feature representation and the weights to generate the output features.
In this application, the basic steps of the encoder include:
input layer: an input of a shared feature representation is accepted.
Convolution layer: the convolutional layer is the core part of the CNN, and consists of a plurality of convolutional kernels. Each convolution kernel performs convolution operation with the input data to extract local features. The convolution operation calculates on the input data through a sliding window, generating a feature map.
Activation function: after the convolutional layer, the feature map is activated using a nonlinear activation function, such as ReLU (Rectified Linear Unit), to introduce nonlinear features.
Pooling layer: the pooling layer reduces the number of parameters and the amount of computation by downsampling the feature map while retaining important features. With max pooling, the maximum value of a region in the feature map is selected as the downsampled feature.
Fully connected layer: after a series of convolution and pooling layers, the extracted features are classified or regressed using fully connected layers. The fully connected layer flattens the features into a vector and generates the final result through matrix multiplication and activation functions.
Output layer: for classification tasks, a probability distribution for each class is generated using a softmax function.
The decoder mainly consists of transposed convolution (deconvolution) layers and convolution layers: the transposed convolution layers upsample the output features to increase resolution, and the convolution layers perform classification prediction, producing an output of the same size as the input image and predicting a semantic class for each pixel.
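A minimal PyTorch sketch of such an encoder-decoder, assuming the shared representation is kept spatial (for example, point features projected into the image plane and concatenated channel-wise with image features); the fully connected stage is modelled as a 1×1 convolution so that the spatial layout needed by the per-pixel decoder is preserved, which is an implementation choice rather than something mandated by the text:

```python
# Encoder-decoder sketch: multi-scale kernels, pooling, upsampling, per-pixel classes.
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel 3x3 and 5x5 kernels, concatenated to fuse different receptive fields."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.b3 = nn.Conv2d(c_in, c_out // 2, 3, padding=1)
        self.b5 = nn.Conv2d(c_in, c_out // 2, 5, padding=2)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(torch.cat([self.b3(x), self.b5(x)], dim=1))

class FusionSegNet(nn.Module):
    def __init__(self, in_ch=256, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            MultiScaleBlock(in_ch, 64), nn.MaxPool2d(2),
            MultiScaleBlock(64, 128), nn.MaxPool2d(2),
        )
        self.fc_like = nn.Conv2d(128, 128, 1)                 # "fully connected" per location
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(),   # upsample x2
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),    # upsample x2
            nn.Conv2d(32, num_classes, 1),                         # per-pixel class scores
        )

    def forward(self, shared_feat):            # (B, in_ch, H, W) shared representation
        return self.decoder(torch.relu(self.fc_like(self.encoder(shared_feat))))
```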
S7, model training and optimization: the deep learning model constructed in step S6 is trained with a labeled training data set, gradient descent optimization is performed according to a loss function, and after iterative training and verification the parameters and structure of the model are adjusted to obtain the prediction model.
Output: after training is completed, the prediction model may be used to predict or generate new data. For a given input, the prediction model generates a corresponding output, such as a classification label, a regression value, or another desired result.
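A training-loop sketch for this step, assuming per-pixel semantic labels, a cross-entropy loss, and the Adam optimizer as one common form of gradient-based optimization; all hyperparameters are illustrative:

```python
# Training-loop sketch: labelled data, cross-entropy loss, gradient-based optimization.
import torch
import torch.nn as nn

def train(model, loader, epochs=50, lr=1e-3, device=None):
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()            # per-pixel semantic labels
    for epoch in range(epochs):
        for shared_feat, labels in loader:       # labels: (B, H, W) long class indices
            shared_feat, labels = shared_feat.to(device), labels.to(device)
            optimizer.zero_grad()
            logits = model(shared_feat)          # (B, num_classes, H, W)
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
        # validation and checkpointing would follow each epoch in practice
    return model
```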
The invention also provides a device for the fusion method of a two-dimensional image and a point cloud based on multi-dimensional feature registration, comprising:
the preprocessing module is used for acquiring a two-dimensional image of the object and corresponding three-dimensional point cloud data and respectively preprocessing the two-dimensional image and the corresponding three-dimensional point cloud data;
the registration projection module is used for converting the preprocessed two-dimensional image and the three-dimensional point cloud data into the same coordinate system and establishing spatial position association between the two-dimensional image and the three-dimensional point cloud; and the three-dimensional point cloud data after registration are projected into a corresponding two-dimensional image space;
the feature extraction module is used for respectively carrying out feature extraction on the two-dimensional image and the three-dimensional point cloud data with the mutual spatial position association relation by adopting a convolutional neural network to obtain corresponding feature representation;
the feature fusion prediction module is used for constructing a deep learning model; the deep learning model comprises an encoder and a decoder. The encoder mainly consists of convolution layers and pooling layers. The convolution layers contain multiple convolution kernels of different scales, and each convolution kernel performs a convolution operation with the input data, preserving spatial structure information while extracting local features. The convolution kernels are the weights of the convolution operation, which slides a window over the input data to generate feature maps; a nonlinear activation function is then applied to the feature maps to introduce nonlinearity. The pooling layer downsamples the feature maps;
after multiple rounds of convolution and pooling, a fully connected layer performs matrix multiplication and a nonlinear transformation on the input shared feature representation and the weights to generate the output features;
the decoder mainly consists of transposed convolution (deconvolution) layers and convolution layers: the transposed convolution layers upsample the output features to increase resolution, and the convolution layers perform classification prediction, producing an output of the same size as the input image and predicting a semantic class for each pixel;
the model training and optimization module is used for training the deep learning model constructed in step S6 with a labeled training data set, performing gradient descent optimization according to a loss function, and obtaining a prediction model after iterative training and verification;
the output module is used for processing the two-dimensional image of the object to be processed and the corresponding three-dimensional point cloud data through steps S1 to S5 and inputting the result into the prediction model of step S7 to obtain fusion information of the object to be processed.
< application example 1>
The method is applied to the actual situation of the transformer substation:
step one, data acquisition: and acquiring two-dimensional images and three-dimensional point cloud data of the transformer substation through equipment such as an unmanned aerial vehicle, a camera or a laser scanner. The two-dimensional image may provide color and texture information, while the three-dimensional point cloud may provide geometry and spatial coordinate information.
Step two, data registration: registering the two-dimensional image and the three-dimensional point cloud data ensures that the two-dimensional image and the three-dimensional point cloud data are in the same coordinate system. The registration can be realized by using methods such as feature point matching, automatic calibration or manual calibration. The aim of the registration is to correspond the three-dimensional point cloud of the two-dimensional image to the same spatial position, thus establishing an association between the two.
Step three, point cloud projection: and projecting the registered three-dimensional point cloud data into a corresponding two-dimensional image space. This can be achieved by mapping the three-dimensional coordinates of each point to a corresponding pixel location. Projection may be accomplished using geometric transformations and algorithms such as camera models.
Step four, feature extraction: feature information is extracted from the registered and projected data. For two-dimensional images, computer vision techniques such as edge detection and feature descriptor extraction may be used to obtain the shape and texture information of objects. For three-dimensional point clouds, geometric features such as surface normals and curvatures can be extracted, or feature extraction based on shape descriptors can be performed.
Step four, data fusion: fusing the feature representation by adopting a convolution kernel full-connection layer to obtain a shared feature representation;
Step six, the shared feature representation from step five is input into the prediction model to obtain fusion information of the object to be processed.
Step seven, visualization and application: the fused two-dimensional image and three-dimensional point cloud data are visualized and applied to specific tasks. Visualization may be achieved by rendering techniques such as point cloud rendering and texture mapping. The fused data can be used in fields such as substation modeling, safety analysis, and maintenance planning.
Although embodiments of the present invention have been disclosed above, it is not limited to the details and embodiments shown and described, it is well suited to various fields of use for which the invention would be readily apparent to those skilled in the art, and accordingly, the invention is not limited to the specific details and illustrations shown and described herein, without departing from the general concepts defined in the claims and their equivalents.

Claims (10)

1. The fusion method of the two-dimensional image and the point cloud based on the multi-dimensional feature registration is characterized by comprising the following steps of:
s1, acquiring a two-dimensional image of an object and corresponding three-dimensional point cloud data, and respectively preprocessing;
s2, registering: converting the preprocessed two-dimensional image and the three-dimensional point cloud data into the same coordinate system, and establishing spatial position correlation between the two-dimensional image and the three-dimensional point cloud;
s3, point cloud projection: projecting the registered three-dimensional point cloud data into a corresponding two-dimensional image space;
s4, feature extraction: respectively extracting features of the two-dimensional image and the three-dimensional point cloud data with the mutual spatial position association relation by adopting a convolutional neural network to obtain corresponding feature representation;
S5, fusing the feature representations obtained in step S4 through convolutional and fully connected layers to obtain a shared feature representation;
S6, constructing a deep learning model: the deep learning model comprises an encoder and a decoder. The encoder mainly consists of convolution layers and pooling layers. The convolution layers contain multiple convolution kernels of different scales, and each convolution kernel performs a convolution operation with the input data, preserving spatial structure information while extracting local features. The convolution kernels are the weights of the convolution operation, which slides a window over the input data to generate feature maps; a nonlinear activation function is then applied to the feature maps to introduce nonlinearity. The pooling layer downsamples the feature maps;
after multiple rounds of convolution and pooling, a fully connected layer performs matrix multiplication and a nonlinear transformation on the input shared feature representation and the weights to generate the output features;
the decoder mainly consists of transposed convolution (deconvolution) layers and convolution layers: the transposed convolution layers upsample the output features to increase resolution, and the convolution layers perform classification prediction, producing an output of the same size as the input image and predicting a semantic class for each pixel;
S7, model training and optimization: training the deep learning model constructed in step S6 with a labeled training data set, performing gradient descent optimization according to a loss function, and obtaining a prediction model after iterative training and verification;
s8, processing the two-dimensional image of the object to be processed and the corresponding three-dimensional point cloud data through the steps S1-S5, and inputting the processed two-dimensional image and the processed three-dimensional point cloud data into the prediction model in the step S7 to obtain fusion information of the object to be processed.
2. The method for fusing a two-dimensional image and a point cloud based on multi-dimensional feature registration as claimed in claim 1, wherein the preprocessing in step S1 specifically comprises: preprocessing the two-dimensional image, including scaling to a target size, normalizing the pixel range of the two-dimensional image to 0-1, and cropping the two-dimensional image to obtain a region of interest;
and denoising the three-dimensional point cloud: selecting a filter type and setting filter parameters for filtering, then selecting a resampling method and setting resampling parameters to process the filtered three-dimensional point cloud data.
3. The method of merging a two-dimensional image and a point cloud based on multi-dimensional feature registration as recited in claim 2, wherein differences between pixels of the image are processed using an interpolation algorithm during scaling.
4. A method of merging a two-dimensional image with a point cloud based on multi-dimensional feature registration as claimed in claim 2, wherein the normalization method is to divide each pixel value by 255 so that the pixel range is mapped from 0-255 to between 0 and 1.
5. A method of fusion of a two-dimensional image and a point cloud based on multi-dimensional feature registration as claimed in claim 2, characterized in that the cropping criteria are defined in terms of the position in the image, the pixel values, or a bounding box.
6. The method of merging a two-dimensional image with a point cloud based on multi-dimensional feature registration of claim 2, wherein the filter type comprises one of a mean filter, a median filter, and a gaussian filter.
7. The method of merging a two-dimensional image with a point cloud based on multi-dimensional feature registration of claim 2, wherein the resampling method comprises one of voxel gridding, nearest neighbor sampling, and surface-based sampling.
8. The method for merging the two-dimensional image and the point cloud based on multi-dimensional feature registration according to claim 1, wherein a pooling layer adopts a maximum pooling method, and a maximum value of a certain area in the feature map is selected as a feature after downsampling.
9. The method of merging a two-dimensional image with a point cloud based on multi-dimensional feature registration as recited in claim 1, wherein the activation function is a softmax function, generating a probability distribution for each category.
10. The device based on the fusion method of the two-dimensional image and the point cloud based on the multi-dimensional feature registration according to any one of claims 1 to 9, characterized by comprising:
the preprocessing module is used for acquiring a two-dimensional image of the object and corresponding three-dimensional point cloud data and respectively preprocessing the two-dimensional image and the corresponding three-dimensional point cloud data;
the registration projection module is used for converting the preprocessed two-dimensional image and the three-dimensional point cloud data into the same coordinate system and establishing spatial position association between the two-dimensional image and the three-dimensional point cloud; and the three-dimensional point cloud data after registration are projected into a corresponding two-dimensional image space;
the feature extraction module is used for respectively carrying out feature extraction on the two-dimensional image and the three-dimensional point cloud data with the mutual spatial position association relation by adopting a convolutional neural network to obtain corresponding feature representation;
the feature fusion prediction module is used for constructing a deep learning model; the deep learning model comprises an encoder and a decoder. The encoder mainly consists of convolution layers and pooling layers. The convolution layers contain multiple convolution kernels of different scales, and each convolution kernel performs a convolution operation with the input data, preserving spatial structure information while extracting local features. The convolution kernels are the weights of the convolution operation, which slides a window over the input data to generate feature maps; a nonlinear activation function is then applied to the feature maps to introduce nonlinearity. The pooling layer downsamples the feature maps;
after multiple rounds of convolution and pooling, a fully connected layer performs matrix multiplication and a nonlinear transformation on the input shared feature representation and the weights to generate the output features;
the decoder mainly consists of transposed convolution (deconvolution) layers and convolution layers: the transposed convolution layers upsample the output features to increase resolution, and the convolution layers perform classification prediction, producing an output of the same size as the input image and predicting a semantic class for each pixel;
the model training and optimization module is used for training the deep learning model constructed in step S6 with a labeled training data set, performing gradient descent optimization according to a loss function, and obtaining a prediction model after iterative training and verification;
the output module is used for processing the two-dimensional image of the object to be processed and the corresponding three-dimensional point cloud data through steps S1 to S5 and inputting the result into the prediction model of step S7 to obtain fusion information of the object to be processed.
CN202311103838.9A 2023-08-30 2023-08-30 Fusion method and device of two-dimensional image and point cloud based on multi-dimensional feature registration Pending CN117274756A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311103838.9A CN117274756A (en) 2023-08-30 2023-08-30 Fusion method and device of two-dimensional image and point cloud based on multi-dimensional feature registration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311103838.9A CN117274756A (en) 2023-08-30 2023-08-30 Fusion method and device of two-dimensional image and point cloud based on multi-dimensional feature registration

Publications (1)

Publication Number Publication Date
CN117274756A true CN117274756A (en) 2023-12-22

Family

ID=89203457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311103838.9A Pending CN117274756A (en) 2023-08-30 2023-08-30 Fusion method and device of two-dimensional image and point cloud based on multi-dimensional feature registration

Country Status (1)

Country Link
CN (1) CN117274756A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117806336A (en) * 2023-12-26 2024-04-02 珠海翔翼航空技术有限公司 Automatic berthing method, system and equipment for airplane based on two-dimensional and three-dimensional identification
CN117475357A (en) * 2023-12-27 2024-01-30 北京智汇云舟科技有限公司 Monitoring video image shielding detection method and system based on deep learning
CN117475357B (en) * 2023-12-27 2024-03-26 北京智汇云舟科技有限公司 Monitoring video image shielding detection method and system based on deep learning
CN117523548A (en) * 2024-01-04 2024-02-06 青岛臻图信息技术有限公司 Three-dimensional model object extraction and recognition method based on neural network
CN117523548B (en) * 2024-01-04 2024-03-26 青岛臻图信息技术有限公司 Three-dimensional model object extraction and recognition method based on neural network
CN117649494A (en) * 2024-01-29 2024-03-05 南京信息工程大学 Reconstruction method and system of three-dimensional tongue body based on point cloud pixel matching
CN117649494B (en) * 2024-01-29 2024-04-19 南京信息工程大学 Reconstruction method and system of three-dimensional tongue body based on point cloud pixel matching

Similar Documents

Publication Publication Date Title
CN109655019B (en) Cargo volume measurement method based on deep learning and three-dimensional reconstruction
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
Kim et al. Fully automated registration of 3D data to a 3D CAD model for project progress monitoring
CN111563415B (en) Binocular vision-based three-dimensional target detection system and method
CN109615611B (en) Inspection image-based insulator self-explosion defect detection method
CN117274756A (en) Fusion method and device of two-dimensional image and point cloud based on multi-dimensional feature registration
Brilakis et al. Toward automated generation of parametric BIMs based on hybrid video and laser scanning data
CN111027547A (en) Automatic detection method for multi-scale polymorphic target in two-dimensional image
Xu et al. Reconstruction of scaffolds from a photogrammetric point cloud of construction sites using a novel 3D local feature descriptor
EP0526881B1 (en) Three-dimensional model processing method, and apparatus therefor
CN109598794B (en) Construction method of three-dimensional GIS dynamic model
CN112529015A (en) Three-dimensional point cloud processing method, device and equipment based on geometric unwrapping
Biasutti et al. Lu-net: An efficient network for 3d lidar point cloud semantic segmentation based on end-to-end-learned 3d features and u-net
CN111462120A (en) Defect detection method, device, medium and equipment based on semantic segmentation model
CN111985466A (en) Container dangerous goods mark identification method
CN114219855A (en) Point cloud normal vector estimation method and device, computer equipment and storage medium
CN116778288A (en) Multi-mode fusion target detection system and method
CN113538373A (en) Construction progress automatic detection method based on three-dimensional point cloud
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN114332796A (en) Multi-sensor fusion voxel characteristic map generation method and system
Yin et al. [Retracted] Virtual Reconstruction Method of Regional 3D Image Based on Visual Transmission Effect
CN114332211B (en) Part pose calculation method based on edge reconstruction and dense fusion network
CN116246119A (en) 3D target detection method, electronic device and storage medium
CN115564915A (en) Map construction method and device for environment digital area of transformer substation
Amirkolaee et al. Convolutional neural network architecture for digital surface model estimation from single remote sensing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination