CN117593618B - Point cloud generation method based on neural radiation field and depth map - Google Patents

Point cloud generation method based on neural radiation field and depth map

Info

Publication number
CN117593618B
CN117593618B
Authority
CN
China
Prior art keywords
point cloud
segmentation
image
map
neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410069446.3A
Other languages
Chinese (zh)
Other versions
CN117593618A (en)
Inventor
杨苏
尹帆
李骏
周方明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Lichuang Zhiheng Electronic Technology Co ltd
Original Assignee
Suzhou Lichuang Zhiheng Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Lichuang Zhiheng Electronic Technology Co ltd filed Critical Suzhou Lichuang Zhiheng Electronic Technology Co ltd
Priority to CN202410069446.3A priority Critical patent/CN117593618B/en
Publication of CN117593618A publication Critical patent/CN117593618A/en
Application granted granted Critical
Publication of CN117593618B publication Critical patent/CN117593618B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of computer vision and provides a point cloud generation method based on a neural radiation field and a depth map. The method comprises: acquiring an RGB image and a depth image of a target rigid body, wherein the RGB image and the depth image are extracted from the same image captured of the target rigid body; generating a depth map point cloud from the depth image; extracting features of the RGB image and performing segmentation on the features to obtain a segmentation map; training a neural radiation field model with the segmentation map; obtaining color information and density information of sampling points through the neural radiation field model and generating a neural point cloud; and performing point cloud fusion on the depth map point cloud and the neural point cloud to obtain a fusion point cloud. By segmenting the RGB image of the target rigid body with a target segmentation model and training the neural radiation field so that the neural point cloud is obtained from the color and density information of the sampling points, the method reduces spurious non-rigid-body points and alleviates the problem that a neural radiation field cannot accurately render a specific rigid body.

Description

Point cloud generation method based on neural radiation field and depth map
Technical Field
The application relates to the technical field of computer vision, and in particular to a point cloud generation method based on a neural radiation field and a depth map.
Background
A point cloud is a representation of a three-dimensional scene that can be extracted, reconstructed, or synthesized from a variety of data sources, including 3D scanners, cameras, lidar, and depth sensors. Point clouds can also be computed from images using visual geometry and deep learning techniques.
Point cloud generation is used not only to recover the three-dimensional structure of real scenes but also in many fields such as virtual environment creation, simulation of physical phenomena, and remote sensing data analysis. As noted above, point cloud acquisition relies on data sources such as multi-view images, laser scanning, or structured light, and requires complex processing and reconstruction. These acquisition methods place high demands on the data and on the acquisition equipment, and they also struggle with dynamic scenes and large-scale environments.
A neural radiation field is a three-dimensional scene rendering technique from computer graphics and deep learning. With a neural radiation field, images from arbitrary viewpoints can be generated without first collecting a large number of views for reconstruction, which avoids the complexity of conventional point cloud reconstruction. A neural radiation field builds a continuous radiance field by training a neural network to learn the color and density of every point in the scene. However, when a neural radiation field is used directly to generate the point cloud of a specific rigid body, points belonging to other objects in the scene are mixed in, so the neural radiation field cannot accurately render that rigid body.
Disclosure of Invention
The application provides a point cloud generation method based on a neural radiation field and a depth map, which aims to solve the problem that a neural radiation field cannot accurately render a specific rigid body.
The application provides a point cloud generation method based on a neural radiation field and a depth map, which comprises the following steps:
acquiring an RGB image and a depth image of a target rigid body, wherein the RGB image and the depth image are extracted from the same image which is generated by the target rigid body and contains color information and depth information;
generating a depth map point cloud by utilizing the depth image;
extracting features of the RGB image, and executing segmentation on the features to obtain a segmentation map;
training a neural radiation field model by using the segmentation map;
obtaining color information and density information of sampling points through the neural radiation field model, and generating a neural point cloud;
and performing point cloud fusion on the depth map point cloud and the neural point cloud to obtain a fusion point cloud.
In some possible embodiments, the extracting the features of the RGB image and performing segmentation on the features to obtain a segmentation map includes:
training a target segmentation model;
performing feature extraction on a first RGB image by using the target segmentation model, and performing feature learning by using an attention mechanism module of the segmentation model to obtain a segmentation result, wherein the RGB image at least comprises a first RGB image and a second RGB image, the first RGB image is an RGB image input for the first time, and the second RGB image is an RGB image input for the second time;
extracting the segmentation result by utilizing interactive segmentation to obtain a first feature vector;
and according to the first feature vector, performing feature extraction on the second RGB image by using a matching algorithm to obtain a second feature vector, wherein the first feature vector has similarity with the second feature vector.
In some possible embodiments, the extracting the segmentation result by using interactive segmentation to obtain a first feature vector includes:
extracting the segmentation result by utilizing interactive segmentation to obtain a mask image;
performing encoding on the first RGB image to obtain a first feature map;
adjusting the size of the mask image to be equal to the size of the first RGB image to obtain an adjusted image;
and carrying out average operation on the adjusted image and the first feature map to obtain a first feature vector.
In some possible embodiments, the extracting the features of the RGB image and performing segmentation on the features to obtain a segmentation map includes:
performing encoding on the second RGB image to obtain a second feature map;
inputting the second feature map into the ROI alignment, and outputting a third feature map, wherein the third feature map at least comprises a group of feature vectors, and a group of feature vectors map a plurality of candidate areas;
performing an average operation on the third feature map to obtain an average feature vector;
calculating cosine distances between the first feature vector and the average feature vector;
acquiring the candidate region according to the cosine distance of the minimum value;
and matching a candidate category according to the candidate region, wherein the candidate region and the candidate category have a mapping relation.
In some possible embodiments, the extracting the features of the RGB image and performing segmentation on the features to obtain a segmentation map includes:
outputting the optimized candidate region by using a target detection algorithm according to the second feature map;
generating an initial heat map according to the optimized candidate regions;
inputting the first feature vector, the third feature map and the initial heat map into a neural network model to output point class prompts;
and generating a segmentation map according to the point class prompts.
In some possible embodiments, the obtaining, by the neural radiation field model, color information and density information of the sampling points, and generating a neural point cloud includes:
acquiring sampling points, wherein the sampling points are distributed in an implicit space of the neural radiation field model;
and inputting the coordinates of the sampling points into the neural radiation field model, and outputting the color information and the density information of the sampling points according to the segmentation map.
In some possible embodiments, the obtaining, by the neural radiation field model, color information and density information of the sampling points, and generating a neural point cloud includes:
setting a density threshold;
traversing the sampling points, and removing the sampling points with the density smaller than the density threshold value to generate a neural point cloud.
In some possible embodiments, the performing point cloud fusion on the depth map point cloud and the neural point cloud to obtain a fused point cloud includes:
acquiring color information of the neural point cloud to serve as color information of the fusion point cloud;
calculating the position information of the fusion point cloud by utilizing the position information of the neural point cloud and the position information of the depth map point cloud;
and obtaining the fusion point cloud through the color information and the position information of the fusion point cloud.
In some possible embodiments, the performing point cloud fusion on the depth map point cloud and the neural point cloud to obtain a fused point cloud includes:
according to the color information of the fusion point cloud, colorless points in the fusion point cloud are obtained, wherein the colorless points are points without color information;
setting a distance range;
searching for a nearby point in the distance range, wherein the nearby point is a point with color information;
acquiring color information according to the adjacent points;
and adding color for the colorless point according to the color information.
In some possible embodiments, the generating a depth map point cloud using the depth image includes:
converting pixel values in the depth image into three-dimensional point coordinates, wherein the three-dimensional point coordinates are expressed in a camera coordinate system;
converting the camera coordinate system into a world coordinate system to obtain a three-dimensional coordinate;
generating a point cloud data set, wherein the point cloud data set comprises three-dimensional coordinates of pixel points;
and acquiring a depth map point cloud through the point cloud data set.
According to the technical scheme, the application provides a point cloud generation method based on a neural radiation field and a depth map, which comprises: acquiring an RGB image and a depth image of a target rigid body, wherein the RGB image and the depth image are extracted from the same image captured of the target rigid body and containing color information and depth information; generating a depth map point cloud from the depth image; extracting features of the RGB image and performing segmentation on the features to obtain a segmentation map; training a neural radiation field model with the segmentation map; obtaining color information and density information of sampling points through the neural radiation field model and generating a neural point cloud; and performing point cloud fusion on the depth map point cloud and the neural point cloud to obtain a fusion point cloud. By segmenting the RGB image of the target rigid body with the target segmentation model, training the neural radiation field to obtain the color and density information of the sampling points so as to obtain the neural point cloud, and fusing the neural point cloud with the depth map point cloud to obtain the fusion point cloud, the method avoids introducing excessive non-rigid-body points and solves the problem that a neural radiation field cannot accurately render a specific rigid body.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a point cloud generating method based on a neural radiation field and a depth map according to the embodiment;
FIG. 2 is a schematic diagram of a segmentation process performed on a feature according to the present embodiment;
fig. 3 is a schematic diagram of a segmentation map acquisition flow shown in the present embodiment;
FIG. 4 is a schematic diagram of a segmentation map acquisition flow diagram according to another embodiment;
fig. 5 is a schematic flow chart of generating a neural point cloud according to the present embodiment.
Detailed Description
Reference will now be made in detail to the embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The embodiments described below do not represent all embodiments consistent with the present application; they are merely examples of systems and methods consistent with some aspects of the present application as detailed in the claims.
A rigid body is an object in which the distance between any two points remains unchanged during motion. A neural radiation field is a technique for rendering three-dimensional scenes that uses a neural network to describe the light propagation and color of the scene. A neural radiation field can learn a 3D representation of a scene from multi-view 2D images and generate high-quality renderings. However, it models the scene globally, considering all objects and the background at the same time, so an attempt to extract the point cloud of a specific rigid body also picks up points of other objects or of the background. Moreover, a neural radiation field has no explicit object segmentation capability: it can produce high-quality rendered images but does not distinguish different objects in the scene, so the point clouds of different objects become mixed when a point cloud is generated. For these reasons, a neural radiation field cannot accurately render a specific rigid body.
In order to solve the problem that a neural radiation field cannot accurately render a specific rigid body, and referring to fig. 1, an embodiment of the present application provides a point cloud generation method based on a neural radiation field and a depth map, including:
s100: an RGB (Red Green Blue three color light) image and a depth image of the target rigid body are acquired.
The RGB image and the depth image are extracted from the same image, which contains both color information and depth information and is captured of the target rigid body by an RGBD (RGB-Depth) camera or a depth camera. When the RGB image and the depth image are extracted, the RGB channels and the depth channel are separated: the RGB channels contain the color information, and the depth channel contains the distance information of the target rigid body. The RGB image is extracted from the separated RGB channels and the depth image is extracted from the separated depth channel, where each pixel value in the depth image represents the distance from the target rigid body to the camera.
In this embodiment, depth information may be extracted using OpenCV (Open Computer Vision, a cross-platform computer vision and machine learning software library), which provides interfaces for reading RGBD images and thus makes it convenient to read the RGB and depth channel data. Two or more cameras may also be used to obtain multiple views of a scene and compute the depth information. When extracting depth information, post-processing such as filtering and denoising may be applied to improve the quality of the depth image.
The depth image may be obtained by a depth sensor, binocular camera, structured light scanner or other three-dimensional sensor, by which a depth map dataset is acquired in this embodiment.
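As an illustration of this step, the following sketch separates the color and depth data with OpenCV; the file names, the 16-bit millimeter depth encoding, and the median filter are assumptions for illustration, not requirements of the method.

```python
import cv2
import numpy as np

# Read the color frame (8-bit BGR) and the aligned depth frame (16-bit).
# The file names and the millimeter depth convention are illustrative assumptions.
rgb = cv2.cvtColor(cv2.imread("frame_color.png"), cv2.COLOR_BGR2RGB)
depth_mm = cv2.imread("frame_depth.png", cv2.IMREAD_UNCHANGED).astype(np.float32)

# Optional post-processing mentioned above: median filtering to suppress depth noise.
depth_mm = cv2.medianBlur(depth_mm, 5)

# Convert to meters; each pixel now stores the distance from the target rigid body to the camera.
depth_m = depth_mm / 1000.0
```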
S200: a depth map point cloud is generated using the depth image.
The pixel values in each depth image of the depth map dataset are converted to coordinates of three-dimensional points by mapping the depth values to three-dimensional coordinates in the camera coordinate system. In this embodiment, the internal parameters of the camera, such as the focal length and the camera center, are used together with the pixel coordinates; at this stage the pixel points of the depth image are still expressed in the camera coordinate system.
To obtain the true three-dimensional coordinates in the scene, these pixels need to be transformed from the camera coordinate system into the world coordinate system, in some embodiments by transforming the pixel values in the depth image into three-dimensional point coordinates; converting the camera coordinate system into a world coordinate system to obtain a three-dimensional coordinate; generating a point cloud data set; and acquiring a depth map point cloud through the point cloud data set.
The three-dimensional point coordinates are a camera coordinate system, and the point cloud data set comprises three-dimensional coordinates of pixel points. In the process of generating the point cloud data set, mapping each pixel point of the depth image into three-dimensional coordinates (X, Y and Z), and combining the three-dimensional coordinates into the point cloud data set, wherein the point cloud data set comprises a plurality of three-dimensional coordinate points in a scene, and each point corresponds to one pixel point in the depth image.
The three-dimensional coordinate points in the point cloud data set are put into correspondence with the pixel points of the depth image to obtain the depth map point cloud. It can be understood that every point in the point cloud data set has a corresponding pixel in the depth image, but not every pixel in the depth image has corresponding point cloud data, so screening can be performed while acquiring the depth map point cloud.
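A minimal sketch of this back-projection is given below; the intrinsic parameters (fx, fy, cx, cy) and the camera-to-world matrix are placeholders that would come from the actual camera calibration.

```python
import numpy as np

def depth_to_world_points(depth_m, fx, fy, cx, cy, T_cam_to_world):
    """Back-project a depth image (in meters) to 3D points in the world frame."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel column and row indices
    z = depth_m
    valid = z > 0                                    # screen out pixels with no depth measurement
    # Pinhole model: camera-frame coordinates of every valid pixel.
    x = (u[valid] - cx) * z[valid] / fx
    y = (v[valid] - cy) * z[valid] / fy
    pts_cam = np.stack([x, y, z[valid], np.ones_like(x)], axis=1)   # homogeneous (N, 4)
    # Rigid transform from the camera coordinate system to the world coordinate system.
    pts_world = (T_cam_to_world @ pts_cam.T).T[:, :3]
    return pts_world                                 # this array forms the depth map point cloud
```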
S300: features of the RGB image are extracted, and segmentation is performed on the features to obtain a segmentation map.
Referring to fig. 2, the specific steps of feature extraction of an RGB image and performing segmentation on the features are as follows:
s301: and training a target segmentation model.
In this embodiment, the goal is to render a smooth and flawless point cloud for a specific rigid body. The target segmentation model is therefore a deep learning model: a training data set must be acquired and preprocessed before training, and the required rigid-body contour plane is selected by an image segmentation method. A target segmentation network with good generalization performance is chosen so that the model obtains reasonably accurate segmentation results for most rigid bodies. In this embodiment, a target segmentation model pre-trained on a large number of images is selected, and a Transformer-based segmentation model is used as the target segmentation model.
S302: and performing feature extraction on the first RGB image by using the target segmentation model, and performing feature learning by using an attention mechanism module of the segmentation model to obtain a segmentation result.
In this embodiment, the RGB image enters the target segmentation network, and subsequent processing is performed after the segmentation result is obtained. The processing therefore differs between the first RGB image and subsequent images.
When the first RGB image is processed, a segmentation region needs to be selected manually, and the features of the segmented image are obtained by an interactive segmentation method. When a subsequent image is processed, an image matching method automatically selects the region of the subsequent image that corresponds to the first segmented image, which achieves automatic segmentation.
The RGB images include at least a first RGB image and a second RGB image, where the first RGB image is the RGB image input first. It should be understood that in this embodiment whether an image counts as "input first" is judged with respect to the rigid body. For example, if two RGB images contain the same rigid body but different backgrounds, the image input first is the first RGB image and the image input second is the second RGB image; if the two RGB images contain different rigid bodies, even with the same background, the image input second is also treated as a first RGB image for its own rigid body. The second RGB image is thus any image that is not the first RGB image; since the segmented mask corresponds to a rigid body, whether an input image is a first or a second RGB image is determined with respect to that rigid body.
Feature extraction is performed with the Transformer-based segmentation model. Specifically, using a Transformer encoder-decoder architecture, the first RGB image is input to the encoder of the segmentation model, and features are extracted by multiple Transformer encoder layers, each of which applies self-attention computation and a feed-forward neural network to the input features to extract higher-level feature representations.
After the features are extracted, an attention mechanism module is applied to them. The attention mechanism module automatically learns and adjusts the weight of each feature according to its importance, which improves the segmentation performance of the model: through this module, the model learns which features matter more for target segmentation and re-weights the features accordingly.
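As a rough PyTorch sketch of this idea (the patch size, the 1024-dimensional embedding, the layer count, and the simple channel-gating attention module are illustrative assumptions, not the exact network of the application):

```python
import torch
import torch.nn as nn

class ToyTransformerFeatureExtractor(nn.Module):
    """Patch-embed an RGB image and run it through Transformer encoder layers."""
    def __init__(self, patch=16, dim=1024, layers=4, heads=8):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)   # patch embedding
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        # Illustrative attention module: learns a sigmoid-gated weight per feature channel.
        self.attn = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, img):                 # img: (B, 3, H, W)
        tokens = self.embed(img)            # (B, dim, H/patch, W/patch)
        b, d, h, w = tokens.shape
        tokens = tokens.flatten(2).transpose(1, 2)        # (B, h*w, dim)
        feats = self.encoder(tokens)                      # self-attention + feed-forward layers
        feats = feats * self.attn(feats)                  # re-weight features by learned importance
        return feats.transpose(1, 2).reshape(b, d, h, w)  # feature map of shape (B, dim, h, w)
```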
S303: and extracting a segmentation result by using the interactive segmentation to obtain a first feature vector.
The interactive segmentation is to segment a target area in an image by means of interaction between a user and a computer. In the interactive segmentation process, a user can mark an interested region on an image through a mouse or a touch screen and other devices, and a computer automatically extracts a target region from the image according to the marking and segmentation algorithm of the user and generates a mask image.
In some embodiments, extracting the segmentation result using the interactive segmentation to obtain the first feature vector includes:
extracting a segmentation result by utilizing interactive segmentation to obtain a mask image;
performing encoding on the first RGB image to obtain a first feature map;
adjusting the size of the mask image to be equal to the size of the first RGB image to obtain an adjusted image;
and carrying out average operation on the adjusted image and the first feature map to obtain a first feature vector.
A mask image is obtained through interactive segmentation. The mask image is a binary image in which the target region is marked as 1 and the non-target region is marked as 0; it can be used to indicate a region or boundary of interest. With continued reference to fig. 3, the rigid body shown in fig. 3 is a pear; taking the pear as an example, the image obtained by interactive segmentation in fig. 3 is the mask image, i.e. an image in which the outline of the pear is selected.
Referring to fig. 4, after the first RGB image is input into the Transformer-based segmentation model, the encoder of the segmentation model encodes it to obtain a first feature map feature1 of shape (w, h, 1024), where w is the width of the feature map, h is its height, and 1024 is the size of the third (channel) dimension.
Through the resize operation, the size of the mask image is adjusted to equal the size of the first RGB image, using the size of the first RGB image as the reference; the adjusted image then has shape (w, h, 1024).
After the averaging operation, the generated first feature vector is a one-dimensional tensor, the dimension of which is (1,1,1024), and the first feature vector represents average features extracted from the adjusted image and the first feature map, and each channel represents an average value of the corresponding features.
For example, referring again to fig. 3, the image of the pear placed on the table in fig. 3 is a first RGB image, and a mask image, that is, an image of the edge frame of the pear is generated through interactive segmentation, and then a first feature vector, that is, an image of the pear itself segmented is obtained through adjustment and averaging operations.
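One way to realize the resize-and-average step described above is masked average pooling over the encoder feature map, sketched below; the tensor layout and the nearest-neighbor resize are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def masked_average_feature(feature1, mask):
    """feature1: (1, 1024, H, W) encoder output; mask: (h, w) binary mask obtained by
    interactive segmentation. Returns a (1, 1, 1, 1024)-shaped first feature vector."""
    # Resize the mask to match the spatial size of the feature map.
    mask = F.interpolate(mask[None, None].float(), size=feature1.shape[-2:], mode="nearest")
    # Average the features over the masked (target) pixels only.
    summed = (feature1 * mask).sum(dim=(-2, -1))          # (1, 1024)
    count = mask.sum(dim=(-2, -1)).clamp(min=1.0)         # number of target pixels
    return (summed / count).reshape(1, 1, 1, -1)          # the first feature vector
```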
S304: and performing feature extraction on the second RGB image by using a matching algorithm according to the first feature vector to obtain a second feature vector.
The matching algorithm identifies similar or matching regions by comparing features of the two images, and in the process of executing the matching algorithm, feature extraction may be performed on the second RGB image using the first feature vector as a reference feature to obtain a second feature vector.
In some embodiments, extracting features of an RGB image and performing segmentation on the features to obtain a segmentation map includes:
performing encoding on the second RGB image to obtain a second feature map;
inputting the second feature map into ROI Align (Region of Interest Align, a region-of-interest alignment algorithm) and outputting a third feature map, wherein the third feature map comprises at least one group of feature vectors, and the group of feature vectors maps to a plurality of candidate regions;
performing an averaging operation on the third feature map to obtain an average feature vector;
calculating cosine distances of the first feature vector and the average feature vector;
acquiring a candidate region according to the cosine distance of the minimum value;
and matching the candidate category according to the candidate region, wherein the candidate region and the candidate category have a mapping relation.
With continued reference to fig. 4, as with the first feature map, after the second RGB image is input into the Transformer-based segmentation model, the encoder of the segmentation model encodes it to obtain a second feature map feature2 of shape (w, h, 1024), where w is the width of the feature map, h is its height, and 1024 is the size of the third (channel) dimension.
The ROI alignment is an algorithm for detecting and segmenting the target, can extract the characteristics of the target region, and improves the accuracy of detection and segmentation. In the ROI alignment algorithm, the second feature map is divided into a plurality of candidate areas, each candidate area corresponds to a feature vector, the feature vectors are aligned and fused through the ROI alignment algorithm, a new feature map is generated, namely a third feature map, and the third feature map contains feature information of a target area and can be used for target detection and segmentation.
The size of each candidate region is 7×7. The second feature map is divided into a plurality of candidate regions, and for each candidate region the ROI Align algorithm extracts a feature vector by an averaging operation; all feature vectors together form the third feature map, whose dimension is (n, 7, 7, 1024), where n is the number of candidate regions.
The third feature map is then subjected to an averaging operation: the feature vector of each 7×7 candidate region in the third feature map is averaged, yielding a smaller feature vector, i.e. the average feature vector, which represents the overall features of the corresponding candidate region. The dimension of the average feature vector is (n, 1, 1, 1024), where 1×1 is the pooled size of each candidate region.
And finding out a candidate region corresponding to the minimum cosine distance according to the cosine distance obtained by calculating the first feature vector and the average feature vector, wherein the candidate region is the region most similar to the first feature vector, and the category of the region can be further matched according to the obtained candidate region. By matching candidate categories, the position and category of the target can be determined more accurately, so that more accurate information is provided for subsequent segmentation tasks.
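A sketch of this matching step is shown below, using torchvision's roi_align as one possible ROI Align implementation; the assumption that the candidate boxes are given as float coordinates in feature-map scale is for illustration only.

```python
import torch
from torchvision.ops import roi_align

def match_candidate(first_vec, feature2, boxes):
    """first_vec: (1024,) reference feature; feature2: (1, 1024, H, W) encoded second
    image; boxes: (N, 4) float candidate regions in feature-map coordinates."""
    rois = torch.cat([torch.zeros(len(boxes), 1), boxes], dim=1)   # prepend batch index
    crops = roi_align(feature2, rois, output_size=(7, 7))          # (N, 1024, 7, 7): third feature map
    avg = crops.mean(dim=(-2, -1))                                 # (N, 1024) average feature vectors
    # Cosine distance = 1 - cosine similarity; the best candidate minimizes it.
    cos = torch.nn.functional.cosine_similarity(avg, first_vec[None], dim=1)
    best = torch.argmin(1.0 - cos)
    return best, boxes[best]     # index and box of the candidate region most similar to the reference
```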
In some embodiments, extracting features of the RGB image and performing segmentation on the features to obtain a segmentation map, further comprising:
outputting the optimized candidate region by using a target detection algorithm according to the second feature map;
generating an initial heat map according to the optimized candidate regions;
inputting the first feature vector, the third feature map and the initial heat map into a neural network model to output point class prompts;
a segmentation map is generated from the point class prompts.
In this embodiment, the target detection algorithm is the RPN (Region Proposal Network) algorithm, which generates a series of candidate regions (proposal boxes) that may contain the target rigid body; after the second feature map is processed by the RPN, a series of optimized candidate regions is obtained. Illustratively, each candidate region is represented by four values: x1, the abscissa of its upper-left corner; y1, the ordinate of its upper-left corner; x2, the abscissa of its lower-right corner; and y2, the ordinate of its lower-right corner. These four values define the position and size of each candidate region in the original image. The RPN outputs candidate regions of shape (N, 4), where N is the number of generated candidate regions containing candidate region information.
For the optimized candidate regions, the pairwise similarity of the regions may be computed, and an initial heat map may be generated from the computed similarities, in which regions with high similarity are shown in adjacent or similar colors.
In this embodiment, the neural network model is a binary segmentation model (Binary Segmentation Network), a neural network used for image segmentation. A convolutional neural network serves as the base model, and feature representations of the image are extracted by convolution and pooling operations. These features are used to train the binary segmentation model, which divides the image into foreground and background portions.
The binary segmentation model outputs a binary image in which each pixel indicates whether it belongs to the point class or not; the point class prompt is obtained by analyzing this output and indicates the positions of the pixels that belong to point-class regions in the input data. Based on the point class prompts, segmentation lines can be drawn on the image: for each pixel belonging to the point class, one or more line segments connecting it to surrounding pixels are drawn to form closed regions, i.e. the segmentation map, whose lines divide the image into different regions, each representing a point class.
For example, referring to fig. 3, a second RGB image is input, the rigid body in the second RGB image is the same rigid body as the first RGB image, and the first feature vector is taken as a sample, and a segmentation map, that is, an image in which pears themselves are segmented, is generated through the above steps.
S400: the neural radiation field model is trained using the segmentation map.
To train the neural radiation field model, training images are prepared first; in this embodiment the training image is the segmentation map. The neural network model is initialized, and for each training image the neural radiation field model performs volume rendering: 3D coordinates and a viewing direction are taken as input, the model predicts color and transparency, the scene is rendered from these predictions, and the rendering result is compared with the actual image to compute the loss. The computed loss is back-propagated to update the weights of the neural network, which can be done by gradient descent or another optimization algorithm. These steps are repeated until a preset number of training epochs is reached or the loss no longer decreases noticeably, yielding a trained neural radiation field model. During training, no point cloud is needed as a supervisory signal; during generation, the point cloud is obtained by sampling the output of the fully connected layers.
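The following sketch condenses one such training iteration; the model interface (a function mapping sample positions and view directions to color and density) and the ray sampling inputs are assumptions standing in for the actual implementation.

```python
import torch

def volume_render(rgb, sigma, deltas):
    """Standard volume-rendering accumulation along each ray.
    rgb: (R, S, 3), sigma: (R, S), deltas: (R, S) distances between adjacent samples."""
    alpha = 1.0 - torch.exp(-sigma * deltas)
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1.0 - alpha + 1e-10], dim=1), dim=1)[:, :-1]
    weights = alpha * trans
    return (weights[..., None] * rgb).sum(dim=1)          # (R, 3) rendered pixel colors

def training_step(model, optimizer, pts, dirs, deltas, gt_rgb):
    """One optimization step: render rays through the field and regress to the pixels
    of the (segmented) training image. `model` is assumed to map (pts, dirs) -> (rgb, sigma)."""
    rgb, sigma = model(pts, dirs)                         # query the radiance field
    pred = volume_render(rgb, sigma, deltas)
    loss = torch.mean((pred - gt_rgb) ** 2)               # photometric (MSE) loss
    optimizer.zero_grad()
    loss.backward()                                       # back-propagate the error
    optimizer.step()                                      # gradient-descent weight update
    return loss.item()
```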
S500: Color information and density information of the sampling points are obtained through the neural radiation field model, and a neural point cloud is generated.
Referring to fig. 5, color information and density information of sampling points are obtained through a neural radiation field model, and a neural point cloud is generated, which comprises the following specific steps:
s501: sampling points are obtained.
In acquiring the sampling points, a series of sampling points may be acquired, which are distributed within the implicit space of the neural radiation field model.
S502: The coordinates of the sampling points are input into the neural radiation field model, and the color information and density information of the sampling points are output.
For each sampling point, rendering is performed by using a neural radiation field model, and the 3D coordinates and the visual angles of the sampling points are taken as inputs, wherein the model outputs corresponding colors and transparency, namely, color information and density information of each sampling point.
It will be appreciated that the performance of the neural radiation field model can also be assessed from the rendering results, for example by comparing the difference between the rendering result and the actual scene or by using other metrics to measure the accuracy of the model. If the rendering result differs greatly from the actual scene, the errors computed by the loss function are back-propagated and the weights of the neural network are updated.
S503: a density threshold is set.
By setting the density threshold, points where the obtained density information is smaller than the density threshold can be removed.
S504: traversing the sampling points to remove sampling points having a density less than a density threshold to generate a neural point cloud.
In some application scenarios, certain regions may contain a large number of redundant points. By setting the density threshold, sampling points whose density is below the threshold are removed, so that the rigid body is highlighted more clearly. The neural point cloud is then generated from the density information and color information of the remaining points.
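A compact sketch of steps S501 to S504 under the same assumed model interface (the sampling count, the bounds, and the density threshold are illustrative values):

```python
import torch

@torch.no_grad()
def extract_neural_point_cloud(model, bounds_min, bounds_max,
                               n_samples=200_000, sigma_thresh=5.0):
    """Sample points in the implicit space of the trained field, query their color
    and density, and keep only points whose density exceeds the threshold."""
    pts = torch.rand(n_samples, 3) * (bounds_max - bounds_min) + bounds_min   # uniform samples
    dirs = torch.zeros_like(pts)                  # a fixed viewing direction for the color query
    # Query the field; the (N, 1, 3) shape reuses the per-ray interface assumed above.
    rgb, sigma = model(pts[:, None, :], dirs[:, None, :])
    rgb, sigma = rgb[:, 0, :], sigma[:, 0]
    keep = sigma > sigma_thresh                   # traverse the samples, drop low-density points
    return pts[keep], rgb[keep]                   # positions and colors of the neural point cloud
```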
S600: Point cloud fusion is performed on the depth map point cloud and the neural point cloud to obtain a fusion point cloud.
Before the fusion point cloud is obtained, special points in the neural point cloud can be processed. In some embodiments, colorless points in the fusion point cloud, i.e. points without color information, are identified according to the color information of the fusion point cloud; a distance range is set; nearby points within the distance range, i.e. points that do have color information, are searched for; color information is acquired from these nearby points; and color is added to the colorless points according to that color information.
For a point a without color information, a color can be computed from the colors of its nearby colored points, weighted by their distance to a. In the corresponding formula, c_b denotes the color of a point b adjacent to point a, ||p_a - p_b||_2 denotes the two-norm (Euclidean distance) between point a and the adjacent point, and d is a variable distance parameter.
After the above processing, any points that still have no color information are removed.
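A possible realization of this neighborhood color transfer is sketched below with a SciPy k-d tree; the search radius and the inverse-distance weighting with parameter d are assumptions consistent with the description above.

```python
import numpy as np
from scipy.spatial import cKDTree

def fill_colorless_points(points, colors, has_color, radius=0.01, d=1e-3):
    """points: (N, 3); colors: (N, 3) float array, modified in place for filled points;
    has_color: (N,) bool mask. Colorless points borrow an inverse-distance-weighted color
    from colored neighbors within `radius`; points that find no neighbor are removed."""
    colored_pts = points[has_color]
    colored_rgb = colors[has_color]
    tree = cKDTree(colored_pts)
    keep = has_color.copy()
    for i in np.flatnonzero(~has_color):
        idx = tree.query_ball_point(points[i], r=radius)      # nearby colored points
        if idx:
            w = 1.0 / (np.linalg.norm(colored_pts[idx] - points[i], axis=1) + d)
            colors[i] = (w[:, None] * colored_rgb[idx]).sum(0) / w.sum()
            keep[i] = True
    return points[keep], colors[keep]             # points still lacking color are dropped
```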
The neural point cloud carries color information and the depth map point cloud carries position information. In some embodiments, the color information of the neural point cloud is acquired first and used as the color information of the fusion point cloud; the position information of the fusion point cloud is then calculated from the position information of the neural point cloud and the position information of the depth map point cloud; and the fusion point cloud is obtained from its color information and position information.
By retaining more color information, the scene can be represented more completely and fewer important visual details are lost; this is accomplished by weighting the two point clouds. Specifically, the color weighting coefficient of the neural point cloud is set to 1, so that only the color information of the neural point cloud is used, while for the position information the neural point cloud and the depth map point cloud are given different coefficients. The fusion of each point i is defined as:
c_i = c_n,  p_i = α·p_n + β·p_d
where c_i is the color information of point i, p_i is the position information of point i, c_n is the color information of the neural point cloud, p_n is the position information of the neural point cloud, p_d is the position information of the depth map point cloud, and α and β are variable parameters.
Point i is a point of the fusion point cloud: its color information is the color information of the neural point cloud, and its position information is calculated from the position information of the neural point cloud and the position information of the depth map point cloud.
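Assuming the neural and depth-map points have already been put into one-to-one correspondence (an assumption for illustration), the fusion rule reduces to the following sketch:

```python
import numpy as np

def fuse_point_clouds(neural_pts, neural_rgb, depth_pts, alpha=0.5, beta=0.5):
    """Fuse corresponding neural and depth-map points: colors come from the neural
    point cloud (color weight 1), positions combine both clouds with coefficients
    alpha and beta (illustrative values)."""
    fused_rgb = neural_rgb.copy()                       # c_i = c_n
    fused_pts = alpha * neural_pts + beta * depth_pts   # p_i = alpha*p_n + beta*p_d
    return fused_pts, fused_rgb
```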
According to the method, a target rigid-body segmentation map is extracted at the input side by the target segmentation model. Because the target segmentation network generalizes well, most target rigid bodies in an image can be segmented without fine-tuning the model. The neural radiation field is then trained using an autoencoder-like framework: during training, the model output is supervised through volume rendering against the real image as the loss function, whereas once training is complete no rendering is needed and the point cloud information is obtained directly from the output of the neural network. Training the neural radiation field network on easily acquired pictures strengthens the robustness of the network and yields finer point cloud generation. In each rendering pass, the target rigid body may lie anywhere in the two-dimensional image; the color and density information of each spatial sampling point is obtained with the neural network, and the neural point cloud is obtained by sampling. The neural point cloud and the depth map point cloud are then fused to obtain the fusion point cloud.
According to the technical scheme, the application provides a point cloud generation method based on a neural radiation field and a depth map, which comprises: acquiring an RGB image and a depth image of a target rigid body, wherein the RGB image and the depth image are extracted from the same image captured of the target rigid body and containing color information and depth information; generating a depth map point cloud from the depth image; extracting features of the RGB image and performing segmentation on the features to obtain a segmentation map; training a neural radiation field model with the segmentation map; obtaining color information and density information of sampling points through the neural radiation field model and generating a neural point cloud; and performing point cloud fusion on the depth map point cloud and the neural point cloud to obtain a fusion point cloud. By segmenting the RGB image of the target rigid body with the target segmentation model, training the neural radiation field to obtain the color and density information of the sampling points so as to obtain the neural point cloud, and fusing the neural point cloud with the depth map point cloud to obtain the fusion point cloud, the method avoids introducing excessive non-rigid-body points and solves the problem that a neural radiation field cannot accurately render a specific rigid body.
The foregoing detailed description of the embodiments is merely illustrative of the general principles of the present application and should not be taken in any way as limiting the scope of the invention. Any other embodiments developed in accordance with the present application without inventive effort are within the scope of the present application for those skilled in the art.

Claims (9)

1. A point cloud generation method based on a neural radiation field and a depth map, comprising:
acquiring an RGB image and a depth image of a target rigid body, wherein the RGB image and the depth image are extracted from the same image which is generated by the target rigid body and contains color information and depth information;
generating a depth map point cloud by utilizing the depth image;
extracting features of the RGB image, and executing segmentation on the features to obtain a segmentation map;
training a neural radiation field model by using the segmentation map;
obtaining color information and density information of sampling points through the neural radiation field model, and generating a neural point cloud;
performing point cloud fusion on the depth map point cloud and the neural point cloud;
acquiring color information of the neural point cloud to serve as color information of the fusion point cloud;
calculating the position information of the fusion point cloud by utilizing the position information of the neural point cloud and the position information of the depth map point cloud;
and obtaining the fusion point cloud through the color information and the position information of the fusion point cloud.
2. The method of generating a point cloud based on a neural radiation field and a depth map according to claim 1, wherein the extracting features of the RGB image and performing segmentation on the features to obtain a segmentation map comprises:
training a target segmentation model;
performing feature extraction on a first RGB image by using the target segmentation model, and performing feature learning by using an attention mechanism module of the segmentation model to obtain a segmentation result, wherein the RGB image at least comprises a first RGB image and a second RGB image, the first RGB image is an RGB image input for the first time, and the second RGB image is an RGB image input for the second time;
extracting the segmentation result by utilizing interactive segmentation to obtain a first feature vector;
and according to the first feature vector, performing feature extraction on the second RGB image by using a matching algorithm to obtain a second feature vector, wherein the first feature vector has similarity with the second feature vector.
3. The method of generating a point cloud based on a neural radiation field and depth map according to claim 2, wherein extracting the segmentation result using interactive segmentation to obtain a first feature vector comprises:
extracting the segmentation result by utilizing interactive segmentation to obtain a mask image;
performing encoding on the first RGB image to obtain a first feature map;
adjusting the size of the mask image to be equal to the size of the first RGB image to obtain an adjusted image;
and carrying out average operation on the adjusted image and the first feature map to obtain a first feature vector.
4. The method of generating a point cloud based on a neural radiation field and depth map according to claim 2, wherein the extracting features of the RGB image and performing segmentation on the features to obtain a segmentation map comprises:
performing encoding on the second RGB image to obtain a second feature map;
inputting the second feature map into the ROI alignment, and outputting a third feature map, wherein the third feature map at least comprises a group of feature vectors, and a group of feature vectors map a plurality of candidate areas;
performing an average operation on the third feature map to obtain an average feature vector;
calculating cosine distances between the first feature vector and the average feature vector;
acquiring the candidate region according to the cosine distance of the minimum value;
and matching a candidate category according to the candidate region, wherein the candidate region and the candidate category have a mapping relation.
5. The method of generating point cloud based on neural radiation field and depth map of claim 4, wherein said extracting features of said RGB image and performing segmentation on said features to obtain a segmentation map comprises:
outputting the optimized candidate region by using a target detection algorithm according to the second feature map;
generating an initial heat map according to the optimized candidate regions;
inputting the first feature vector, the third feature map and the initial heat map into a neural network model to output point class prompts;
and generating a segmentation map according to the point class prompts.
6. The method for generating a point cloud based on a neural radiation field and a depth map according to claim 1, wherein the obtaining color information and density information of sampling points and generating a neural point cloud by the neural radiation field model includes:
acquiring sampling points, wherein the sampling points are distributed in an implicit space of the neural radiation field model;
and inputting the coordinates of the sampling points into the neural radiation field model, and outputting the color information and the density information of the sampling points according to the segmentation map.
7. The method for generating a point cloud based on a neural radiation field and a depth map according to claim 6, wherein the obtaining color information and density information of sampling points and generating a neural point cloud by the neural radiation field model includes:
setting a density threshold;
traversing the sampling points, and removing the sampling points with the density smaller than the density threshold value to generate a neural point cloud.
8. The method of generating a point cloud based on a neural radiation field and a depth map according to claim 1, wherein the performing a point cloud fusion on the depth map point cloud and the neural point cloud to obtain a fused point cloud comprises:
according to the color information of the fusion point cloud, colorless points in the fusion point cloud are obtained, wherein the colorless points are points without color information;
setting a distance range;
searching for a nearby point in the distance range, wherein the nearby point is a point with color information;
acquiring color information according to the adjacent points;
and adding color for the colorless point according to the color information.
9. The method of generating a point cloud based on a neural radiation field and a depth map according to claim 1, wherein generating a depth map point cloud using the depth image comprises:
converting pixel values in the depth image into three-dimensional point coordinates, wherein the three-dimensional point coordinates are expressed in a camera coordinate system;
converting the camera coordinate system into a world coordinate system to obtain a three-dimensional coordinate;
generating a point cloud data set, wherein the point cloud data set comprises three-dimensional coordinates of pixel points;
and acquiring a depth map point cloud through the point cloud data set.
CN202410069446.3A 2024-01-18 2024-01-18 Point cloud generation method based on neural radiation field and depth map Active CN117593618B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410069446.3A CN117593618B (en) Point cloud generation method based on neural radiation field and depth map

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410069446.3A CN117593618B (en) Point cloud generation method based on neural radiation field and depth map

Publications (2)

Publication Number Publication Date
CN117593618A CN117593618A (en) 2024-02-23
CN117593618B true CN117593618B (en) 2024-04-05

Family

ID=89915370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410069446.3A Active CN117593618B (en) Point cloud generation method based on neural radiation field and depth map

Country Status (1)

Country Link
CN (1) CN117593618B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179324A (en) * 2019-12-30 2020-05-19 同济大学 Object six-degree-of-freedom pose estimation method based on color and depth information fusion
CN113706714A (en) * 2021-09-03 2021-11-26 中科计算技术创新研究院 New view synthesis method based on depth image and neural radiation field

Also Published As

Publication number Publication date
CN117593618A (en) 2024-02-23

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant