CN113281779A - 3D object rapid detection method, device, equipment and medium - Google Patents

3D object rapid detection method, device, equipment and medium

Info

Publication number
CN113281779A
Authority
CN
China
Prior art keywords
neural network
determining
binary
point cloud
binary neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110553663.6A
Other languages
Chinese (zh)
Other versions
CN113281779B (en)
Inventor
陈刚
孟海涛
李昌财
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202110553663.6A priority Critical patent/CN113281779B/en
Publication of CN113281779A publication Critical patent/CN113281779A/en
Application granted granted Critical
Publication of CN113281779B publication Critical patent/CN113281779B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00 Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88 Lidar systems specially adapted for specific applications
    • G01S17/89 Lidar systems specially adapted for specific applications for mapping or imaging
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00 Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/86 Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Theoretical Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method, a device, equipment and a medium for rapidly detecting a 3D object, wherein the method comprises the following steps: acquiring left and right views and camera calibration parameters; processing the database picture and the real distance depth map through a neural network frame, and training to obtain a binary neural network model; determining a binary neural network according to a binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology; inputting the left view and the right view into a binary neural network, extracting the characteristics of the left view and the right view, and determining a disparity map; constructing point cloud data coordinate points by combining camera calibration parameters according to the disparity map, and determining a visual radar signal; discretizing the visual radar signal, inputting the discretized visual radar signal into a network based on the depth residual error and the feature pyramid, and determining a prediction result. The invention reduces the cost of three-dimensional object detection, improves the speed and accuracy of detection, and can be widely applied to the technical field of three-dimensional object detection.

Description

3D object rapid detection method, device, equipment and medium
Technical Field
The invention relates to the technical field of three-dimensional object detection, in particular to a method, a device, equipment and a medium for rapidly detecting a 3D object.
Background
In recent years, with the development of automatic driving technology, the laser radar has become more and more important. Its advantage is that it can directly measure the distance to surrounding objects, so automatic-driving target detection algorithms can be developed on top of it, and the positions and heading directions of different targets can be accurately estimated in three-dimensional target detection.
Nowadays, techniques that obtain the geometric and position coordinates of surrounding objects by processing the point cloud data generated by a laser radar, such as Frustum PointNets, are well developed and show very high accuracy on the KITTI data set. However, such methods also require pre-processing of the calibrated camera image before the point cloud data are processed.
Such a design has the following disadvantages: the accuracy of the model depends largely on the camera image and the associated convolutional neural network, and the overall pipeline contains many neural networks and complex models, which leads to high latency and low efficiency.
Most existing three-dimensional detection technologies rely on laser radar, but the hardware cost of laser radar is high; for example, the HDL-64E lidar manufactured by Velodyne Corporation of the USA sells domestically for more than 500,000 yuan. In addition, the point cloud data obtained by a laser radar are sparse, so targets with complex appearance, or some small objects, may not be reflected in the point cloud. Moreover, a laser radar can only provide sparse measurement points and cannot provide image or color information, which makes it difficult to develop further tasks on this basis. Finally, existing three-dimensional detection networks are slow and place high demands on hardware storage, computation and energy consumption, so they are difficult to deploy in practical applications and cannot meet real-time requirements.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, a device, and a medium for rapidly detecting a 3D object, so as to reduce the use cost of a three-dimensional detection technology and improve the real-time performance of a three-dimensional detection network.
In one aspect, the present invention provides a method for rapidly detecting a 3D object, including:
acquiring left and right views and camera calibration parameters; the left view and the right view are obtained through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera;
processing the database picture and the real distance depth map through a neural network frame, and training to obtain a binary neural network model; the real distance depth map is obtained through a laser radar;
determining a binary neural network according to the binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology;
inputting the left view and the right view into the binary neural network, extracting features of the left view and the right view, and determining a disparity map;
constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters, and determining a visual radar signal; the visual radar signal is used for representing point cloud data obtained by construction;
projecting the vision radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network, and determining a prediction result;
and detecting the blocking object of the prediction result to determine a target prediction result.
Preferably, adding a spatial constraint term to the binary neural network model comprises:
randomly selecting a plurality of groups of point sets containing three points in the training set of the binary neural network model;
the point set satisfies the following conditions:
Figure BDA0003076290970000021
Figure BDA0003076290970000022
Figure BDA0003076290970000023
in the formula (I), the compound is shown in the specification,
Figure BDA0003076290970000024
represents PAAnd PBConnecting line and PAAnd PCThe angle of the connecting line is the angle of the connecting line,
Figure BDA0003076290970000025
represents PmAnd PkTwo-point Euclidean distance, PA、PB、PCThree points in the point set respectively;
establishing a spatial plane by means of three points in each set of point sets according to the point sets;
according to the space plane, determining the space constraint item and adding the space constraint item to the binary neural network model;
the spatial constraint term is loss, which accumulates, over the N point sets indexed by i, the 1-norm of the difference between the spatial plane calculated using the real disparity map and the spatial plane calculated using the disparity map predicted by the network.
Preferably, the determining a binary neural network according to the binary neural network model by combining a binary data channel packing technique and a network computing laminar flow technique includes:
compressing binary data in the binary neural network model by using channel dimensions according to the binary data channel packing technology to determine an initial binary neural network;
according to the network computing laminar flow technology, combining the neural network operations in the initial binary neural network to determine the binary neural network; the neural network operation comprises a convolution operation, a batch normalization operation and a data binarization operation.
Preferably, the inputting the left view and the right view into the binary neural network, performing feature extraction on the left view and the right view, and determining the disparity map includes:
inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view, and determining the image features;
determining an initial image by combining a parallax prediction algorithm according to the image characteristics;
and performing Gaussian filtering processing on the initial image to determine the disparity map.
Preferably, the constructing a point cloud data coordinate point according to the disparity map and the camera calibration parameters to determine a visual radar signal includes:
calculating a coordinate point depth value of the disparity map according to the disparity map and the camera calibration parameters;
initializing point cloud data, storing the coordinate point depth value into the point cloud data, and determining first point cloud data;
calculating the coordinate point of the first point cloud data according to the coordinate point depth value and the camera calibration parameter, and determining second point cloud data;
and converting the left view and the right view into gray maps, merging the gray maps into the second point cloud data, and determining the visual radar signal.
Preferably, the projecting the vision radar signal to obtain a three-channel feature map, inputting the three-channel feature map into a neural network, and determining a prediction result includes:
determining an interested area by taking the binocular camera as a reference;
discretizing the vision radar signal in the region of interest to determine a discretized vision radar signal;
sequencing the discretized vision radar signals according to the height coordinates to determine the height of the point cloud;
performing density calculation on the discretization vision radar signal to determine the density of the point cloud;
determining the gray value of the discretization vision radar signal as point cloud intensity;
storing the point cloud height, the point cloud density and the point cloud intensity into a three-channel feature map;
inputting the three-channel characteristic diagram into the neural network, and determining an output result;
carrying out normalization processing on the output result, and then summing to determine a normalization processing result;
and carrying out threshold processing on the normalization result to determine the prediction result.
Preferably, the detecting the blocking object to the prediction result and determining the target prediction result includes:
projecting the prediction result onto a two-dimensional image according to the camera calibration parameters to determine a first two-dimensional frame value;
detecting the left view and the right view through an object detection algorithm, and determining a second two-dimensional frame value;
comparing the first two-dimensional frame value with the second two-dimensional frame value, and calculating to obtain an intersection ratio;
when the intersection ratio is greater than a threshold value, determining the target prediction result.
On the other hand, the embodiment of the invention also discloses a 3D object rapid detection device, which comprises:
the system comprises a first module, a second module and a third module, wherein the first module is used for acquiring left and right views and camera calibration parameters, the left and right views are acquired through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera;
The second module is used for processing the database picture and the real distance depth map through a neural network framework and training to obtain a binary neural network model, wherein the real distance depth map is obtained through a laser radar;
the third module is used for determining a binary neural network according to the binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology;
the fourth module is used for inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view and determining a disparity map;
the fifth module is used for constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters and determining a visual radar signal;
the sixth module is used for projecting the vision radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network and determining a prediction result;
and the seventh module is used for detecting the blocking object of the prediction result and determining a target prediction result.
On the other hand, the embodiment of the invention also discloses an electronic device, which comprises a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
On the other hand, the embodiment of the invention also discloses a computer readable storage medium, wherein the storage medium stores a program, and the program is executed by a processor to realize the method.
In another aspect, an embodiment of the present invention further discloses a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.
Compared with the prior art, the invention adopting the technical scheme has the following technical effects: the method comprises the steps of obtaining left and right views and camera calibration parameters; processing the database picture and the real distance depth map through a neural network frame, and training to obtain a binary neural network model; the accuracy of the binary neural network model can be improved; determining a binary neural network according to the binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology; the storage space can be saved and the time loss can be reduced; inputting the left view and the right view into the binary neural network, extracting features of the left view and the right view, and determining a disparity map; constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters, and determining a visual radar signal; the method can reliably acquire the environmental point cloud information at high speed and can also acquire environmental color information; projecting the vision radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network, and determining a prediction result; the equipment cost can be reduced.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a detailed flow chart of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The embodiment of the invention provides a 3D object rapid detection method, which comprises the following steps:
acquiring left and right views and camera calibration parameters; the left view and the right view are obtained through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera;
processing the database picture and the real distance depth map through a neural network frame, and training to obtain a binary neural network model; the real distance depth map is obtained through a laser radar;
determining a binary neural network according to a binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology;
inputting the left view and the right view into a binary neural network, extracting the characteristics of the left view and the right view, and determining a disparity map;
constructing point cloud data coordinate points by combining camera calibration parameters according to the disparity map, and determining a visual radar signal; the visual radar signal is used for representing point cloud data obtained by construction;
discretizing the vision radar signal, inputting the discretization into a neural network, and determining a prediction result.
Further as a preferred embodiment, adding a spatial constraint term to the binary neural network model includes:
randomly selecting a plurality of groups of point sets containing three points in the training set of the binary neural network model;
the point sets satisfy conditions that bound the angle between the line connecting P_A and P_B and the line connecting P_A and P_C, as well as the Euclidean distances between pairs of the points, where P_A, P_B and P_C are the three points in each point set;
establishing a spatial plane by means of three points in each set of point sets according to the point sets;
according to the space plane, determining the space constraint item and adding the space constraint item to the binary neural network model;
the spatial constraint term is loss, which accumulates, over the N point sets indexed by i, the 1-norm of the difference between the spatial plane calculated using the real disparity map and the spatial plane calculated using the disparity map predicted by the network.
According to one implementation of the embodiment of the invention, the pictures provided in the KITTI 2012 database and the real-distance depth maps obtained with a radar are processed using the torch7 neural network framework, positive and negative sample pairs are constructed, and training is performed to obtain the binary neural network model. Training of the binary neural network model is based on the floating-point neural network training process: in forward propagation the floating-point weights are binarized and the results are computed with the binarized weights; in backward propagation only the floating-point weights are updated, and the updated floating-point weights are used in the next forward propagation. Because the training parameter settings of the binary neural network model follow their own rules, the learning rate used to train the binary neural network model should be as low as possible, so as to reduce the instability of the binarized weights caused by frequent sign flips of the weight data when the learning rate is too large; the learning rate used in training the binary neural network model is 2×10⁻⁴. In order to improve the expression capability of the binary neural network, the output of the binarized convolution kernel is made to approach the output of the full-precision convolution kernel as closely as possible, and the binary convolution kernel calculation is optimized with optimization parameters; the optimization calculation formula is:
C_i ≈ αB_i
where the size of the convolution kernel of the network layer is k × k, the number of input channels is m and the number of output channels is n; C_i is the output of the ith channel of the floating-point convolution and B_i is the output of the ith channel of the binary convolution, both of size h × w × c; α is the optimization parameter, obtained through adaptive calculation from the convolution kernel so that the binary convolution kernel result is as close as possible to the floating-point convolution kernel result. The optimization parameter is obtained through the calculation of an optimization function, where the optimization function H(B, α) is:
H(B, α) = ||C_i - αB_i||²
The optimization parameter is calculated adaptively from W_i, the weight parameter of the ith output channel, using the 1-norm operation || ||_l1.
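A minimal PyTorch sketch of this training scheme (the patent's implementation uses the torch7 framework; the per-channel α below, taken as the mean absolute weight of each output channel, is an assumption rather than the patent's exact formula):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizeWeight(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)              # forward pass uses binarized {+1, -1} weights

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                   # straight-through: gradients update the float weights

class BinaryConv2d(nn.Conv2d):
    def forward(self, x):
        w = self.weight
        # per-output-channel scale so that C_i ≈ alpha_i * B_i (assumed mean-|W| formula)
        alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)
        return F.conv2d(x, alpha * BinarizeWeight.apply(w), self.bias,
                        self.stride, self.padding, self.dilation, self.groups)

conv = BinaryConv2d(64, 128, kernel_size=3, padding=1)
# a small learning rate (the description cites 2e-4) limits frequent weight-sign flips
optimizer = torch.optim.Adam(conv.parameters(), lr=2e-4)
```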
A spatial constraint term is added to the binary neural network model for training. For any point p(u_i, v_i) in the 2D image, where u_i and v_i are the abscissa and ordinate of the point and i is its index, the three-dimensional mapping of the point is obtained through the camera parameters and expressed as P(x_i, y_i, z_i), where x_i, y_i and z_i are the three-dimensional coordinates of the point. In the training-set depth map data with accurate depth information, N groups of point sets each containing 3 points are arbitrarily selected, M = {(P_A, P_B, P_C)_i | i = 0, 1 … N}, and every set of points must satisfy conditions that bound the angle between the line connecting P_A and P_B and the line connecting P_A and P_C, as well as the Euclidean distances between pairs of the points, where P_A, P_B and P_C are the three points in each point set.
According to the above conditions, a spatial plane is established from the three points in each set of points.
According to the spatial planes, a spatial constraint term loss is added during the training of the binary neural network model; it accumulates, over the N point sets indexed by i, the 1-norm of the difference between the spatial plane calculated using the real disparity map and the spatial plane calculated using the disparity map predicted by the network.
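A hedged sketch of how such a constraint could be computed; the plane parametrization (unit normal plus offset) and the averaging over point sets are assumptions, since the patent gives the exact equations only as images:

```python
import numpy as np

def plane_from_points(pa, pb, pc):
    """Plane through three 3D points, returned as (unit normal, offset)."""
    n = np.cross(pb - pa, pc - pa)
    n = n / (np.linalg.norm(n) + 1e-9)
    return np.concatenate([n, [-np.dot(n, pa)]])

def spatial_constraint_loss(triples_real, triples_pred):
    """triples_*: lists of (P_A, P_B, P_C) 3D triples back-projected from the real
    and the predicted disparity maps, respectively."""
    diffs = [np.abs(plane_from_points(*r) - plane_from_points(*p)).sum()  # 1-norm per point set
             for r, p in zip(triples_real, triples_pred)]
    return float(np.mean(diffs))                                          # averaged over the N sets
```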
Further as a preferred embodiment, determining a binary neural network according to a binary neural network model in combination with a binary data channel packing technique and a network computing laminar flow technique, includes:
compressing binary data in a binary neural network model by using channel dimensions according to a binary data channel packaging technology to determine an initial binary neural network;
according to the network computing laminar flow technology, combining the neural network operations in the initial binary neural network model to determine a binary neural network; the neural network operation comprises convolution operation, batch normalization operation and data binarization operation.
In order to increase the storage density and the calculation density of binary data, the binary data are compressed along the channel dimension through the binary data channel packing technique. In the channel compression process, the number of channels is compressed to an integral multiple of 64; when the channel number is not divisible by 64, the data are padded up to an integral multiple of 64, the channel dimension is taken as the last dimension, and the initial binary neural network is determined. In the initial binary neural network, both the weights and the data are quantized to {+1, -1}, but at the hardware level 1 represents +1 and 0 represents -1. In this embodiment, the channel part that falls short of an integral multiple of 64 is filled with 0, and the filled 0 additionally introduces -1 values; the extra data are handled by the binary dot product operation, whose calculation formula is:
A·B=-(2×popcnt(A^B)–vec_len)
where popcnt counts the number of 1 bits in a sequence, vec_len is the effective bit length participating in the operation, and A and B are the two binary sequences. Through this formula the multiplication is replaced by logical operations, which significantly improves the operation speed. According to the network computing laminar flow technique, the neural network operations in the initial binary neural network model are combined, fusing the convolution operation, the batch normalization operation and the data binarization operation together. In the fused (laminar flow) bnMap calculation,
the convolution result C, computed without the bias parameter, is compared against a threshold thresh to produce the binarized output; thresh is a parameter determined by the convolution layer bias b, the scaling layer coefficient η, the Batch Normalization layer scaling coefficient γ, the translation parameter β, the sample mean μ and the sample standard deviation σ.
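The packing and fusion steps can be illustrated with a small sketch; it is an illustration under stated assumptions (zero-padded packing into 64-bit words, a scale-then-Batch-Normalization ordering with positive η and γ for the threshold fold), not the patent's implementation:

```python
import numpy as np

def pack_bits(x):
    """Pack a {+1, -1} vector into 64-bit words along the channel dimension."""
    bits = (x > 0).astype(np.uint8)                      # +1 -> bit 1, -1 -> bit 0
    pad = (-len(bits)) % 64                              # pad the channel count to a multiple of 64
    bits = np.concatenate([bits, np.zeros(pad, np.uint8)])
    return np.packbits(bits).view(">u8"), len(x)         # packed words, effective bit length

def binary_dot(a_words, b_words, vec_len):
    """A·B = -(2*popcnt(A^B) - vec_len); padded bits match in both inputs and cancel."""
    xor = np.bitwise_xor(a_words, b_words)
    popcnt = sum(bin(int(w)).count("1") for w in xor)
    return -(2 * popcnt - vec_len)

def fused_threshold(b, eta, gamma, beta, mu, sigma, eps=1e-5):
    """Fold conv bias b, scaling eta and Batch Normalization (gamma, beta, mu, sigma)
    into one per-channel threshold (assumes eta > 0 and gamma > 0)."""
    return (mu - beta * np.sqrt(sigma ** 2 + eps) / gamma) / eta - b

def fused_binarize(C, thresh):
    """C: bias-free convolution output of one channel; returns the {+1, -1} activation."""
    return np.where(C >= thresh, 1, -1)

# sanity check of the packed dot product against a plain dot product
a = np.random.choice([-1, 1], 100).astype(np.int8)
c = np.random.choice([-1, 1], 100).astype(np.int8)
aw, n = pack_bits(a)
cw, _ = pack_bits(c)
assert binary_dot(aw, cw, n) == int(a.astype(np.int32) @ c.astype(np.int32))
```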
further preferably, the inputting the left and right views into the binary neural network, performing feature extraction on the left and right views, and determining the disparity map includes:
inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view, and determining the image features;
determining an initial image by combining a parallax prediction algorithm according to the image characteristics;
and performing Gaussian filtering processing on the initial image to determine the disparity map.
Here the disparity prediction Cost_des is computed from Cost, the cost of the current disparity calculated by the network, Cost+, the cost of the next disparity, and Cost-, the cost of the previous disparity;
Gaussian filtering with a 3 × 3 filter kernel is then applied to the initial image to obtain a smoother disparity map, which can be used for a more accurate 3-dimensional detection task.
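For the smoothing step, a minimal OpenCV sketch (the random array stands in for the network's raw disparity output):

```python
import cv2
import numpy as np

disparity = np.random.rand(375, 1242).astype(np.float32)   # stand-in for the raw disparity map
smoothed = cv2.GaussianBlur(disparity, (3, 3), 0)           # 3x3 kernel, sigma derived from kernel size
```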
Further as a preferred embodiment, constructing a point cloud data coordinate point according to the disparity map and in combination with camera calibration parameters, and determining a visual radar signal, includes:
calculating the depth value of the coordinate point of the disparity map according to the disparity map and the camera calibration parameter;
initializing point cloud data, storing coordinate point depth values into the point cloud data, and determining first point cloud data;
calculating the coordinate point of the first point cloud data according to the coordinate point depth value and the camera calibration parameter, and determining second point cloud data;
and converting the left view and the right view into gray maps, and combining the gray maps into the second point cloud data to determine the visual radar signals.
A coordinate point depth value is calculated for the disparity map from the disparity map and the camera calibration parameters. Let Y(u, v) denote the disparity value at a point p of the disparity map, where u and v are the abscissa and ordinate of p; the depth value D(u, v) of the point is calculated by the formula:
D(u, v) = f_U × b / Y(u, v)
where f_U is the horizontal focal length parameter of the left camera of the binocular camera and b is the horizontal offset parameter of the binocular camera. The point cloud data are initialized, the coordinate point depth values are stored into the point cloud data, and the first point cloud data are determined. Let the dimensions of the disparity map be h × w (height and width), so the storage dimension of the point cloud is 4 × N with N = h × w; the first two dimensions store the position of the coordinate point, the third dimension stores the depth of the coordinate point, and the fourth dimension stores the reflection intensity of the coordinate point. The calculated coordinate point depth information is filled into the point cloud to obtain the first point cloud data. The coordinate points of the first point cloud data are then calculated from the coordinate point depth values and the camera calibration parameters with the formula:
x = (u - c_U) × z / f_U, y = (v - c_V) × z / f_V, z = D(u, v)
where (x, y, z) is the spatial coordinate of a point of the point cloud, x being the width, y the height and z the depth; (c_U, c_V) is the center pixel position of the corresponding camera and f_V is the vertical focal length. At this point the conversion of the 3 × N three-dimensional matrix of the point cloud is completed and the second point cloud data are obtained. The left and right views are converted into gray maps through OpenCV; each gray map, also of size h × w, is stretched into 1 × N and merged into the second point cloud data to obtain the visual radar signal.
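A minimal sketch of this back-projection (the calibration values in the example call and the guard against zero disparity are illustrative assumptions):

```python
import numpy as np

def disparity_to_points(disp, gray, fU, fV, cU, cV, b):
    """Back-project a disparity map into a 4 x N visual-radar cloud (x, y, z, intensity)."""
    h, w = disp.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = fU * b / np.maximum(disp, 1e-6)          # depth D(u, v); guard against zero disparity
    x = (u - cU) * z / fU                        # width axis
    y = (v - cV) * z / fV                        # height axis
    return np.stack([x.ravel(), y.ravel(), z.ravel(), gray.ravel()], axis=0)

disp = np.random.rand(375, 1242).astype(np.float32) + 0.1        # stand-in disparity map
gray = np.random.randint(0, 256, disp.shape).astype(np.float32)  # stand-in gray image
cloud = disparity_to_points(disp, gray, fU=721.5, fV=721.5, cU=609.6, cV=172.9, b=0.54)
```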
Further, as a preferred embodiment, discretizing the visual radar signal, inputting the discretized visual radar signal into a neural network, and determining a prediction result, the method includes:
determining an interested area by taking the binocular camera as a reference;
discretizing the vision radar signal in the region of interest to determine a discretized vision radar signal;
sequencing the discretized vision radar signals according to the height coordinates to determine the height of the point cloud;
performing density calculation on the discretization vision radar signal to determine the density of the point cloud;
determining the gray value of the discretization vision radar signal as point cloud intensity;
storing the point cloud height, the point cloud density and the point cloud intensity into a three-channel feature map;
inputting the three-channel characteristic diagram into the neural network, and determining an output result;
carrying out normalization processing on the output result, and then summing to determine a normalization processing result;
and carrying out threshold processing on the normalization result to determine the prediction result.
Taking the binocular camera as reference, a square region extending 25 m to the left and right and 50 m to the front is set as the region of interest; points inside this region are kept and points elsewhere are discarded. The visual radar signals in the region of interest are discretized and sorted according to the height coordinate, points with the same horizontal coordinate are merged and counted, and the sorted result is projected onto a three-channel feature map that stores the density, height and intensity of the point cloud data. The height map takes the maximum value of the sorted result; the density map counts the number N of points in each grid cell and derives the density value from N;
The intensity map is filled with the gray values of the left and right views. The three-channel feature map is then input into a network based on the depth residual and the feature pyramid, which predicts the position coordinates (x, y, z), the geometric coordinates (h, w, l) and the rotation angle ry around the y axis of the object.
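A sketch of this discretization (the 0.1 m grid resolution and the exact density normalization are assumptions; the patent gives the density formula only as an image):

```python
import numpy as np

def to_bev(points, res=0.1, x_range=(-25.0, 25.0), z_range=(0.0, 50.0)):
    """Project a 4 x N (x, y, z, intensity) cloud into a 3-channel height/density/intensity map."""
    x, y, z, inten = points
    keep = (x >= x_range[0]) & (x < x_range[1]) & (z >= z_range[0]) & (z < z_range[1])
    x, y, z, inten = x[keep], y[keep], z[keep], inten[keep]
    W = int((x_range[1] - x_range[0]) / res)
    H = int((z_range[1] - z_range[0]) / res)
    col = ((x - x_range[0]) / res).astype(int)
    row = ((z - z_range[0]) / res).astype(int)
    bev = np.zeros((3, H, W), dtype=np.float32)
    bev[0].fill(-np.inf)
    np.maximum.at(bev[0], (row, col), y)                        # height: max height per cell
    bev[0][np.isinf(bev[0])] = 0.0                              # empty cells back to zero
    np.add.at(bev[1], (row, col), 1.0)                          # raw point count per cell
    bev[1] = np.minimum(1.0, np.log1p(bev[1]) / np.log(64.0))   # assumed density normalization
    bev[2][row, col] = inten                                    # intensity from the gray values
    return bev

pts = np.vstack([np.random.uniform(-30, 30, 10000),   # x (width)
                 np.random.uniform(-2, 3, 10000),     # y (height)
                 np.random.uniform(0, 60, 10000),     # z (depth)
                 np.random.uniform(0, 255, 10000)])   # intensity
bev = to_bev(pts)
```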
Further preferably, the detecting the blocking object to the prediction result and determining the target prediction result includes:
projecting the prediction result onto a two-dimensional image according to the camera calibration parameters to determine a first two-dimensional frame value;
detecting the left view and the right view through an object detection algorithm, and determining a second two-dimensional frame value;
comparing the first two-dimensional frame value with the second two-dimensional frame value, and calculating to obtain an intersection ratio;
when the intersection ratio is greater than a threshold value, determining the target prediction result.
Because some shielding objects, such as a flower bed, an enclosing wall, a garbage can and the like, may exist in the image, and the appearance of the shielding objects in the point cloud is similar to the characteristics of the vehicle, which may cause the detection network to generate false detection, a three-dimensional detection method with multi-information fusion is used, and the prediction result is projected onto a two-dimensional image by using the calibration parameters of the camera, so as to obtain a two-dimensional frame value of the prediction result, i.e. a first two-dimensional frame value; performing border detection on the left view and the right view by using an object detection algorithm YOLOv5, and calculating to obtain a two-dimensional border value of the detection algorithm, namely a second two-dimensional border value; comparing the first two-dimensional frame value with the second two-dimensional frame value, and calculating to obtain an intersection ratio; and when the intersection ratio is larger than the threshold value, the prediction is considered to be correct, otherwise, the prediction result is discarded, the threshold value is set to be 0.5, and finally, the target prediction result is determined.
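A sketch of the filtering logic (the (x1, y1, x2, y2) box format and helper names are assumptions; the first box of each pair would come from projecting the 3D prediction with the camera calibration parameters, the other boxes from a 2D detector such as YOLOv5):

```python
def iou(a, b):
    """Intersection-over-union of two 2D boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def filter_by_2d_detector(projected, detector_boxes, thresh=0.5):
    """projected: list of (projected 2D box, original 3D prediction) pairs.
    Keep a 3D prediction only if some detector box overlaps it with IoU > thresh."""
    return [pred for box, pred in projected
            if any(iou(box, db) > thresh for db in detector_boxes)]
```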
With reference to fig. 1, the embodiment of the present invention optimizes the training process of the binary neural network model by combining optimization parameters; compressing the channels by combining a binary data channel packaging technology and integrating the neural network operation by a network computing laminar flow technology to realize the optimization of the binary neural network to obtain the binary neural network; acquiring left and right views through a binocular camera, and inputting the left and right views into a binary neural network for feature extraction to obtain a disparity map; the method comprises the steps of setting a binocular camera to obtain camera calibration parameters, calculating coordinates of points in a disparity map by combining the camera calibration parameters, and storing calculated coordinate information into point cloud data to obtain a visual radar signal; discretizing the visual radar signal, projecting a processing result into a three-channel feature map, inputting the three-channel feature map into a network based on a depth residual error and a feature pyramid, and outputting a prediction result, wherein the prediction result comprises position coordinates (x, y, z), geometric coordinates (h, w, l) and a rotation angle ry around a y axis of an object.
Corresponding to the method in fig. 1, an embodiment of the present invention further provides a 3D object rapid detection apparatus, including:
the system comprises a first module, a second module and a third module, wherein the first module is used for acquiring left and right views and camera calibration parameters, the left and right views are acquired through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera;
The second module is used for processing the database picture and the real distance depth map through a neural network framework and training to obtain a binary neural network model, wherein the real distance depth map is obtained through a laser radar;
the third module is used for determining a binary neural network according to the binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology;
the fourth module is used for inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view and determining a disparity map;
the fifth module is used for constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters and determining a visual radar signal;
the sixth module is used for projecting the vision radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network and determining a prediction result;
and the seventh module is used for detecting the blocking object of the prediction result and determining a target prediction result.
Corresponding to the method of fig. 1, an embodiment of the present invention further provides an electronic device, including a processor and a memory; the memory is used for storing programs; the processor executes the program to implement the method as described above.
Corresponding to the method of fig. 1, the embodiment of the present invention also provides a computer-readable storage medium, which stores a program, and the program is executed by a processor to implement the method as described above.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.
In summary, compared with the related art, the embodiment of the invention has the following advantages:
1) in the related art, the laser radar data is used, but the cost of laser radar equipment for acquiring the laser radar data is too high, but the embodiment of the invention can be realized only by images acquired by one binocular camera, so that the use cost is reduced;
2) in the related technology, a large number of networks are used, which results in low operation speed, and the embodiment of the invention optimizes the binary neural network to a certain extent, thereby improving the processing speed;
3) in the related art, the laser radar is difficult to obtain the environment color signal, and the embodiment of the invention can reliably obtain the environment color signal at high speed, thereby improving the accuracy of object detection.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A3D object rapid detection method is characterized by comprising the following steps:
acquiring left and right views and camera calibration parameters; the left view and the right view are obtained through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera;
processing the database picture and the real distance depth map through a neural network frame, and training to obtain a binary neural network model; the real distance depth map is obtained through a laser radar;
determining a binary neural network according to the binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology;
inputting the left view and the right view into the binary neural network, extracting features of the left view and the right view, and determining a disparity map;
constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters, and determining a visual radar signal; the visual radar signal is used for representing point cloud data obtained by construction;
projecting the vision radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network, and determining a prediction result;
and detecting the blocking object of the prediction result to determine a target prediction result.
2. The method for rapidly detecting the 3D object according to claim 1, further comprising: adding a spatial constraint term to the binary neural network model, specifically:
randomly selecting a plurality of groups of point sets containing three points in the training set of the binary neural network model;
the point set satisfies the following conditions:
Figure FDA0003076290960000011
Figure FDA0003076290960000012
Figure FDA0003076290960000013
wherein the content of the first and second substances,
Figure FDA0003076290960000014
represents PAAnd PBConnecting line and PAAnd PCThe angle of the connecting line is the angle of the connecting line,
Figure FDA0003076290960000015
represents PmAnd PkTwo-point Euclidean distance, PA、PB、PCThree points in the point set respectively;
establishing a spatial plane by means of three points in each set of point sets according to the point sets;
according to the space plane, determining the space constraint item and adding the space constraint item to the binary neural network model;
the spatial constraint term is loss, which accumulates, over the N point sets indexed by i, the 1-norm of the difference between the spatial plane calculated using the real disparity map and the spatial plane calculated using the disparity map predicted by the network.
3. The method for rapidly detecting the 3D object according to claim 1, wherein the determining the binary neural network according to the binary neural network model by combining a binary data channel packing technology and a network computing laminar flow technology comprises:
compressing binary data in the binary neural network model by using channel dimensions according to the binary data channel packing technology to determine an initial binary neural network;
according to the network computing laminar flow technology, combining the neural network operations in the initial binary neural network to determine the binary neural network; the neural network operation comprises a convolution operation, a batch normalization operation and a data binarization operation.
4. The 3D object fast detection method according to claim 1, wherein the inputting the left and right views into the binary neural network, performing feature extraction on the left and right views, and determining the disparity map comprises:
inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view, and determining the image features;
determining an initial image by combining a parallax prediction algorithm according to the image characteristics;
and performing Gaussian filtering processing on the initial image to determine the disparity map.
5. The method for rapidly detecting the 3D object according to claim 1, wherein the constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters and determining the visual radar signal comprise:
calculating a coordinate point depth value of the disparity map according to the disparity map and the camera calibration parameters;
initializing point cloud data, storing the coordinate point depth value into the point cloud data, and determining first point cloud data;
calculating the coordinate point of the first point cloud data according to the coordinate point depth value and the camera calibration parameter, and determining second point cloud data;
and converting the left view and the right view into gray maps, merging the gray maps into the second point cloud data, and determining the visual radar signal.
6. The 3D object rapid detection method according to claim 1, wherein the projecting the visual radar signal to obtain a three-channel feature map, inputting the three-channel feature map into a neural network, and determining a prediction result comprises:
determining a region of interest with the binocular camera as a reference;
discretizing the visual radar signal in the region of interest to determine a discretized visual radar signal;
sorting the discretized visual radar signal by height coordinate to determine a point cloud height;
performing a density calculation on the discretized visual radar signal to determine a point cloud density;
determining the gray values of the discretized visual radar signal as a point cloud intensity;
storing the point cloud height, the point cloud density and the point cloud intensity into the three-channel feature map;
inputting the three-channel feature map into the neural network and determining an output result;
normalizing the output result and then summing to determine a normalization result;
thresholding the normalization result to determine the prediction result.
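To illustrate the projection onto a three-channel feature map in claim 6, the sketch below discretizes the visual radar signal over a region of interest and fills one channel each with the point cloud height (tallest point per cell), density and intensity; the grid ranges, resolution, ground-aligned frame (x forward, y lateral, z up) and density normalization are placeholder assumptions rather than the patented parameters.

import numpy as np

def build_three_channel_map(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0),
                            resolution=0.1):
    # points: (N, 4) array of [x, y, z, intensity] in a ground-aligned frame
    # (conversion from camera coordinates is assumed to be done beforehand).
    cols = int((x_range[1] - x_range[0]) / resolution)
    rows = int((y_range[1] - y_range[0]) / resolution)
    height = np.zeros((rows, cols), dtype=np.float32)
    density = np.zeros((rows, cols), dtype=np.float32)
    intensity = np.zeros((rows, cols), dtype=np.float32)

    # keep only points inside the region of interest
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]

    # discretize to grid cells
    ci = ((pts[:, 0] - x_range[0]) / resolution).astype(int)
    ri = ((pts[:, 1] - y_range[0]) / resolution).astype(int)

    for r, c, p in zip(ri, ci, pts):
        density[r, c] += 1.0
        if p[2] > height[r, c]:                 # keep the tallest point in the cell
            height[r, c] = p[2]
            intensity[r, c] = p[3]              # gray value of that point
    density = np.minimum(1.0, np.log1p(density) / np.log(64.0))   # bounded density
    return np.stack([height, density, intensity], axis=0)         # (3, rows, cols)
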
7. The 3D object rapid detection method according to claim 1, wherein the performing occlusion detection on the prediction result and determining a target prediction result comprises:
projecting the prediction result onto a two-dimensional image according to the camera calibration parameters to determine a first two-dimensional frame value;
detecting the left view and the right view with an object detection algorithm to determine a second two-dimensional frame value;
comparing the first two-dimensional frame value with the second two-dimensional frame value and calculating an intersection-over-union ratio;
determining the target prediction result when the intersection-over-union ratio is greater than a threshold value.
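A hedged sketch of the occlusion check in claim 7: each prediction projected to a two-dimensional box is compared with the boxes returned by a 2D object detector, and the prediction is kept only when the intersection-over-union exceeds a threshold. The (x1, y1, x2, y2) box format and the 0.5 threshold are illustrative assumptions.

def iou_2d(box_a, box_b):
    # Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2).
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def keep_unoccluded(projected_boxes, detected_boxes, iou_threshold=0.5):
    # Keep a prediction only if its projected box matches some 2D detection.
    return [proj for proj in projected_boxes
            if any(iou_2d(proj, det) > iou_threshold for det in detected_boxes)]
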
8. A 3D object rapid detection device, comprising:
a first module, used for acquiring left and right views and camera calibration parameters, wherein the left and right views are acquired by a binocular camera and the camera calibration parameters represent parameters set for the binocular camera;
a second module, used for processing database pictures and real-distance depth maps through a neural network framework and training to obtain a binary neural network model, wherein the real-distance depth maps are acquired by a laser radar;
a third module, used for determining a binary neural network according to the binary neural network model by combining a binary data channel packing technology and a network computing laminar flow technology;
a fourth module, used for inputting the left and right views into the binary neural network, performing feature extraction on the left and right views, and determining a disparity map;
a fifth module, used for constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters and determining a visual radar signal;
a sixth module, used for projecting the visual radar signal to obtain a three-channel feature map, inputting the three-channel feature map into a neural network, and determining a prediction result;
a seventh module, used for performing occlusion detection on the prediction result and determining a target prediction result.
9. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the storage medium stores a program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN202110553663.6A 2021-05-20 2021-05-20 3D object rapid detection method, device, equipment and medium Active CN113281779B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110553663.6A CN113281779B (en) 2021-05-20 2021-05-20 3D object rapid detection method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110553663.6A CN113281779B (en) 2021-05-20 2021-05-20 3D object rapid detection method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN113281779A true CN113281779A (en) 2021-08-20
CN113281779B CN113281779B (en) 2022-07-12

Family

ID=77280479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110553663.6A Active CN113281779B (en) 2021-05-20 2021-05-20 3D object rapid detection method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113281779B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10503966B1 (en) * 2018-10-11 2019-12-10 Tindei Network Technology (Shanghai) Co., Ltd. Binocular pedestrian detection system having dual-stream deep learning neural network and the methods of using the same
CN110148179A (en) * 2019-04-19 2019-08-20 北京地平线机器人技术研发有限公司 A kind of training is used to estimate the neural net model method, device and medium of image parallactic figure
CN110738241A (en) * 2019-09-24 2020-01-31 中山大学 binocular stereo vision matching method based on neural network and operation frame thereof
CN111028285A (en) * 2019-12-03 2020-04-17 浙江大学 Depth estimation method based on binocular vision and laser radar fusion
CN111179330A (en) * 2019-12-27 2020-05-19 福建(泉州)哈工大工程技术研究院 Binocular vision scene depth estimation method based on convolutional neural network
CN111444811A (en) * 2020-03-23 2020-07-24 复旦大学 Method for detecting three-dimensional point cloud target
CN112633324A (en) * 2020-11-27 2021-04-09 中山大学 System, method and medium for matching stereoscopic vision around the eyes based on neural network
CN112233163A (en) * 2020-12-14 2021-01-15 中山大学 Depth estimation method and device for laser radar stereo camera fusion and medium thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Yi Jinyan: "Research on 3D Face Reconstruction Technology Based on Deep Learning", China Master's Theses Full-text Database, Medicine and Health Sciences Series *
Liu Yibo: "Object Detection and Motion State Estimation Based on a Vehicle-Mounted Binocular Camera", China Master's Theses Full-text Database, Engineering Science and Technology II Series *
Wang Gang et al.: "Research on 3D Object Detection Method Based on Deep Learning", Computer Applications and Software *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359891A (en) * 2021-12-08 2022-04-15 华南理工大学 Three-dimensional vehicle detection method, system, device and medium
CN114359891B (en) * 2021-12-08 2024-05-28 华南理工大学 Three-dimensional vehicle detection method, system, device and medium
CN115619740A (en) * 2022-10-19 2023-01-17 广西交科集团有限公司 High-precision video speed measuring method and system, electronic equipment and storage medium
CN115619740B (en) * 2022-10-19 2023-08-08 广西交科集团有限公司 High-precision video speed measuring method, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113281779B (en) 2022-07-12

Similar Documents

Publication Publication Date Title
Riegler et al. Octnetfusion: Learning depth fusion from data
US11941831B2 (en) Depth estimation
Hoppe et al. Incremental Surface Extraction from Sparse Structure-from-Motion Point Clouds.
CN110879994A (en) Three-dimensional visual inspection detection method, system and device based on shape attention mechanism
CN112132213A (en) Sample image processing method and device, electronic equipment and storage medium
CN113281779B (en) 3D object rapid detection method, device, equipment and medium
Kumari et al. A survey on stereo matching techniques for 3D vision in image processing
CN113052109A (en) 3D target detection system and 3D target detection method thereof
CN112562001A (en) Object 6D pose estimation method, device, equipment and medium
Merras et al. Multi-view 3D reconstruction and modeling of the unknown 3D scenes using genetic algorithms
CN114463736A (en) Multi-target detection method and device based on multi-mode information fusion
US9361412B1 (en) Method for the simulation of LADAR sensor range data
CN115457492A (en) Target detection method and device, computer equipment and storage medium
CN116452748A (en) Implicit three-dimensional reconstruction method, system, storage medium and terminal based on differential volume rendering
Wiemann et al. Automatic Map Creation For Environment Modelling In Robotic Simulators.
Buck et al. Ignorance is bliss: flawed assumptions in simulated ground truth
Lin et al. A-SATMVSNet: An attention-aware multi-view stereo matching network based on satellite imagery
CN116246119A (en) 3D target detection method, electronic device and storage medium
Zhang et al. Object measurement in real underwater environments using improved stereo matching with semantic segmentation
EP4152274A1 (en) System and method for predicting an occupancy probability of a point in an environment, and training method thereof
CN115240168A (en) Perception result obtaining method and device, computer equipment and storage medium
CN115359119A (en) Workpiece pose estimation method and device for disordered sorting scene
CN117333676B (en) Point cloud feature extraction method and point cloud visual detection method based on graph expression
US20230281877A1 (en) Systems and methods for 3d point cloud densification
Geetha Kiran et al. Automatic 3D view generation from a Single 2D Image for both Indoor and Outdoor Scenes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant