CN113281779B - 3D object rapid detection method, device, equipment and medium - Google Patents
3D object rapid detection method, device, equipment and medium
- Publication number
- CN113281779B (application CN202110553663.6A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- determining
- binary
- binary neural
- point cloud
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/88—Lidar systems specially adapted for specific applications
- G01S17/89—Lidar systems specially adapted for specific applications for mapping or imaging
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/86—Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention discloses a 3D object rapid detection method, device, equipment and medium, wherein the method comprises the following steps: acquiring left and right views and camera calibration parameters; processing database pictures and a real-distance depth map through a neural network framework, and training to obtain a binary neural network model; determining a binary neural network from the binary neural network model by combining a binary data channel packing technology and a network computing laminar flow technology; inputting the left and right views into the binary neural network, extracting the features of the left and right views, and determining a disparity map; constructing point cloud data coordinate points from the disparity map combined with the camera calibration parameters, and determining a visual radar signal; and discretizing the visual radar signal, inputting it into a network based on deep residuals and a feature pyramid, and determining a prediction result. The invention reduces the cost of three-dimensional object detection, improves the speed and accuracy of detection, and can be widely applied in the technical field of three-dimensional object detection.
Description
Technical Field
The invention relates to the technical field of three-dimensional object detection, in particular to a method, a device, equipment and a medium for rapidly detecting a 3D object.
Background
In recent years, with the development of automatic driving technology, the status of the laser radar has become more and more important. The laser radar has the advantage that distances to surrounding objects can be measured directly, so automatic driving target detection algorithms can be developed that accurately estimate the positions and heading directions of different targets in three-dimensional target detection.
Nowadays, technologies for obtaining the geometric and position coordinates of surrounding objects by processing the point cloud signal data generated by a laser radar are well developed, such as Frustum PointNets. These show very high accuracy in tests on the KITTI data set; however, they also require preprocessing of the calibrated camera image before processing the point cloud data.
Such a design has the following disadvantages: the accuracy of the model depends largely on the camera image and the associated convolutional neural network, and the overall pipeline contains many neural networks and complex models, which causes excessive delay and low efficiency.
Most existing three-dimensional detection technologies rely on laser radar, but the hardware cost of laser radar is high. For example, the HDL-64E lidar manufactured by Velodyne Corporation of the USA has a domestic selling price of over 500,000 yuan. In addition, the point cloud data obtained by laser radar are sparse, which may prevent targets with complex appearance, or small objects, from being reflected in the point cloud. Moreover, laser radar can only provide sparse measurement points and cannot provide image or color information, making further task development on this basis difficult. Finally, existing three-dimensional detection networks are slow and place high demands on hardware storage, calculation and energy consumption; their high calculation requirements make them difficult to deploy in practical applications, so real-time requirements cannot be met.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, a device, and a medium for rapidly detecting a 3D object, so as to reduce the use cost of a three-dimensional detection technology and improve the real-time performance of a three-dimensional detection network.
In one aspect, the present invention provides a method for rapidly detecting a 3D object, including:
acquiring left and right views and camera calibration parameters; the left view and the right view are obtained through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera;
processing the database picture and the real distance depth map through a neural network framework, and training to obtain a binary neural network model; the real distance depth map is obtained through a laser radar;
determining a binary neural network according to the binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology;
inputting the left view and the right view into the binary neural network, extracting features of the left view and the right view, and determining a disparity map;
constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters, and determining a visual radar signal; the visual radar signal is used for representing point cloud data obtained by construction;
projecting the vision radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network, and determining a prediction result;
and detecting the blocking object of the prediction result to determine a target prediction result.
Preferably, adding a spatial constraint term to the binary neural network model comprises:
randomly selecting a plurality of groups of point sets containing three points in the training set of the binary neural network model;
the point set satisfies the following conditions:
wherein ∠A represents the angle between the line connecting P_A and P_B and the line connecting P_A and P_C, d(P_m, P_k) represents the Euclidean distance between the two points P_m and P_k, and P_A, P_B, P_C are respectively the three points in the point set;
establishing a spatial plane by means of three points in each set of point sets according to the point sets;
according to the space plane, determining the space constraint item and adding the space constraint item to the binary neural network model;
the spatial constraint term is:
loss = (1/N) · Σ_{i=1}^{N} ||S_i^real − S_i^pred||_1

wherein loss is the spatial constraint term, N is the number of point sets, S_i^real is the spatial plane calculated using the real disparity map, S_i^pred is the spatial plane calculated using the network-predicted disparity map, ||·||_1 represents a 1-norm operation, and i represents a positive integer.
Preferably, the determining a binary neural network according to the binary neural network model by combining a binary data channel packing technique and a network computing laminar flow technique includes:
compressing binary data in the binary neural network model by using channel dimensions according to the binary data channel packing technology to determine an initial binary neural network;
according to the network computing laminar flow technology, combining the neural network operations in the initial binary neural network to determine the binary neural network; the neural network operation comprises a convolution operation, a batch normalization operation and a data binarization operation.
Preferably, the inputting the left view and the right view into the binary neural network, performing feature extraction on the left view and the right view, and determining the disparity map includes:
inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view, and determining the image features;
determining an initial image by combining a parallax prediction algorithm according to the image characteristics;
and performing Gaussian filtering processing on the initial image to determine the disparity map.
Preferably, the constructing a point cloud data coordinate point according to the disparity map and the camera calibration parameters to determine a visual radar signal includes:
calculating a coordinate point depth value of the disparity map according to the disparity map and the camera calibration parameters;
initializing point cloud data, storing the coordinate point depth value into the point cloud data, and determining first point cloud data;
calculating the coordinate point of the first point cloud data according to the coordinate point depth value and the camera calibration parameter, and determining second point cloud data;
and converting the left view and the right view into gray maps, merging the gray maps into the second point cloud data, and determining the visual radar signal.
Preferably, the projecting the vision radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network, and determining a prediction result includes:
determining an interested area by taking the binocular camera as a reference;
discretizing the vision radar signal in the region of interest to determine a discretized vision radar signal;
sequencing the discretized vision radar signals according to the height coordinates to determine the height of the point cloud;
performing density calculation on the discretization vision radar signal to determine the density of the point cloud;
determining the gray value of the discretization vision radar signal as point cloud intensity;
storing the point cloud height, the point cloud density and the point cloud intensity into a three-channel feature map;
inputting the three-channel characteristic diagram into the neural network, and determining an output result;
carrying out normalization processing on the output result, and then summing to determine a normalization processing result;
and carrying out threshold processing on the normalization result to determine the prediction result.
Preferably, the detecting the blocking object to the prediction result and determining the target prediction result includes:
projecting the prediction result onto a two-dimensional image according to the camera calibration parameters to determine a first two-dimensional frame value;
detecting the left view and the right view through an object detection algorithm, and determining a second two-dimensional frame value;
comparing the first two-dimensional frame value with the second two-dimensional frame value, and calculating to obtain an intersection ratio;
when the intersection ratio is greater than a threshold value, determining the target prediction result.
On the other hand, the embodiment of the invention also discloses a 3D object rapid detection device, which comprises:
the system comprises a first module, a second module and a third module, wherein the first module is used for acquiring left and right views and camera calibration parameters, the left and right views are acquired through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera; (ii) a
The second module is used for processing the database picture and the real distance depth map through a neural network framework and training to obtain a binary neural network model, wherein the real distance depth map is obtained through a laser radar;
the third module is used for determining a binary neural network according to the binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology;
the fourth module is used for inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view and determining a disparity map;
the fifth module is used for constructing a point cloud data coordinate point according to the disparity map and the camera calibration parameters and determining a visual radar signal;
the sixth module is used for projecting the vision radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network and determining a prediction result;
and the seventh module is used for detecting the blocking object of the prediction result and determining a target prediction result.
On the other hand, the embodiment of the invention also discloses an electronic device, which comprises a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
In another aspect, an embodiment of the present invention further discloses a computer-readable storage medium, where the storage medium stores a program, and the program is executed by a processor to implement the method described above.
In another aspect, an embodiment of the present invention further discloses a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.
Compared with the prior art, the invention adopting the above technical scheme has the following technical effects: left and right views and camera calibration parameters are acquired; database pictures and a real-distance depth map are processed through a neural network framework and trained to obtain a binary neural network model, which improves the accuracy of the binary neural network model; a binary neural network is determined from the binary neural network model by combining the binary data channel packing technology and the network computing laminar flow technology, which saves storage space and reduces time loss; the left and right views are input into the binary neural network, their features are extracted, and a disparity map is determined; point cloud data coordinate points are constructed from the disparity map and the camera calibration parameters, and a visual radar signal is determined, so that environmental point cloud information can be acquired reliably at high speed, and environmental color information can also be acquired; the visual radar signal is projected to obtain a three-channel characteristic diagram, which is input into a neural network to determine a prediction result, reducing the equipment cost.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a detailed flow chart of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad application.
The embodiment of the invention provides a 3D object rapid detection method, which comprises the following steps:
acquiring left and right views and camera calibration parameters; the left view and the right view are obtained through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera;
processing the database picture and the real distance depth map through a neural network framework, and training to obtain a binary neural network model; the real distance depth map is obtained through a laser radar;
determining a binary neural network according to a binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology;
inputting the left view and the right view into a binary neural network, extracting the characteristics of the left view and the right view, and determining a disparity map;
according to the disparity map, point cloud data coordinate points are constructed by combining camera calibration parameters, and visual radar signals are determined; the visual radar signal is used for representing point cloud data obtained by construction;
discretizing the vision radar signal, inputting the vision radar signal into a neural network, and determining a prediction result.
Further as a preferred embodiment, adding a spatial constraint term to the binary neural network model includes:
in the training set of the binary neural network model, randomly selecting a plurality of groups of point sets containing three points;
the point set satisfies the following conditions:
wherein ∠A represents the angle between the line connecting P_A and P_B and the line connecting P_A and P_C, d(P_m, P_k) represents the Euclidean distance between the two points P_m and P_k, and P_A, P_B, P_C are respectively the three points in the point set;
establishing a spatial plane by means of three points in each set of point sets according to the point sets;
according to the space plane, determining the space constraint item and adding the space constraint item to the binary neural network model;
the spatial constraint term is:
loss = (1/N) · Σ_{i=1}^{N} ||S_i^real − S_i^pred||_1

wherein loss is the spatial constraint term, N is the number of point sets, S_i^real is the spatial plane calculated using the real disparity map, S_i^pred is the spatial plane calculated using the network-predicted disparity map, ||·||_1 represents a 1-norm operation, and i represents a positive integer.
According to one implementation of the embodiment of the invention, the pictures provided in the KITTI 2012 database and the real-distance depth maps obtained using a radar are processed with the torch7 neural network framework, and positive and negative sample pairs are constructed for training to obtain the binary neural network model. Training of the binary neural network model is based on the floating-point neural network training process: in forward propagation the floating-point weights are binarized, and the results are calculated using the binarized weights; in backward propagation only the floating-point weights are updated, and the updated floating-point weights are used in the next forward propagation. Because the training parameter settings of a binary neural network model follow their own rules, the learning rate used to train it should be as low as possible, to reduce instability of the binarized weights caused by frequent sign changes of the weight data when the learning rate is too large; the learning rate used in training the binary neural network model is 2×10⁻⁴. In order to improve the expressive capability of the binary neural network, the output of the binarized convolution kernel is made to approach the output of the full-precision convolution kernel as closely as possible, and an optimization parameter is used to perform an optimization calculation on the binary convolution kernel; the optimization calculation formula is:
C_i ≈ α·B_i
wherein the size of the convolution kernel of the network layer is k × k, the number of input channels is m, and the number of output channels is n; C_i is the output of the i-th channel of the floating-point convolution, and B_i is the output of the i-th channel of the binary convolution, both of shape h × w × c; α is the optimization parameter, obtained through adaptive calculation from the convolution kernel and used in the calculation so that the binary convolution kernel result is as close as possible to the floating-point convolution kernel result. The optimization parameter is obtained by calculation with an optimization function, wherein the optimization function H(B, α) is:
H(B, α) = ||C_i − α·B_i||²;
the calculation formula of the optimization parameter is:

α = ||W_i||_1 / (k × k × m)

wherein W_i is the weight parameter of the i-th output channel, and ||·||_1 is a 1-norm operation.
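As an illustrative sketch only (the patent itself gives no code), the channel-wise scaling α = ||W_i||_1/(k×k×m) and the forward binarization described above can be written as follows; the use of PyTorch, the function name, and the (n, m, k, k) weight layout are assumptions:

```python
import torch

def binarize_weights(W: torch.Tensor):
    """W: floating-point conv weights of shape (n, m, k, k).

    Returns binary weights B in {+1, -1} and the per-output-channel scale
    alpha_i = ||W_i||_1 / (k * k * m), i.e. the mean absolute weight value.
    """
    n = W.shape[0]
    alpha = W.abs().reshape(n, -1).mean(dim=1)  # equals ||W_i||_1 / (k*k*m)
    B = torch.where(W >= 0, torch.ones_like(W), -torch.ones_like(W))
    return B, alpha

# In forward propagation the binary convolution output of channel i is scaled
# by alpha_i so that it approaches the floating-point output C_i; in backward
# propagation only the kept floating-point W is updated, as described above.
```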
A spatial constraint term is added to the binary neural network model for training. For any point p(u_i, v_i) in the 2D image, where u_i and v_i respectively represent the abscissa and ordinate of point p and i represents the index of the point, its three-dimensional spatial mapping is obtained through calculation with the camera parameters and is expressed as P(x_i, y_i, z_i), wherein x_i, y_i, z_i respectively represent the three-dimensional coordinates of point p. In the training-set depth map data with accurate depth information, N groups of point sets each containing 3 points are arbitrarily selected, M = {(P_A, P_B, P_C)_i | i = 0, 1, …, N}, and any group of points needs to satisfy:
wherein ∠A represents the angle between the line connecting P_A and P_B and the line connecting P_A and P_C, d(P_m, P_k) represents the Euclidean distance between the two points P_m and P_k, and P_A, P_B, P_C are respectively the three points in the point set;
according to the above conditions, a spatial plane is established by three points in each set of points:
according to the space plane, in the training process of the binary neural network model, adding a space constraint item:
loss = (1/N) · Σ_{i=1}^{N} ||S_i^real − S_i^pred||_1

wherein loss is the spatial constraint term, N is the number of point sets, S_i^real is the spatial plane calculated using the real disparity map, S_i^pred is the spatial plane calculated using the network-predicted disparity map, ||·||_1 represents a 1-norm operation, and i represents a positive integer.
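For illustration only, a minimal sketch of this constraint follows, assuming each "spatial plane" is represented by the unit normal of the plane through the three back-projected points (the patent's exact plane parameterization appears only as an image in the source):

```python
import torch

def plane_normal(PA, PB, PC):
    """Unit normal of the plane through three 3-D points, each a tensor of shape (3,)."""
    n = torch.cross(PB - PA, PC - PA, dim=0)
    return n / (n.norm() + 1e-8)

def spatial_constraint_loss(triplets_real, triplets_pred):
    """Average 1-norm difference between planes computed from the real and the
    predicted disparity maps, over N point sets of three points each."""
    diffs = [
        (plane_normal(*t_r) - plane_normal(*t_p)).abs().sum()
        for t_r, t_p in zip(triplets_real, triplets_pred)
    ]
    return torch.stack(diffs).mean()
```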
As a further preferred embodiment, determining a binary neural network according to a binary neural network model in combination with a binary data channel packing technique and a network computing laminar flow technique includes:
compressing binary data in a binary neural network model by using channel dimensions according to a binary data channel packing technology, and determining an initial binary neural network;
according to the network computing laminar flow technology, combining the neural network operations in the initial binary neural network to determine the binary neural network; the neural network operations comprise the convolution operation, the batch normalization operation and the data binarization operation.
In order to increase the storage density and calculation density of the binary data, the binary data are compressed along the channel dimension through the binary data channel packing technology. In the channel compression process, the number of channels is compressed to an integral multiple of 64; when the number of channels is not divisible by 64, the data are padded up to an integral multiple of 64, and the channel dimension is taken as the last dimension, determining the initial binary neural network. In the initial binary neural network both the weights and the data are quantized to {+1, −1}, but at the hardware level 1 represents +1 and 0 represents −1; in this embodiment the channel part short of an integral multiple of 64 is padded with 0, and the padded 0s additionally introduce −1 values. To account for this extra data, the calculation uses the binary dot-product operation, whose calculation formula is:
A·B = −(2 × popcnt(A ^ B) − vec_len)
wherein popcnt is the operation of counting the number of 1s in a sequence, vec_len represents the effective bit length participating in the operation, and A and B respectively represent two binary sequences. Through this formula, logical operations replace multiplication operations, so the operation speed is markedly improved. According to the network computing laminar flow technology, the neural network operations in the initial binary neural network are combined, fusing the convolution operation, the batch normalization operation and the data binarization operation into a single laminar-flow bnMap calculation: the convolution result C computed without the bias parameter is compared against a threshold thresh, where thresh is a parameter determined by the convolution layer bias b, the scaling layer coefficient η, the Batch Normalization layer scaling coefficient γ, the translation parameter β, the sample mean μ and the sample standard deviation σ.
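The following sketch illustrates the fused threshold comparison and the packed binary dot product. Since the bnMap and thresh formulas appear only as images in the source, the closed form of thresh below is derived from the listed symbols and is an assumption (valid when γ·η > 0):

```python
import numpy as np

def fuse_thresh(b, eta, gamma, beta, mu, sigma):
    """Assumed fusion of conv bias, scaling and batch normalization:
    sign(gamma * (eta * (C + b) - mu) / sigma + beta) == (C >= thresh)."""
    return mu / eta - b - (beta * sigma) / (gamma * eta)

def bn_map(C, thresh):
    """Fused convolution + batch normalization + binarization as one comparison."""
    return np.where(C >= thresh, 1, -1)

def binary_dot(a_bits: int, b_bits: int, vec_len: int) -> int:
    """Binary dot product from the formula above: A.B = -(2*popcnt(A^B) - vec_len)."""
    return -(2 * bin(a_bits ^ b_bits).count("1") - vec_len)
```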
further preferably, the inputting the left and right views into the binary neural network, performing feature extraction on the left and right views, and determining the disparity map includes:
inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view, and determining the image features;
determining an initial image by combining a parallax prediction algorithm according to the image characteristics;
and performing Gaussian filtering processing on the initial image to determine the disparity map.
wherein the disparity prediction Cost_des algorithm refines the predicted disparity using the costs of neighboring disparities: Cost is the cost of the disparity calculated by the network, Cost+ is the cost of the next disparity, and Cost− is the cost of the previous disparity;
Gaussian filtering with a filter kernel size of 3 × 3 is then applied to the initial image to obtain a smoother disparity map, which can be used for a more accurate three-dimensional detection task.
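For illustration, a sketch of this disparity post-processing follows; because the Cost_des formula is shown only as an image in the source, the standard parabolic sub-pixel refinement over (Cost−, Cost, Cost+) is assumed here:

```python
import cv2
import numpy as np

def refine_disparity(d, cost_prev, cost, cost_next):
    """Sub-pixel refinement of integer disparity d from the costs of d-1, d, d+1."""
    denom = cost_prev + cost_next - 2.0 * cost
    offset = np.where(np.abs(denom) > 1e-6,
                      (cost_prev - cost_next) / (2.0 * denom), 0.0)
    return d + offset

def smooth_disparity(initial: np.ndarray) -> np.ndarray:
    """3x3 Gaussian filtering of the initial image, as described above."""
    return cv2.GaussianBlur(initial, (3, 3), 0)
```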
Further as a preferred embodiment, the method for determining the visual radar signal by constructing a point cloud data coordinate point according to the disparity map and by combining the camera calibration parameters comprises the following steps:
calculating the depth value of the coordinate point of the disparity map according to the disparity map and the camera calibration parameter;
initializing point cloud data, storing coordinate point depth values into the point cloud data, and determining first point cloud data;
calculating the coordinate point of the first point cloud data according to the coordinate point depth value and the camera calibration parameter, and determining second point cloud data;
and converting the left view and the right view into gray maps, and combining the gray maps into the second point cloud data to determine the visual radar signals.
The coordinate point depth values of the disparity map are calculated according to the disparity map and the camera calibration parameters. Let the disparity of a point p in the disparity map be Y(u, v), where u and v are respectively the abscissa and ordinate of point p; the depth value D(u, v) of the point is calculated by the formula:
D(u, v) = f_U × b / Y(u, v)

wherein f_U is the horizontal focal length parameter of the left camera of the binocular camera, and b is the horizontal offset (baseline) parameter of the binocular camera. The point cloud data are initialized, the coordinate point depth values are stored into the point cloud data, and the first point cloud data are determined. Let the dimensions of the disparity map be h × w (height and width); the storage dimensions of the point cloud are then 4 × N with N = h × w, where the first two dimensions store the position of the coordinate point, the third dimension stores the depth of the coordinate point, and the fourth dimension stores the reflection intensity of the coordinate point. The coordinate point depth information obtained from the above calculation is filled into the point cloud to obtain the first point cloud data. The coordinate points of the first point cloud data are then calculated according to the coordinate point depth values and the camera calibration parameters, with the calculation formula:
x = (u − c_U) × z / f_U,  y = (v − c_V) × z / f_V,  z = D(u, v)

wherein (x, y, z) are the spatial coordinates of the point cloud, x being the width direction, y the height direction, and z the depth; (c_U, c_V) is the center pixel position of the corresponding camera, and f_V is the vertical focal length. At this point, the conversion of the 3 × N matrix information of the point cloud is completed, and the second point cloud data are obtained. The left and right views are converted into gray maps through OpenCV, the gray maps also being in h × w format; they are stretched into 1 × N and merged into the second point cloud data to obtain the visual radar signal.
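A minimal sketch of the visual radar signal construction under the back-projection formulas above; the function name and the 4 × N array layout are assumptions:

```python
import numpy as np

def build_visual_radar(disp, left_gray, fU, fV, cU, cV, b):
    """disp: h x w disparity map; left_gray: h x w gray map.
    Returns a 4 x N array of (x, y, z, intensity), N = h*w."""
    h, w = disp.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = fU * b / np.maximum(disp, 1e-6)   # D(u, v) = fU * b / Y(u, v)
    x = (u - cU) * z / fU                 # width coordinate
    y = (v - cV) * z / fV                 # height coordinate
    intensity = left_gray.astype(np.float32) / 255.0
    return np.stack([x, y, z, intensity]).reshape(4, -1)
```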
Further, as a preferred embodiment, discretizing the visual radar signal, inputting the discretized visual radar signal into a neural network, and determining a prediction result, the method includes:
determining an interested area by taking the binocular camera as a reference;
discretizing the vision radar signal in the region of interest to determine a discretized vision radar signal;
sequencing the discretized vision radar signals according to the height coordinates to determine the height of the point cloud;
performing density calculation on the discretization vision radar signal to determine the density of the point cloud;
determining the gray value of the discretization vision radar signal as point cloud intensity;
storing the point cloud height, the point cloud density and the point cloud intensity into a three-channel feature map;
inputting the three-channel characteristic diagram into the neural network, and determining an output result;
carrying out normalization processing on the output result, and then summing to determine a normalization processing result;
and carrying out threshold processing on the normalization result to determine the prediction result.
A square region extending 25 m to the left and right and 50 m to the front of the binocular camera is set as the region of interest; points inside this region are kept and points elsewhere are discarded. The visual radar signals in the region of interest are discretized and sorted according to the height coordinate; points with the same horizontal coordinates are de-duplicated and counted, and the sorting result is projected onto a three-channel feature map. The three channels of the feature map respectively store the density, height and intensity of the point cloud data: the height map takes the maximum value of the sorting result; the density map counts the number of points N in each grid and derives the density value from N; and the intensity map is filled with the gray values of the left and right views. The three-channel feature map is input into a network based on deep residuals and a feature pyramid, which predicts the position coordinates (x, y, z), geometric dimensions (h, w, l) and rotation angle ry around the y axis of the object.
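A sketch of this region-of-interest discretization follows; the grid resolution and the density mapping min(1, log(N+1)/log(64)) are assumptions, since the patent's density formula appears only as an image:

```python
import numpy as np

def bev_feature_map(cloud, res=0.1, x_range=(-25.0, 25.0), z_range=(0.0, 50.0)):
    """cloud: 4 x N array (x, y, z, intensity) -> 3-channel bird's-eye feature map
    storing height, density and intensity per grid cell."""
    x, y, z, inten = cloud
    keep = (x >= x_range[0]) & (x < x_range[1]) & (z >= z_range[0]) & (z < z_range[1])
    x, y, z, inten = x[keep], y[keep], z[keep], inten[keep]
    rows = ((z - z_range[0]) / res).astype(int)
    cols = ((x - x_range[0]) / res).astype(int)
    nrows = int((z_range[1] - z_range[0]) / res)
    ncols = int((x_range[1] - x_range[0]) / res)
    fmap = np.zeros((3, nrows, ncols), dtype=np.float32)
    counts = np.zeros((nrows, ncols), dtype=np.int32)
    for r, c, yi, ii in zip(rows, cols, y, inten):
        fmap[0, r, c] = max(fmap[0, r, c], yi)  # height channel: max height in cell
        fmap[2, r, c] = ii                      # intensity channel: gray value
        counts[r, c] += 1
    fmap[1] = np.minimum(1.0, np.log1p(counts) / np.log(64.0))  # density channel
    return fmap
```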
Further preferably, the detecting the blocking object to the prediction result and determining the target prediction result includes:
projecting the prediction result onto a two-dimensional image according to the camera calibration parameters to determine a first two-dimensional frame value;
detecting the left view and the right view through an object detection algorithm, and determining a second two-dimensional frame value;
comparing the first two-dimensional frame value with the second two-dimensional frame value, and calculating to obtain an intersection ratio;
when the intersection ratio is greater than a threshold value, determining the target prediction result.
Because some occluding objects may exist in the image, such as flower beds, enclosing walls and garbage cans, whose appearance in the point cloud is similar to the features of a vehicle and may cause the detection network to produce false detections, a multi-information-fusion three-dimensional detection method is used: the prediction result is projected onto the two-dimensional image using the camera calibration parameters to obtain its two-dimensional frame value, i.e. the first two-dimensional frame value; frame detection is performed on the left and right views using the object detection algorithm YOLOv5, and the two-dimensional frame value of the detection algorithm, i.e. the second two-dimensional frame value, is obtained by calculation; the first and second two-dimensional frame values are compared and the intersection-over-union ratio is calculated; when the intersection-over-union ratio is greater than the threshold, the prediction is considered correct, otherwise the prediction result is discarded. The threshold is set to 0.5, and finally the target prediction result is determined.
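A minimal sketch of this occlusion check: the projected 3-D prediction box is kept only if a 2-D detector box overlaps it with intersection-over-union above the 0.5 threshold (the (x1, y1, x2, y2) box format is an assumption):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def keep_prediction(projected_box, detector_boxes, thresh=0.5):
    """Keep the 3-D prediction if any 2-D detection overlaps it with IoU > thresh."""
    return any(iou(projected_box, db) > thresh for db in detector_boxes)
```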
With reference to fig. 1, the embodiment of the present invention optimizes the training process of the binary neural network model by combining the optimization parameters; the channels are compressed through the binary data channel packing technology and the neural network operations are fused through the network computing laminar flow technology, realizing the optimization of the binary neural network and obtaining the binary neural network. Left and right views are acquired through a binocular camera and input into the binary neural network for feature extraction to obtain a disparity map. The camera calibration parameters obtained when setting up the binocular camera are combined to calculate the coordinates of the points in the disparity map, and the calculated coordinate information is stored into point cloud data to obtain the visual radar signal. The visual radar signal is discretized, the processing result is projected into a three-channel feature map, the three-channel feature map is input into a network based on deep residuals and a feature pyramid, and the prediction result is output, including the position coordinates (x, y, z), geometric dimensions (h, w, l) and rotation angle ry around the y axis of the object.
Corresponding to the method in fig. 1, an embodiment of the present invention further provides a 3D object rapid detection apparatus, including:
the system comprises a first module, a second module and a third module, wherein the first module is used for acquiring left and right views and camera calibration parameters, the left and right views are acquired through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera; (ii) a
The second module is used for processing the database picture and the real distance depth map through a neural network framework and training to obtain a binary neural network model, wherein the real distance depth map is obtained through a laser radar;
the third module is used for determining a binary neural network according to the binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology;
the fourth module is used for inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view and determining a disparity map;
the fifth module is used for constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters and determining a visual radar signal;
the sixth module is used for projecting the vision radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network and determining a prediction result;
and the seventh module is used for detecting the blocking object of the prediction result and determining a target prediction result.
Corresponding to the method of fig. 1, an embodiment of the present invention further provides an electronic device, which includes a processor and a memory; the memory is used for storing programs; the processor executes the program to implement the method as described above.
Corresponding to the method of fig. 1, the embodiment of the present invention further provides a computer-readable storage medium, which stores a program, and the program is executed by a processor to implement the method as described above.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.
In summary, compared with the related art, the embodiment of the invention has the following advantages:
1) The related art uses laser radar data, but the cost of the laser radar equipment required to acquire such data is too high; the embodiment of the invention can be realized with only the images acquired by one binocular camera, which reduces the use cost.
2) The related art uses a large number of networks, which results in low operation speed; the embodiment of the invention optimizes the binary neural network, thereby improving the processing speed.
3) Laser radar has difficulty obtaining environmental color signals; the embodiment of the invention can reliably obtain environmental color signals at high speed, thereby improving the accuracy of object detection.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (8)
1. A3D object rapid detection method is characterized by comprising the following steps:
acquiring left and right views and camera calibration parameters; the left view and the right view are obtained through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera;
processing the database picture and the real distance depth map through a neural network framework, and training to obtain a binary neural network model; the real distance depth map is obtained through a laser radar;
adding a spatial constraint term to the binary neural network model, specifically:
in the training set of the binary neural network model, randomly selecting a plurality of groups of point sets containing three points;
the point set satisfies the following conditions:
wherein ∠A represents the angle between the line connecting P_A and P_B and the line connecting P_A and P_C, d(P_m, P_k) represents the Euclidean distance between the two points P_m and P_k, and P_A, P_B, P_C are respectively the three points in the point set;
establishing a spatial plane by three points in each group of point sets according to the point sets;
according to the space plane, determining the space constraint item and adding the space constraint item to the binary neural network model; the spatial constraint term is:
loss = (1/N) · Σ_{i=1}^{N} ||S_i^real − S_i^pred||_1

wherein loss is the spatial constraint term, N is the number of point sets, S_i^real is the spatial plane calculated using the real disparity map, S_i^pred is the spatial plane calculated using the network-predicted disparity map, ||·||_1 represents a 1-norm operation, and i represents a positive integer;
determining a binary neural network according to the binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology;
inputting the left view and the right view into the binary neural network, extracting features of the left view and the right view, and determining a disparity map;
constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters, and determining a visual radar signal; the visual radar signal is used for representing point cloud data obtained by construction;
projecting the vision radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network, and determining a prediction result;
performing obstruction detection on the prediction result to determine a target prediction result;
the inputting the left view and the right view into the binary neural network, performing feature extraction on the left view and the right view, and determining a disparity map comprises:
inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view, and determining the image features;
determining an initial image by combining a parallax prediction algorithm according to the image characteristics;
wherein the disparity prediction Cost_des algorithm refines the predicted disparity using the costs of neighboring disparities: Cost is the cost of the disparity calculated through the network, Cost+ is the cost of the next disparity, and Cost− is the cost of the previous disparity;
and performing Gaussian filtering processing on the initial image to determine the disparity map.
2. The method for rapidly detecting the 3D object according to claim 1, wherein the determining the binary neural network according to the binary neural network model by combining a binary data channel packing technology and a network computing laminar flow technology comprises:
compressing binary data in the binary neural network model by using channel dimensions according to the binary data channel packing technology to determine an initial binary neural network;
according to the network computing laminar flow technology, combining the neural network operations in the initial binary neural network to determine the binary neural network; the neural network operation comprises a convolution operation, a batch normalization operation and a data binarization operation.
3. The method for rapidly detecting the 3D object according to claim 1, wherein the constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters and determining the visual radar signal comprise:
calculating a coordinate point depth value of the disparity map according to the disparity map and the camera calibration parameters;
initializing point cloud data, storing the coordinate point depth value into the point cloud data, and determining first point cloud data;
calculating the coordinate point of the first point cloud data according to the coordinate point depth value and the camera calibration parameter, and determining second point cloud data;
and converting the left view and the right view into gray maps, merging the gray maps into the second point cloud data, and determining the visual radar signal.
4. The method according to claim 1, wherein the projecting the visual radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network, and determining a prediction result comprises:
determining a region of interest with the binocular camera as a reference;
discretizing the visual radar signal in the region of interest to determine a discretized visual radar signal;
sorting the discretized visual radar signals by height coordinate to determine the point cloud height;
performing a density calculation on the discretized visual radar signal to determine the point cloud density;
determining the gray value of the discretized visual radar signal as the point cloud intensity;
storing the point cloud height, the point cloud density and the point cloud intensity into a three-channel feature map;
inputting the three-channel feature map into the neural network, and determining an output result;
performing normalization processing on the output result and then summing to determine a normalization result;
and performing threshold processing on the normalization result to determine the prediction result.
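A compact sketch covering the discretization, height, density and intensity steps above; the region of interest, grid resolution, and log-based density normalizer are assumed values borrowed from common bird's-eye-view pipelines, not the patent's:

```python
import numpy as np

def pointcloud_to_bev(points, x_rng=(0.0, 70.0), y_rng=(-40.0, 40.0), res=0.1):
    """Project (x, y, z, intensity) points, assumed in a ground-aligned
    frame (x forward, y left, z up), into a 3-channel bird's-eye map:
    channel 0 = max height, 1 = normalized density, 2 = max intensity."""
    x, y, z, inten = points.T
    keep = (x >= x_rng[0]) & (x < x_rng[1]) & (y >= y_rng[0]) & (y < y_rng[1])
    x, y, z, inten = x[keep], y[keep], z[keep], inten[keep]
    cols = ((x - x_rng[0]) / res).astype(int)
    rows = ((y - y_rng[0]) / res).astype(int)
    h = int(round((y_rng[1] - y_rng[0]) / res))
    w = int(round((x_rng[1] - x_rng[0]) / res))
    bev = np.zeros((3, h, w), dtype=np.float32)
    np.maximum.at(bev[0], (rows, cols), z)             # per-cell max height
    np.add.at(bev[1], (rows, cols), 1.0)               # per-cell point count
    bev[1] = np.minimum(1.0, np.log1p(bev[1]) / np.log(64.0))  # density
    np.maximum.at(bev[2], (rows, cols), inten)         # per-cell max intensity
    return bev
```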
5. The 3D object rapid detection method according to claim 1, wherein the performing occlusion detection on the prediction result and determining the target prediction result comprises:
projecting the prediction result onto a two-dimensional image according to the camera calibration parameters to determine a first two-dimensional frame value;
detecting the left view and the right view through an object detection algorithm, and determining a second two-dimensional frame value;
comparing the first two-dimensional frame value with the second two-dimensional frame value to calculate an intersection ratio; and when the intersection ratio is greater than a threshold, determining the target prediction result.
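The intersection ratio in this claim is the standard 2D intersection-over-union. A minimal sketch, with the 0.5 threshold an assumed value rather than the patent's:

```python
def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def keep_prediction(projected_box, detector_box, thresh=0.5):
    """Keep a 3D prediction only if its image-plane projection overlaps
    the 2D detector's box by more than the threshold."""
    return iou_2d(projected_box, detector_box) > thresh
```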
6. A 3D object rapid detection device, comprising:
the first module is used for acquiring a left view, a right view and camera calibration parameters, wherein the left and right views are acquired through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera;
the second module is used for processing the database picture and the real distance depth map through a neural network framework and training to obtain a binary neural network model, wherein the real distance depth map is obtained through a laser radar;
adding a spatial constraint term to the binary neural network model, specifically:
randomly selecting a plurality of point sets, each containing three points, from the training set of the binary neural network model;
each point set satisfies the following conditions (the condition formula is rendered as an image in the source):
wherein θ represents the angle between the line connecting P_A and P_B and the line connecting P_A and P_C, d(P_m, P_k) represents the Euclidean distance between the two points P_m and P_k, and P_A, P_B and P_C are respectively the three points in the point set;
establishing a spatial plane from the three points in each point set;
determining the spatial constraint term according to the spatial plane and adding the spatial constraint term to the binary neural network model;
the spatial constraint term is:
wherein loss is the spatial constraint term, N is the number of point sets, S_i^gt is the spatial plane computed using the true disparity map, S_i^pred is the spatial plane computed using the predicted disparity map, || · ||_1 represents a 1-norm operation, and i represents a positive integer;
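The spatial constraint formula is likewise an image in the source. A form consistent with the definitions above is sketched below; the 1/N averaging and the plane symbols S_i^gt and S_i^pred are assumptions introduced here, not the patent's notation:

```latex
% Assumed form of the spatial constraint term:
% S_i^{gt} is the plane from the true disparity map, S_i^{pred} the plane
% from the predicted disparity map, for the i-th sampled point set.
loss = \frac{1}{N} \sum_{i=1}^{N} \left\| S_i^{gt} - S_i^{pred} \right\|_{1}
```

Each plane can be represented, for example, by the normal vector (P_B − P_A) × (P_C − P_A) of its three defining points, so that the 1-norm compares plane orientations between the true and predicted geometry.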
the third module is used for determining a binary neural network according to the binary neural network model by combining a binary data channel packing technology and a network computing laminar flow technology;
the fourth module is used for inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view and determining a disparity map;
the inputting the left view and the right view into the binary neural network, performing feature extraction on the left view and the right view, and determining a disparity map comprises:
inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view, and determining image features;
determining an initial image according to the image features in combination with a disparity prediction algorithm;
wherein the disparity prediction cost Cost_des algorithm is as follows (the formula is rendered as an image in the source):
where Cost is the cost of the current disparity computed through the network, Cost^+ is the cost of the next disparity, and Cost^- is the cost of the previous disparity;
performing Gaussian filtering processing on the initial image to determine the disparity map;
the fifth module is used for constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters and determining a visual radar signal;
the sixth module is used for projecting the visual radar signal to obtain a three-channel feature map, inputting the three-channel feature map into a neural network and determining a prediction result;
and the seventh module is used for performing occlusion detection on the prediction result and determining a target prediction result.
7. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method according to any one of claims 1-5.
8. A computer-readable storage medium, characterized in that the storage medium stores a program, which is executed by a processor to implement the method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110553663.6A CN113281779B (en) | 2021-05-20 | 2021-05-20 | 3D object rapid detection method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110553663.6A CN113281779B (en) | 2021-05-20 | 2021-05-20 | 3D object rapid detection method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113281779A CN113281779A (en) | 2021-08-20 |
CN113281779B true CN113281779B (en) | 2022-07-12 |
Family
ID=77280479
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110553663.6A Active CN113281779B (en) | 2021-05-20 | 2021-05-20 | 3D object rapid detection method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113281779B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114359891B (en) * | 2021-12-08 | 2024-05-28 | 华南理工大学 | Three-dimensional vehicle detection method, system, device and medium |
CN115619740B (en) * | 2022-10-19 | 2023-08-08 | 广西交科集团有限公司 | High-precision video speed measuring method, system, electronic equipment and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110738241A (en) * | 2019-09-24 | 2020-01-31 | 中山大学 | binocular stereo vision matching method based on neural network and operation frame thereof |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10503966B1 (en) * | 2018-10-11 | 2019-12-10 | Tindei Network Technology (Shanghai) Co., Ltd. | Binocular pedestrian detection system having dual-stream deep learning neural network and the methods of using the same |
CN110148179A (en) * | 2019-04-19 | 2019-08-20 | 北京地平线机器人技术研发有限公司 | A kind of training is used to estimate the neural net model method, device and medium of image parallactic figure |
CN111028285A (en) * | 2019-12-03 | 2020-04-17 | 浙江大学 | Depth estimation method based on binocular vision and laser radar fusion |
CN111179330A (en) * | 2019-12-27 | 2020-05-19 | 福建(泉州)哈工大工程技术研究院 | Binocular vision scene depth estimation method based on convolutional neural network |
CN111444811B (en) * | 2020-03-23 | 2023-04-28 | 复旦大学 | Three-dimensional point cloud target detection method |
CN112633324A (en) * | 2020-11-27 | 2021-04-09 | 中山大学 | System, method and medium for matching stereoscopic vision around the eyes based on neural network |
CN112233163B (en) * | 2020-12-14 | 2021-03-30 | 中山大学 | Depth estimation method and device for laser radar stereo camera fusion and medium thereof |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110738241A (en) * | 2019-09-24 | 2020-01-31 | 中山大学 | binocular stereo vision matching method based on neural network and operation frame thereof |
Non-Patent Citations (3)
Title |
---|
"基于深度学习的三维人脸重建技术研究";伊进延;《中国优秀博硕士学位论文全文数据库(硕士) 医药卫生科技辑》;20200115(第1期);22-23 * |
"基于深度学习的三维目标检测方法研究";王刚 等;《计算机应用与软件》;20201231;第37卷(第12期);164-168 * |
"基于车载双目相机的目标检测及其运动状态估计";刘奕博;《中国优秀博硕士学位论文全文数据库(硕士) 工程科技II辑》;20210215(第2期);9-10、32-35、47 * |
Also Published As
Publication number | Publication date |
---|---|
CN113281779A (en) | 2021-08-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Riegler et al. | Octnetfusion: Learning depth fusion from data | |
EP3822910A1 (en) | Depth image generation method and device | |
US11941831B2 (en) | Depth estimation | |
CN112132213A (en) | Sample image processing method and device, electronic equipment and storage medium | |
CN113052109A (en) | 3D target detection system and 3D target detection method thereof | |
CN110879994A (en) | Three-dimensional visual inspection detection method, system and device based on shape attention mechanism | |
CN113281779B (en) | 3D object rapid detection method, device, equipment and medium | |
Kumari et al. | A survey on stereo matching techniques for 3D vision in image processing | |
WO2021044122A1 (en) | Scene representation using image processing | |
Merras et al. | Multi-view 3D reconstruction and modeling of the unknown 3D scenes using genetic algorithms | |
US9361412B1 (en) | Method for the simulation of LADAR sensor range data | |
CN117422884A (en) | Three-dimensional target detection method, system, electronic equipment and storage medium | |
CN114372523A (en) | Binocular matching uncertainty estimation method based on evidence deep learning | |
CN114494589A (en) | Three-dimensional reconstruction method, three-dimensional reconstruction device, electronic equipment and computer-readable storage medium | |
CN116310219A (en) | Three-dimensional foot shape generation method based on conditional diffusion model | |
CN112562001A (en) | Object 6D pose estimation method, device, equipment and medium | |
CN115240168A (en) | Perception result obtaining method and device, computer equipment and storage medium | |
CN113160416A (en) | Speckle imaging device and method for coal flow detection | |
CN116452748A (en) | Implicit three-dimensional reconstruction method, system, storage medium and terminal based on differential volume rendering | |
EP4152274A1 (en) | System and method for predicting an occupancy probability of a point in an environment, and training method thereof | |
Zhang et al. | Object measurement in real underwater environments using improved stereo matching with semantic segmentation | |
Tao et al. | SiLVR: Scalable Lidar-Visual Reconstruction with Neural Radiance Fields for Robotic Inspection | |
CN117367404A (en) | Visual positioning mapping method and system based on SLAM (sequential localization and mapping) in dynamic scene | |
US11790642B2 (en) | Method for determining a type and a state of an object of interest | |
US20230281877A1 (en) | Systems and methods for 3d point cloud densification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||