CN113281779B - 3D object rapid detection method, device, equipment and medium - Google Patents
3D object rapid detection method, device, equipment and medium
- Publication number
- CN113281779B (application CN202110553663.6A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- determining
- binary
- binary neural
- point cloud
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/88—Lidar systems specially adapted for specific applications
- G01S17/89—Lidar systems specially adapted for specific applications for mapping or imaging
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/86—Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention discloses a 3D object rapid detection method, device, equipment and medium, wherein the method comprises the following steps: acquiring left and right views and camera calibration parameters; processing database pictures and a real-distance depth map through a neural network framework, and training to obtain a binary neural network model; determining a binary neural network from the binary neural network model by combining a binary data channel packing technology and a network computing laminar flow technology; inputting the left and right views into the binary neural network, extracting the features of the left and right views, and determining a disparity map; constructing point cloud data coordinate points from the disparity map combined with the camera calibration parameters, and determining a visual radar signal; and discretizing the visual radar signal, inputting it into a network based on deep residuals and a feature pyramid, and determining a prediction result. The invention reduces the cost of three-dimensional object detection, improves the speed and accuracy of detection, and can be widely applied in the technical field of three-dimensional object detection.
Description
Technical Field
The invention relates to the technical field of three-dimensional object detection, in particular to a method, a device, equipment and a medium for rapidly detecting a 3D object.
Background
In recent years, with the development of automatic driving technology, the status of the laser radar has become more and more important. The laser radar has the advantage that distances to surrounding objects can be measured directly, so automatic driving target detection algorithms can be developed that accurately estimate the positions and heading directions of different targets in three-dimensional target detection.
Nowadays, technologies for obtaining the geometric and position coordinates of surrounding objects by processing the point cloud signal data generated by a laser radar are well developed, such as Frustum PointNets. These show very high accuracy in tests on the KITTI data set; however, they also require preprocessing of the calibrated camera image before processing the point cloud data.
Such a design has the following disadvantages: the accuracy of the model depends largely on the camera image and the associated convolutional neural network, and the overall pipeline contains many neural networks and complex models, which causes excessive delay and low efficiency.
Most existing three-dimensional detection technologies rely on laser radar, but the hardware cost of laser radar is high. For example, the HDL-64E lidar manufactured by Velodyne Corporation of the USA has a domestic selling price of over 500,000 yuan. In addition, the point cloud data obtained by laser radar are sparse, which may prevent targets with complex appearance, or small objects, from being reflected in the point cloud. Moreover, laser radar can only provide sparse measurement points and cannot provide image or color information, making further task development on this basis difficult. Finally, existing three-dimensional detection networks are slow and place high demands on hardware storage, calculation and energy consumption; their high calculation requirements make them difficult to deploy in practical applications, so real-time requirements cannot be met.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, a device, and a medium for rapidly detecting a 3D object, so as to reduce the use cost of a three-dimensional detection technology and improve the real-time performance of a three-dimensional detection network.
In one aspect, the present invention provides a method for rapidly detecting a 3D object, including:
acquiring left and right views and camera calibration parameters; the left view and the right view are obtained through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera;
processing the database picture and the real distance depth map through a neural network framework, and training to obtain a binary neural network model; the real distance depth map is obtained through a laser radar;
determining a binary neural network according to the binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology;
inputting the left view and the right view into the binary neural network, extracting features of the left view and the right view, and determining a disparity map;
constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters, and determining a visual radar signal; the visual radar signal is used for representing point cloud data obtained by construction;
projecting the vision radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network, and determining a prediction result;
and detecting the blocking object of the prediction result to determine a target prediction result.
Preferably, adding a spatial constraint term to the binary neural network model comprises:
randomly selecting a plurality of groups of point sets containing three points in the training set of the binary neural network model;
the point set satisfies the following conditions:
wherein ∠A represents the angle between the line connecting P_A and P_B and the line connecting P_A and P_C, d(P_m, P_k) represents the Euclidean distance between the two points P_m and P_k, and P_A, P_B, P_C are respectively the three points in the point set;
establishing a spatial plane by means of three points in each set of point sets according to the point sets;
according to the space plane, determining the space constraint item and adding the space constraint item to the binary neural network model;
the spatial constraint term is:
loss = (1/N) · Σ_{i=1}^{N} ||S_i^real − S_i^pred||_1

wherein loss is the spatial constraint term, N is the number of point sets, S_i^real is the spatial plane calculated using the real disparity map, S_i^pred is the spatial plane calculated using the network-predicted disparity map, ||·||_1 represents a 1-norm operation, and i represents a positive integer.
Preferably, the determining a binary neural network according to the binary neural network model by combining a binary data channel packing technique and a network computing laminar flow technique includes:
compressing binary data in the binary neural network model by using channel dimensions according to the binary data channel packing technology to determine an initial binary neural network;
according to the network computing laminar flow technology, combining the neural network operations in the initial binary neural network to determine the binary neural network; the neural network operation comprises a convolution operation, a batch normalization operation and a data binarization operation.
Preferably, the inputting the left view and the right view into the binary neural network, performing feature extraction on the left view and the right view, and determining the disparity map includes:
inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view, and determining the image features;
determining an initial image by combining a parallax prediction algorithm according to the image characteristics;
and performing Gaussian filtering processing on the initial image to determine the disparity map.
Preferably, the constructing a point cloud data coordinate point according to the disparity map and the camera calibration parameters to determine a visual radar signal includes:
calculating a coordinate point depth value of the disparity map according to the disparity map and the camera calibration parameters;
initializing point cloud data, storing the coordinate point depth value into the point cloud data, and determining first point cloud data;
calculating the coordinate point of the first point cloud data according to the coordinate point depth value and the camera calibration parameter, and determining second point cloud data;
and converting the left view and the right view into gray maps, merging the gray maps into the second point cloud data, and determining the visual radar signal.
Preferably, the projecting the vision radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network, and determining a prediction result includes:
determining an interested area by taking the binocular camera as a reference;
discretizing the vision radar signal in the region of interest to determine a discretized vision radar signal;
sequencing the discretized vision radar signals according to the height coordinates to determine the height of the point cloud;
performing density calculation on the discretization vision radar signal to determine the density of the point cloud;
determining the gray value of the discretization vision radar signal as point cloud intensity;
storing the point cloud height, the point cloud density and the point cloud intensity into a three-channel feature map;
inputting the three-channel characteristic diagram into the neural network, and determining an output result;
carrying out normalization processing on the output result, and then summing to determine a normalization processing result;
and carrying out threshold processing on the normalization result to determine the prediction result.
Preferably, the detecting the blocking object to the prediction result and determining the target prediction result includes:
projecting the prediction result onto a two-dimensional image according to the camera calibration parameters to determine a first two-dimensional frame value;
detecting the left view and the right view through an object detection algorithm, and determining a second two-dimensional frame value;
comparing the first two-dimensional frame value with the second two-dimensional frame value, and calculating to obtain an intersection ratio;
when the intersection ratio is greater than a threshold value, determining the target prediction result.
On the other hand, the embodiment of the invention also discloses a 3D object rapid detection device, which comprises:
the system comprises a first module, a second module and a third module, wherein the first module is used for acquiring left and right views and camera calibration parameters, the left and right views are acquired through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera; (ii) a
The second module is used for processing the database picture and the real distance depth map through a neural network framework and training to obtain a binary neural network model, wherein the real distance depth map is obtained through a laser radar;
the third module is used for determining a binary neural network according to the binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology;
the fourth module is used for inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view and determining a disparity map;
the fifth module is used for constructing a point cloud data coordinate point according to the disparity map and the camera calibration parameters and determining a visual radar signal;
the sixth module is used for projecting the vision radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network and determining a prediction result;
and the seventh module is used for detecting the blocking object of the prediction result and determining a target prediction result.
On the other hand, the embodiment of the invention also discloses an electronic device, which comprises a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
In another aspect, an embodiment of the present invention further discloses a computer-readable storage medium, where the storage medium stores a program, and the program is executed by a processor to implement the method described above.
In another aspect, an embodiment of the present invention further discloses a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.
Compared with the prior art, the invention adopting the above technical scheme has the following technical effects: left and right views and camera calibration parameters are acquired; database pictures and a real-distance depth map are processed through a neural network framework and trained to obtain a binary neural network model, which improves the accuracy of the binary neural network model; a binary neural network is determined from the binary neural network model by combining the binary data channel packing technology and the network computing laminar flow technology, which saves storage space and reduces time loss; the left and right views are input into the binary neural network, their features are extracted, and a disparity map is determined; point cloud data coordinate points are constructed from the disparity map and the camera calibration parameters, and a visual radar signal is determined, so that environmental point cloud information can be acquired reliably at high speed, and environmental color information can also be acquired; the visual radar signal is projected to obtain a three-channel characteristic diagram, which is input into a neural network to determine a prediction result, reducing the equipment cost.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a detailed flow chart of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad application.
The embodiment of the invention provides a 3D object rapid detection method, which comprises the following steps:
acquiring left and right views and camera calibration parameters; the left view and the right view are obtained through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera;
processing the database picture and the real distance depth map through a neural network framework, and training to obtain a binary neural network model; the real distance depth map is obtained through a laser radar;
determining a binary neural network according to a binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology;
inputting the left view and the right view into a binary neural network, extracting the characteristics of the left view and the right view, and determining a disparity map;
according to the disparity map, point cloud data coordinate points are constructed by combining camera calibration parameters, and visual radar signals are determined; the visual radar signal is used for representing point cloud data obtained by construction;
discretizing the vision radar signal, inputting the vision radar signal into a neural network, and determining a prediction result.
Further as a preferred embodiment, adding a spatial constraint term to the binary neural network model includes:
in the training set of the binary neural network model, randomly selecting a plurality of groups of point sets containing three points;
the point set satisfies the following conditions:
wherein ∠A represents the angle between the line connecting P_A and P_B and the line connecting P_A and P_C, d(P_m, P_k) represents the Euclidean distance between the two points P_m and P_k, and P_A, P_B, P_C are respectively the three points in the point set;
establishing a spatial plane by means of three points in each set of point sets according to the point sets;
according to the space plane, determining the space constraint item and adding the space constraint item to the binary neural network model;
the spatial constraint term is:
loss = (1/N) · Σ_{i=1}^{N} ||S_i^real − S_i^pred||_1

wherein loss is the spatial constraint term, N is the number of point sets, S_i^real is the spatial plane calculated using the real disparity map, S_i^pred is the spatial plane calculated using the network-predicted disparity map, ||·||_1 represents a 1-norm operation, and i represents a positive integer.
According to one implementation of the embodiment of the invention, the pictures provided in the KITTI 2012 database and the real-distance depth maps obtained using a radar are processed with the torch7 neural network framework, and positive and negative sample pairs are constructed for training to obtain the binary neural network model. Training of the binary neural network model is based on the floating-point neural network training process: in forward propagation the floating-point weights are binarized, and the results are calculated using the binarized weights; in backward propagation only the floating-point weights are updated, and the updated floating-point weights are used in the next forward propagation. Because the training parameter settings of a binary neural network model follow their own rules, the learning rate used to train it should be as low as possible, to reduce instability of the binarized weights caused by frequent sign changes of the weight data when the learning rate is too large; the learning rate used in training the binary neural network model is 2×10⁻⁴. In order to improve the expressive capability of the binary neural network, the output of the binarized convolution kernel is made to approach the output of the full-precision convolution kernel as closely as possible, and an optimization parameter is used to perform an optimization calculation on the binary convolution kernel; the optimization calculation formula is:
C_i ≈ α·B_i
wherein the size of the convolution kernel of the network layer is k × k, the number of input channels is m, and the number of output channels is n; C_i is the output of the i-th channel of the floating-point convolution, and B_i is the output of the i-th channel of the binary convolution, both of shape h × w × c; α is the optimization parameter, obtained through adaptive calculation from the convolution kernel and used in the calculation so that the binary convolution kernel result is as close as possible to the floating-point convolution kernel result. The optimization parameter is obtained by calculation with an optimization function, wherein the optimization function H(B, α) is:
H(B, α) = ||C_i − α·B_i||²;
the calculation formula of the optimization parameter is:

α = ||W_i||_1 / (k × k × m)

wherein W_i is the weight parameter of the i-th output channel, and ||·||_1 is a 1-norm operation.
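As an illustrative sketch only (the patent itself gives no code), the channel-wise scaling α = ||W_i||_1/(k×k×m) and the forward binarization described above can be written as follows; the use of PyTorch, the function name, and the (n, m, k, k) weight layout are assumptions:

```python
import torch

def binarize_weights(W: torch.Tensor):
    """W: floating-point conv weights of shape (n, m, k, k).

    Returns binary weights B in {+1, -1} and the per-output-channel scale
    alpha_i = ||W_i||_1 / (k * k * m), i.e. the mean absolute weight value.
    """
    n = W.shape[0]
    alpha = W.abs().reshape(n, -1).mean(dim=1)  # equals ||W_i||_1 / (k*k*m)
    B = torch.where(W >= 0, torch.ones_like(W), -torch.ones_like(W))
    return B, alpha

# In forward propagation the binary convolution output of channel i is scaled
# by alpha_i so that it approaches the floating-point output C_i; in backward
# propagation only the kept floating-point W is updated, as described above.
```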
A spatial constraint term is added to the binary neural network model for training. For any point p(u_i, v_i) in the 2D image, where u_i and v_i respectively represent the abscissa and ordinate of point p and i represents the index of the point, its three-dimensional spatial mapping is obtained through calculation with the camera parameters and is expressed as P(x_i, y_i, z_i), wherein x_i, y_i, z_i respectively represent the three-dimensional coordinates of point p. In the training-set depth map data with accurate depth information, N groups of point sets each containing 3 points are arbitrarily selected, M = {(P_A, P_B, P_C)_i | i = 0, 1, …, N}, and any group of points needs to satisfy:
wherein ∠A represents the angle between the line connecting P_A and P_B and the line connecting P_A and P_C, d(P_m, P_k) represents the Euclidean distance between the two points P_m and P_k, and P_A, P_B, P_C are respectively the three points in the point set;
according to the above conditions, a spatial plane is established by three points in each set of points:
according to the space plane, in the training process of the binary neural network model, adding a space constraint item:
loss = (1/N) · Σ_{i=1}^{N} ||S_i^real − S_i^pred||_1

wherein loss is the spatial constraint term, N is the number of point sets, S_i^real is the spatial plane calculated using the real disparity map, S_i^pred is the spatial plane calculated using the network-predicted disparity map, ||·||_1 represents a 1-norm operation, and i represents a positive integer.
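For illustration only, a minimal sketch of this constraint follows, assuming each "spatial plane" is represented by the unit normal of the plane through the three back-projected points (the patent's exact plane parameterization appears only as an image in the source):

```python
import torch

def plane_normal(PA, PB, PC):
    """Unit normal of the plane through three 3-D points, each a tensor of shape (3,)."""
    n = torch.cross(PB - PA, PC - PA, dim=0)
    return n / (n.norm() + 1e-8)

def spatial_constraint_loss(triplets_real, triplets_pred):
    """Average 1-norm difference between planes computed from the real and the
    predicted disparity maps, over N point sets of three points each."""
    diffs = [
        (plane_normal(*t_r) - plane_normal(*t_p)).abs().sum()
        for t_r, t_p in zip(triplets_real, triplets_pred)
    ]
    return torch.stack(diffs).mean()
```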
As a further preferred embodiment, determining a binary neural network according to a binary neural network model in combination with a binary data channel packing technique and a network computing laminar flow technique includes:
compressing binary data in a binary neural network model by using channel dimensions according to a binary data channel packing technology, and determining an initial binary neural network;
according to the network computing laminar flow technology, combining the neural network operations in the initial binary neural network to determine the binary neural network; the neural network operations comprise the convolution operation, the batch normalization operation and the data binarization operation.
In order to increase the storage density and calculation density of the binary data, the binary data are compressed along the channel dimension through the binary data channel packing technology. In the channel compression process, the number of channels is compressed to an integral multiple of 64; when the number of channels is not divisible by 64, the data are padded up to an integral multiple of 64, and the channel dimension is taken as the last dimension, determining the initial binary neural network. In the initial binary neural network both the weights and the data are quantized to {+1, −1}, but at the hardware level 1 represents +1 and 0 represents −1; in this embodiment the channel part short of an integral multiple of 64 is padded with 0, and the padded 0s additionally introduce −1 values. To account for this extra data, the calculation uses the binary dot-product operation, whose calculation formula is:
A·B = −(2 × popcnt(A ^ B) − vec_len)
wherein popcnt is the operation of counting the number of 1s in a sequence, vec_len represents the effective bit length participating in the operation, and A and B respectively represent two binary sequences. Through this formula, logical operations replace multiplication operations, so the operation speed is markedly improved. According to the network computing laminar flow technology, the neural network operations in the initial binary neural network are combined, fusing the convolution operation, the batch normalization operation and the data binarization operation into a single laminar-flow bnMap calculation: the convolution result C computed without the bias parameter is compared against a threshold thresh, where thresh is a parameter determined by the convolution layer bias b, the scaling layer coefficient η, the Batch Normalization layer scaling coefficient γ, the translation parameter β, the sample mean μ and the sample standard deviation σ.
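The following sketch illustrates the fused threshold comparison and the packed binary dot product. Since the bnMap and thresh formulas appear only as images in the source, the closed form of thresh below is derived from the listed symbols and is an assumption (valid when γ·η > 0):

```python
import numpy as np

def fuse_thresh(b, eta, gamma, beta, mu, sigma):
    """Assumed fusion of conv bias, scaling and batch normalization:
    sign(gamma * (eta * (C + b) - mu) / sigma + beta) == (C >= thresh)."""
    return mu / eta - b - (beta * sigma) / (gamma * eta)

def bn_map(C, thresh):
    """Fused convolution + batch normalization + binarization as one comparison."""
    return np.where(C >= thresh, 1, -1)

def binary_dot(a_bits: int, b_bits: int, vec_len: int) -> int:
    """Binary dot product from the formula above: A.B = -(2*popcnt(A^B) - vec_len)."""
    return -(2 * bin(a_bits ^ b_bits).count("1") - vec_len)
```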
further preferably, the inputting the left and right views into the binary neural network, performing feature extraction on the left and right views, and determining the disparity map includes:
inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view, and determining the image features;
determining an initial image by combining a parallax prediction algorithm according to the image characteristics;
and performing Gaussian filtering processing on the initial image to determine the disparity map.
wherein the disparity prediction Cost_des algorithm refines the predicted disparity using the costs of neighboring disparities: Cost is the cost of the disparity calculated by the network, Cost+ is the cost of the next disparity, and Cost− is the cost of the previous disparity;
Gaussian filtering with a filter kernel size of 3 × 3 is then applied to the initial image to obtain a smoother disparity map, which can be used for a more accurate three-dimensional detection task.
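For illustration, a sketch of this disparity post-processing follows; because the Cost_des formula is shown only as an image in the source, the standard parabolic sub-pixel refinement over (Cost−, Cost, Cost+) is assumed here:

```python
import cv2
import numpy as np

def refine_disparity(d, cost_prev, cost, cost_next):
    """Sub-pixel refinement of integer disparity d from the costs of d-1, d, d+1."""
    denom = cost_prev + cost_next - 2.0 * cost
    offset = np.where(np.abs(denom) > 1e-6,
                      (cost_prev - cost_next) / (2.0 * denom), 0.0)
    return d + offset

def smooth_disparity(initial: np.ndarray) -> np.ndarray:
    """3x3 Gaussian filtering of the initial image, as described above."""
    return cv2.GaussianBlur(initial, (3, 3), 0)
```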
Further as a preferred embodiment, the method for determining the visual radar signal by constructing a point cloud data coordinate point according to the disparity map and by combining the camera calibration parameters comprises the following steps:
calculating the depth value of the coordinate point of the disparity map according to the disparity map and the camera calibration parameter;
initializing point cloud data, storing coordinate point depth values into the point cloud data, and determining first point cloud data;
calculating the coordinate point of the first point cloud data according to the coordinate point depth value and the camera calibration parameter, and determining second point cloud data;
and converting the left view and the right view into gray maps, and combining the gray maps into the second point cloud data to determine the visual radar signals.
The coordinate point depth values of the disparity map are calculated according to the disparity map and the camera calibration parameters. Let the disparity of a point p in the disparity map be Y(u, v), where u and v are respectively the abscissa and ordinate of point p; the depth value D(u, v) of the point is calculated by the formula:
D(u, v) = f_U × b / Y(u, v)

wherein f_U is the horizontal focal length parameter of the left camera of the binocular camera, and b is the horizontal offset (baseline) parameter of the binocular camera. The point cloud data are initialized, the coordinate point depth values are stored into the point cloud data, and the first point cloud data are determined. Let the dimensions of the disparity map be h × w (height and width); the storage dimensions of the point cloud are then 4 × N with N = h × w, where the first two dimensions store the position of the coordinate point, the third dimension stores the depth of the coordinate point, and the fourth dimension stores the reflection intensity of the coordinate point. The coordinate point depth information obtained from the above calculation is filled into the point cloud to obtain the first point cloud data. The coordinate points of the first point cloud data are then calculated according to the coordinate point depth values and the camera calibration parameters, with the calculation formula:
x = (u − c_U) × z / f_U,  y = (v − c_V) × z / f_V,  z = D(u, v)

wherein (x, y, z) are the spatial coordinates of the point cloud, x being the width direction, y the height direction, and z the depth; (c_U, c_V) is the center pixel position of the corresponding camera, and f_V is the vertical focal length. At this point, the conversion of the 3 × N matrix information of the point cloud is completed, and the second point cloud data are obtained. The left and right views are converted into gray maps through OpenCV, the gray maps also being in h × w format; they are stretched into 1 × N and merged into the second point cloud data to obtain the visual radar signal.
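A minimal sketch of the visual radar signal construction under the back-projection formulas above; the function name and the 4 × N array layout are assumptions:

```python
import numpy as np

def build_visual_radar(disp, left_gray, fU, fV, cU, cV, b):
    """disp: h x w disparity map; left_gray: h x w gray map.
    Returns a 4 x N array of (x, y, z, intensity), N = h*w."""
    h, w = disp.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = fU * b / np.maximum(disp, 1e-6)   # D(u, v) = fU * b / Y(u, v)
    x = (u - cU) * z / fU                 # width coordinate
    y = (v - cV) * z / fV                 # height coordinate
    intensity = left_gray.astype(np.float32) / 255.0
    return np.stack([x, y, z, intensity]).reshape(4, -1)
```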
Further, as a preferred embodiment, discretizing the visual radar signal, inputting the discretized visual radar signal into a neural network, and determining a prediction result, the method includes:
determining an interested area by taking the binocular camera as a reference;
discretizing the vision radar signal in the region of interest to determine a discretized vision radar signal;
sequencing the discretized vision radar signals according to the height coordinates to determine the height of the point cloud;
performing density calculation on the discretization vision radar signal to determine the density of the point cloud;
determining the gray value of the discretization vision radar signal as point cloud intensity;
storing the point cloud height, the point cloud density and the point cloud intensity into a three-channel feature map;
inputting the three-channel characteristic diagram into the neural network, and determining an output result;
carrying out normalization processing on the output result, and then summing to determine a normalization processing result;
and carrying out threshold processing on the normalization result to determine the prediction result.
A square region extending 25 m to the left and right and 50 m to the front of the binocular camera is set as the region of interest; points inside this region are kept and points elsewhere are discarded. The visual radar signals in the region of interest are discretized and sorted according to the height coordinate; points with the same horizontal coordinates are de-duplicated and counted, and the sorting result is projected onto a three-channel feature map. The three channels of the feature map respectively store the density, height and intensity of the point cloud data: the height map takes the maximum value of the sorting result; the density map counts the number of points N in each grid and derives the density value from N; and the intensity map is filled with the gray values of the left and right views. The three-channel feature map is input into a network based on deep residuals and a feature pyramid, which predicts the position coordinates (x, y, z), geometric dimensions (h, w, l) and rotation angle ry around the y axis of the object.
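A sketch of this region-of-interest discretization follows; the grid resolution and the density mapping min(1, log(N+1)/log(64)) are assumptions, since the patent's density formula appears only as an image:

```python
import numpy as np

def bev_feature_map(cloud, res=0.1, x_range=(-25.0, 25.0), z_range=(0.0, 50.0)):
    """cloud: 4 x N array (x, y, z, intensity) -> 3-channel bird's-eye feature map
    storing height, density and intensity per grid cell."""
    x, y, z, inten = cloud
    keep = (x >= x_range[0]) & (x < x_range[1]) & (z >= z_range[0]) & (z < z_range[1])
    x, y, z, inten = x[keep], y[keep], z[keep], inten[keep]
    rows = ((z - z_range[0]) / res).astype(int)
    cols = ((x - x_range[0]) / res).astype(int)
    nrows = int((z_range[1] - z_range[0]) / res)
    ncols = int((x_range[1] - x_range[0]) / res)
    fmap = np.zeros((3, nrows, ncols), dtype=np.float32)
    counts = np.zeros((nrows, ncols), dtype=np.int32)
    for r, c, yi, ii in zip(rows, cols, y, inten):
        fmap[0, r, c] = max(fmap[0, r, c], yi)  # height channel: max height in cell
        fmap[2, r, c] = ii                      # intensity channel: gray value
        counts[r, c] += 1
    fmap[1] = np.minimum(1.0, np.log1p(counts) / np.log(64.0))  # density channel
    return fmap
```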
Further preferably, the detecting the blocking object to the prediction result and determining the target prediction result includes:
projecting the prediction result onto a two-dimensional image according to the camera calibration parameters to determine a first two-dimensional frame value;
detecting the left view and the right view through an object detection algorithm, and determining a second two-dimensional frame value;
comparing the first two-dimensional frame value with the second two-dimensional frame value, and calculating to obtain an intersection ratio;
when the intersection ratio is greater than a threshold value, determining the target prediction result.
Because some occluding objects may exist in the image, such as flower beds, enclosing walls and garbage cans, whose appearance in the point cloud is similar to the features of a vehicle and may cause the detection network to produce false detections, a multi-information-fusion three-dimensional detection method is used: the prediction result is projected onto the two-dimensional image using the camera calibration parameters to obtain its two-dimensional frame value, i.e. the first two-dimensional frame value; frame detection is performed on the left and right views using the object detection algorithm YOLOv5, and the two-dimensional frame value of the detection algorithm, i.e. the second two-dimensional frame value, is obtained by calculation; the first and second two-dimensional frame values are compared and the intersection-over-union ratio is calculated; when the intersection-over-union ratio is greater than the threshold, the prediction is considered correct, otherwise the prediction result is discarded. The threshold is set to 0.5, and finally the target prediction result is determined.
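A minimal sketch of this occlusion check: the projected 3-D prediction box is kept only if a 2-D detector box overlaps it with intersection-over-union above the 0.5 threshold (the (x1, y1, x2, y2) box format is an assumption):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def keep_prediction(projected_box, detector_boxes, thresh=0.5):
    """Keep the 3-D prediction if any 2-D detection overlaps it with IoU > thresh."""
    return any(iou(projected_box, db) > thresh for db in detector_boxes)
```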
With reference to fig. 1, the embodiment of the present invention optimizes the training process of the binary neural network model by combining the optimization parameters; the channels are compressed through the binary data channel packing technology and the neural network operations are fused through the network computing laminar flow technology, realizing the optimization of the binary neural network and obtaining the binary neural network. Left and right views are acquired through a binocular camera and input into the binary neural network for feature extraction to obtain a disparity map. The camera calibration parameters obtained when setting up the binocular camera are combined to calculate the coordinates of the points in the disparity map, and the calculated coordinate information is stored into point cloud data to obtain the visual radar signal. The visual radar signal is discretized, the processing result is projected into a three-channel feature map, the three-channel feature map is input into a network based on deep residuals and a feature pyramid, and the prediction result is output, including the position coordinates (x, y, z), geometric dimensions (h, w, l) and rotation angle ry around the y axis of the object.
Corresponding to the method in fig. 1, an embodiment of the present invention further provides a 3D object rapid detection apparatus, including:
the system comprises a first module, a second module and a third module, wherein the first module is used for acquiring left and right views and camera calibration parameters, the left and right views are acquired through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera; (ii) a
The second module is used for processing the database picture and the real distance depth map through a neural network framework and training to obtain a binary neural network model, wherein the real distance depth map is obtained through a laser radar;
the third module is used for determining a binary neural network according to the binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology;
the fourth module is used for inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view and determining a disparity map;
the fifth module is used for constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters and determining a visual radar signal;
the sixth module is used for projecting the vision radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network and determining a prediction result;
and the seventh module is used for detecting the blocking object of the prediction result and determining a target prediction result.
Corresponding to the method of fig. 1, an embodiment of the present invention further provides an electronic device, which includes a processor and a memory; the memory is used for storing programs; the processor executes the program to implement the method as described above.
Corresponding to the method of fig. 1, the embodiment of the present invention further provides a computer-readable storage medium, which stores a program, and the program is executed by a processor to implement the method as described above.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.
In summary, compared with the related art, the embodiment of the invention has the following advantages:
1) The related art uses laser radar data, but the cost of the laser radar equipment required to acquire such data is too high; the embodiment of the invention can be realized with only the images acquired by one binocular camera, which reduces the use cost.
2) The related art uses a large number of networks, which results in low operation speed; the embodiment of the invention optimizes the binary neural network, thereby improving the processing speed.
3) Laser radar has difficulty obtaining environmental color signals; the embodiment of the invention can reliably obtain environmental color signals at high speed, thereby improving the accuracy of object detection.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (8)
1. A3D object rapid detection method is characterized by comprising the following steps:
acquiring left and right views and camera calibration parameters; the left view and the right view are obtained through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera;
processing the database picture and the real distance depth map through a neural network framework, and training to obtain a binary neural network model; the real distance depth map is obtained through a laser radar;
adding a spatial constraint term to the binary neural network model, specifically:
in the training set of the binary neural network model, randomly selecting a plurality of groups of point sets containing three points;
the point set satisfies the following conditions:
wherein ∠A represents the angle between the line connecting P_A and P_B and the line connecting P_A and P_C, d(P_m, P_k) represents the Euclidean distance between the two points P_m and P_k, and P_A, P_B, P_C are respectively the three points in the point set;
establishing a spatial plane by three points in each group of point sets according to the point sets;
according to the space plane, determining the space constraint item and adding the space constraint item to the binary neural network model; the spatial constraint term is:
loss = (1/N) · Σ_{i=1}^{N} ||S_i^real − S_i^pred||_1

wherein loss is the spatial constraint term, N is the number of point sets, S_i^real is the spatial plane calculated using the real disparity map, S_i^pred is the spatial plane calculated using the network-predicted disparity map, ||·||_1 represents a 1-norm operation, and i represents a positive integer;
determining a binary neural network according to the binary neural network model by combining a binary data channel packaging technology and a network computing laminar flow technology;
inputting the left view and the right view into the binary neural network, extracting features of the left view and the right view, and determining a disparity map;
constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters, and determining a visual radar signal; the visual radar signal is used for representing point cloud data obtained by construction;
projecting the vision radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network, and determining a prediction result;
performing obstruction detection on the prediction result to determine a target prediction result;
the inputting the left view and the right view into the binary neural network, performing feature extraction on the left view and the right view, and determining a disparity map comprises:
inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view, and determining the image features;
determining an initial image by combining a parallax prediction algorithm according to the image characteristics;
wherein the disparity prediction Cost_des algorithm refines the predicted disparity using the costs of neighboring disparities: Cost is the cost of the disparity calculated through the network, Cost+ is the cost of the next disparity, and Cost− is the cost of the previous disparity;
and performing Gaussian filtering processing on the initial image to determine the disparity map.
2. The method for rapidly detecting the 3D object according to claim 1, wherein the determining the binary neural network according to the binary neural network model by combining a binary data channel packing technology and a network computing laminar flow technology comprises:
compressing binary data in the binary neural network model by using channel dimensions according to the binary data channel packing technology to determine an initial binary neural network;
according to the network computing laminar flow technology, combining the neural network operations in the initial binary neural network to determine the binary neural network; the neural network operation comprises a convolution operation, a batch normalization operation and a data binarization operation.
3. The method for rapidly detecting the 3D object according to claim 1, wherein the constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters and determining the visual radar signal comprise:
calculating a coordinate point depth value of the disparity map according to the disparity map and the camera calibration parameters;
initializing point cloud data, storing the coordinate point depth value into the point cloud data, and determining first point cloud data;
calculating the coordinate point of the first point cloud data according to the coordinate point depth value and the camera calibration parameter, and determining second point cloud data;
and converting the left view and the right view into gray maps, merging the gray maps into the second point cloud data, and determining the visual radar signal.
4. The method according to claim 1, wherein the projecting the visual radar signal to obtain a three-channel characteristic diagram, inputting the three-channel characteristic diagram into a neural network, and determining a prediction result comprises:
determining a region of interest with the binocular camera as a reference;
discretizing the visual radar signal in the region of interest to determine a discretized visual radar signal;
sorting the discretized visual radar signals by height coordinate to determine the point cloud height;
performing a density calculation on the discretized visual radar signal to determine the point cloud density;
determining the gray value of the discretized visual radar signal as the point cloud intensity;
storing the point cloud height, the point cloud density and the point cloud intensity into a three-channel feature map;
inputting the three-channel feature map into the neural network, and determining an output result;
performing normalization processing on the output result and then summing to determine a normalization result;
and performing threshold processing on the normalization result to determine the prediction result.
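A compact sketch covering the discretization, height, density and intensity steps above; the region of interest, grid resolution, and log-based density normalizer are assumed values borrowed from common bird's-eye-view pipelines, not the patent's:

```python
import numpy as np

def pointcloud_to_bev(points, x_rng=(0.0, 70.0), y_rng=(-40.0, 40.0), res=0.1):
    """Project (x, y, z, intensity) points, assumed in a ground-aligned
    frame (x forward, y left, z up), into a 3-channel bird's-eye map:
    channel 0 = max height, 1 = normalized density, 2 = max intensity."""
    x, y, z, inten = points.T
    keep = (x >= x_rng[0]) & (x < x_rng[1]) & (y >= y_rng[0]) & (y < y_rng[1])
    x, y, z, inten = x[keep], y[keep], z[keep], inten[keep]
    cols = ((x - x_rng[0]) / res).astype(int)
    rows = ((y - y_rng[0]) / res).astype(int)
    h = int(round((y_rng[1] - y_rng[0]) / res))
    w = int(round((x_rng[1] - x_rng[0]) / res))
    bev = np.zeros((3, h, w), dtype=np.float32)
    np.maximum.at(bev[0], (rows, cols), z)             # per-cell max height
    np.add.at(bev[1], (rows, cols), 1.0)               # per-cell point count
    bev[1] = np.minimum(1.0, np.log1p(bev[1]) / np.log(64.0))  # density
    np.maximum.at(bev[2], (rows, cols), inten)         # per-cell max intensity
    return bev
```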
5. The 3D object rapid detection method according to claim 1, wherein the performing occlusion detection on the prediction result and determining the target prediction result comprises:
projecting the prediction result onto a two-dimensional image according to the camera calibration parameters to determine a first two-dimensional frame value;
detecting the left view and the right view through an object detection algorithm, and determining a second two-dimensional frame value;
comparing the first two-dimensional frame value with the second two-dimensional frame value to calculate an intersection ratio; and when the intersection ratio is greater than a threshold, determining the target prediction result.
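The intersection ratio in this claim is the standard 2D intersection-over-union. A minimal sketch, with the 0.5 threshold an assumed value rather than the patent's:

```python
def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def keep_prediction(projected_box, detector_box, thresh=0.5):
    """Keep a 3D prediction only if its image-plane projection overlaps
    the 2D detector's box by more than the threshold."""
    return iou_2d(projected_box, detector_box) > thresh
```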
6. A 3D object rapid detection device, comprising:
the first module is used for acquiring a left view, a right view and camera calibration parameters, wherein the left and right views are acquired through a binocular camera, and the camera calibration parameters are used for representing parameters set by the binocular camera;
the second module is used for processing the database picture and the real distance depth map through a neural network framework and training to obtain a binary neural network model, wherein the real distance depth map is obtained through a laser radar;
adding a spatial constraint term to the binary neural network model, specifically:
randomly selecting a plurality of point sets, each containing three points, from the training set of the binary neural network model;
each point set satisfies the following conditions (the condition formula is rendered as an image in the source):
wherein θ represents the angle between the line connecting P_A and P_B and the line connecting P_A and P_C, d(P_m, P_k) represents the Euclidean distance between the two points P_m and P_k, and P_A, P_B and P_C are respectively the three points in the point set;
establishing a spatial plane from the three points in each point set;
determining the spatial constraint term according to the spatial plane and adding the spatial constraint term to the binary neural network model;
the spatial constraint term is:
wherein loss is the spatial constraint term, N is the number of point sets, S_i^gt is the spatial plane computed using the true disparity map, S_i^pred is the spatial plane computed using the predicted disparity map, || · ||_1 represents a 1-norm operation, and i represents a positive integer;
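The spatial constraint formula is likewise an image in the source. A form consistent with the definitions above is sketched below; the 1/N averaging and the plane symbols S_i^gt and S_i^pred are assumptions introduced here, not the patent's notation:

```latex
% Assumed form of the spatial constraint term:
% S_i^{gt} is the plane from the true disparity map, S_i^{pred} the plane
% from the predicted disparity map, for the i-th sampled point set.
loss = \frac{1}{N} \sum_{i=1}^{N} \left\| S_i^{gt} - S_i^{pred} \right\|_{1}
```

Each plane can be represented, for example, by the normal vector (P_B − P_A) × (P_C − P_A) of its three defining points, so that the 1-norm compares plane orientations between the true and predicted geometry.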
the third module is used for determining a binary neural network according to the binary neural network model by combining a binary data channel packing technology and a network computing laminar flow technology;
the fourth module is used for inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view and determining a disparity map;
the inputting the left view and the right view into the binary neural network, performing feature extraction on the left view and the right view, and determining a disparity map comprises:
inputting the left view and the right view into the binary neural network, extracting the features of the left view and the right view, and determining image features;
determining an initial image according to the image features in combination with a disparity prediction algorithm;
wherein the disparity prediction cost Cost_des algorithm is as follows (the formula is rendered as an image in the source):
where Cost is the cost of the current disparity computed through the network, Cost^+ is the cost of the next disparity, and Cost^- is the cost of the previous disparity;
performing Gaussian filtering processing on the initial image to determine the disparity map;
the fifth module is used for constructing point cloud data coordinate points according to the disparity map and the camera calibration parameters and determining a visual radar signal;
the sixth module is used for projecting the visual radar signal to obtain a three-channel feature map, inputting the three-channel feature map into a neural network and determining a prediction result;
and the seventh module is used for performing occlusion detection on the prediction result and determining a target prediction result.
7. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method according to any one of claims 1-5.
8. A computer-readable storage medium, characterized in that the storage medium stores a program, which is executed by a processor to implement the method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110553663.6A CN113281779B (en) | 2021-05-20 | 2021-05-20 | 3D object rapid detection method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110553663.6A CN113281779B (en) | 2021-05-20 | 2021-05-20 | 3D object rapid detection method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113281779A CN113281779A (en) | 2021-08-20 |
CN113281779B true CN113281779B (en) | 2022-07-12 |
Family
ID=77280479
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110553663.6A Active CN113281779B (en) | 2021-05-20 | 2021-05-20 | 3D object rapid detection method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113281779B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114359891B (en) * | 2021-12-08 | 2024-05-28 | 华南理工大学 | Three-dimensional vehicle detection method, system, device and medium |
CN115619740B (en) * | 2022-10-19 | 2023-08-08 | 广西交科集团有限公司 | High-precision video speed measuring method, system, electronic equipment and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110738241A (en) * | 2019-09-24 | 2020-01-31 | 中山大学 | binocular stereo vision matching method based on neural network and operation frame thereof |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10503966B1 (en) * | 2018-10-11 | 2019-12-10 | Tindei Network Technology (Shanghai) Co., Ltd. | Binocular pedestrian detection system having dual-stream deep learning neural network and the methods of using the same |
CN110148179A (en) * | 2019-04-19 | 2019-08-20 | 北京地平线机器人技术研发有限公司 | A kind of training is used to estimate the neural net model method, device and medium of image parallactic figure |
CN111028285A (en) * | 2019-12-03 | 2020-04-17 | 浙江大学 | Depth estimation method based on binocular vision and laser radar fusion |
CN111179330A (en) * | 2019-12-27 | 2020-05-19 | 福建(泉州)哈工大工程技术研究院 | Binocular vision scene depth estimation method based on convolutional neural network |
CN111444811B (en) * | 2020-03-23 | 2023-04-28 | 复旦大学 | Three-dimensional point cloud target detection method |
CN112633324A (en) * | 2020-11-27 | 2021-04-09 | 中山大学 | System, method and medium for matching stereoscopic vision around the eyes based on neural network |
CN112233163B (en) * | 2020-12-14 | 2021-03-30 | 中山大学 | Depth estimation method and device for laser radar stereo camera fusion and medium thereof |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110738241A (en) * | 2019-09-24 | 2020-01-31 | 中山大学 | binocular stereo vision matching method based on neural network and operation frame thereof |
Non-Patent Citations (3)
Title |
---|
"基于深度学习的三维人脸重建技术研究";伊进延;《中国优秀博硕士学位论文全文数据库(硕士) 医药卫生科技辑》;20200115(第1期);22-23 * |
"基于深度学习的三维目标检测方法研究";王刚 等;《计算机应用与软件》;20201231;第37卷(第12期);164-168 * |
"基于车载双目相机的目标检测及其运动状态估计";刘奕博;《中国优秀博硕士学位论文全文数据库(硕士) 工程科技II辑》;20210215(第2期);9-10、32-35、47 * |
Also Published As
Publication number | Publication date |
---|---|
CN113281779A (en) | 2021-08-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Riegler et al. | Octnetfusion: Learning depth fusion from data | |
EP3822910A1 (en) | Depth image generation method and device | |
US11941831B2 (en) | Depth estimation | |
CN112132213A (en) | Sample image processing method and device, electronic equipment and storage medium | |
CN113052109A (en) | 3D target detection system and 3D target detection method thereof | |
CN110879994A (en) | Three-dimensional visual inspection detection method, system and device based on shape attention mechanism | |
CN113281779B (en) | 3D object rapid detection method, device, equipment and medium | |
Kumari et al. | A survey on stereo matching techniques for 3D vision in image processing | |
WO2021044122A1 (en) | Scene representation using image processing | |
Merras et al. | Multi-view 3D reconstruction and modeling of the unknown 3D scenes using genetic algorithms | |
US9361412B1 (en) | Method for the simulation of LADAR sensor range data | |
CN117422884A (en) | Three-dimensional target detection method, system, electronic equipment and storage medium | |
CN114372523A (en) | Binocular matching uncertainty estimation method based on evidence deep learning | |
CN114494589A (en) | Three-dimensional reconstruction method, three-dimensional reconstruction device, electronic equipment and computer-readable storage medium | |
CN116310219A (en) | Three-dimensional foot shape generation method based on conditional diffusion model | |
CN112562001A (en) | Object 6D pose estimation method, device, equipment and medium | |
CN115240168A (en) | Perception result obtaining method and device, computer equipment and storage medium | |
CN113160416A (en) | Speckle imaging device and method for coal flow detection | |
CN116452748A (en) | Implicit three-dimensional reconstruction method, system, storage medium and terminal based on differential volume rendering | |
EP4152274A1 (en) | System and method for predicting an occupancy probability of a point in an environment, and training method thereof | |
Zhang et al. | Object measurement in real underwater environments using improved stereo matching with semantic segmentation | |
Tao et al. | SiLVR: Scalable Lidar-Visual Reconstruction with Neural Radiance Fields for Robotic Inspection | |
CN117367404A (en) | Visual positioning mapping method and system based on SLAM (sequential localization and mapping) in dynamic scene | |
US11790642B2 (en) | Method for determining a type and a state of an object of interest | |
US20230281877A1 (en) | Systems and methods for 3d point cloud densification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||