CN116524128A - Multi-view three-dimensional reconstruction and flexible docking method and system based on vision and force sense - Google Patents

Multi-view three-dimensional reconstruction and flexible docking method and system based on vision and force sense

Info

Publication number
CN116524128A
CN116524128A (application CN202310503256.3A)
Authority
CN
China
Prior art keywords
mechanical arm
docking
contact force
butt joint
pose
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310503256.3A
Other languages
Chinese (zh)
Inventor
彭刚 (Peng Gang)
关尚宾 (Guan Shangbin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202310503256.3A priority Critical patent/CN116524128A/en
Publication of CN116524128A publication Critical patent/CN116524128A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/10012 Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-view three-dimensional reconstruction and flexible docking method and system based on vision and force sense. The method comprises the following steps: acquiring a first RGB image and a first depth image with the mechanical arm at an initial position, detecting the position of the docking port in the first RGB image, and calculating the docking port coordinates by combining the first depth image; moving the mechanical arm to the vicinity of the docking port coordinates, acquiring second RGB images and second depth images at that position from multiple view angles, detecting the position of the docking port in the second RGB image of each view angle, expanding each detection result by a factor to obtain the region of interest of each view angle, mapping each region of interest onto the second depth image corresponding to the same view angle and then performing three-dimensional reconstruction, extracting the docking-port plane from the reconstructed point cloud, and generating the docking pose; moving the mechanical arm to the docking pose, monitoring the contact force during docking, and adjusting the pose of the mechanical arm end whenever the contact force is outside the set range, thereby realizing flexible docking. The invention improves the success rate of the docking task and the compliance of the mechanical arm.

Description

Multi-view three-dimensional reconstruction and flexible docking method and system based on vision and force sense
Technical Field
The invention belongs to the technical field of automatic docking, and particularly relates to a multi-view three-dimensional reconstruction and flexible docking method and system based on vision and force sense.
Background
Charging-port docking is the main means by which a robot automatically charges a new energy automobile. In fixed-position docking, the automobile must be parked at an exact position before automatic charging can proceed. However, this approach places high demands on parking accuracy and therefore adapts poorly.
When positioning is realized visually, the charging port in the RGB image is not a simple regular black pattern and the features of the area near the charging port are not distinctive, so it is difficult to segment the charging-port pixels directly by color. In addition, when the vision sensor captures the charging-port image, depth information may be missing in some regions because of illumination, camera performance and other factors; the completeness of the depth information affects the accuracy of the docking pose, and if key depth information is missing, docking is likely to fail. The depth information collected by the vision sensor also contains much invalid information from the surroundings besides the charging port, which produces many wasted operations during algorithm processing. Moreover, after the manipulator end effector reaches the visually obtained docking pose, the camera is too close to the charging-port surface, so pose deviation can no longer be perceived visually; if the end effector docks along the originally planned path, the pose deviation can cause the manipulator end to press against the charging port and may even damage the hardware, while commanding the arm to retreat and re-acquire the docking pose visually costs a lot of time.
Therefore, the prior art suffers from missing depth information, deviation of the visually obtained docking pose, and a low docking success rate.
Disclosure of Invention
Aiming at the above defects or improvement needs of the prior art, the invention provides a multi-view three-dimensional reconstruction and flexible docking method and system based on vision and force sense, which solve the technical problems of missing depth information, deviation of the visually obtained docking pose, and low docking success rate in the prior art.
To achieve the above object, according to one aspect of the present invention, there is provided a multi-view three-dimensional reconstruction and flexible docking method based on vision and force sense, comprising the steps of:
(1) Acquiring a first RGB image and a first depth image with the mechanical arm at an initial position, detecting the position of the docking port in the first RGB image, and calculating the coordinates of the docking port in the world coordinate system by combining the depth value at the center of the docking port in the first depth image;
(2) Moving the mechanical arm from the initial position to a position whose distance from the world coordinates of the docking port equals a preset value; acquiring second RGB images and second depth images at that position from multiple view angles; detecting the position of the docking port in the second RGB image of each view angle; expanding the detection result of each view angle by a factor to obtain the region of interest of that view angle; mapping the region of interest of each view angle onto the second depth image of the same view angle and then performing three-dimensional reconstruction to obtain a reconstructed point cloud; and extracting the docking-port plane from the reconstructed point cloud to generate the docking pose;
(3) Moving the mechanical arm to the docking pose, monitoring the contact force during docking, and adjusting the pose of the mechanical arm end whenever the contact force is outside the set range, thereby realizing flexible docking.
Further, in step (2), N second RGB images and N second depth images of the mechanical arm at that position are acquired from N view angles, where N ≥ 2; acquiring images from multiple view angles avoids the feature loss caused by lighting interference.
Further, the position of the docking port in the second RGB image of each view angle is detected by an object detection method based on deep learning, specifically:
inputting the second RGB image of each view angle into a position detection model to obtain the position of the docking port in the second RGB image of that view angle;
the position detection model comprises an input layer, a backbone neural network, a neck neural network and a head neural network, and is trained as follows:
collecting RGB sample images of different scenes and different view angles to form a training set, and annotating the positions of the docking ports in the RGB sample images;
inputting the training set into the position detection model; the input layer applies data enhancement to the RGB sample images, the backbone neural network extracts features from the images output by the input layer, the neck neural network fuses the features extracted by the backbone, and the head neural network outputs the predicted docking-port position from the fused features; the model is trained with the objective of minimizing the error between the predicted and annotated docking-port positions, until convergence, yielding the trained position detection model.
Further, the position of the docking port in the second RGB image of each view angle is detected by a method based on HSV color-space segmentation, specifically:
collecting docking-port image samples to obtain the HSV value range of the docking port; converting the second RGB image of each view angle from the RGB color space to the HSV color space; and computing the minimum bounding rectangle of the region matching the docking port's HSV value range in the second RGB image to obtain the position of the docking port.
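As an illustration of this HSV segmentation step, the following is a minimal Python/OpenCV sketch. The HSV bounds are placeholders that would have to be measured from docking-port image samples, and an axis-aligned bounding rectangle stands in for the minimum bounding rectangle:

```python
import cv2
import numpy as np

# Placeholder HSV bounds for a dark docking port; the real interval must be
# measured from collected image samples of the port.
HSV_LOW = np.array([0, 0, 0])
HSV_HIGH = np.array([180, 255, 60])

def detect_port_hsv(bgr):
    """Return the (x, y, w, h) box of the largest region inside the HSV
    interval, or None; a stand-in for the minimum-bounding-rectangle step."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, HSV_LOW, HSV_HIGH)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)   # largest matching region
    return cv2.boundingRect(largest)
```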
Further, the backbone network comprises CBS modules, ELAN modules and MP modules; a CBS module consists of a convolution layer, a normalization layer and an activation function, an ELAN module consists of several CBS modules, and an MP module consists of a max-pooling module and several CBS modules. The neck neural network comprises CBS modules, MP modules, ELAN-W modules, up-sampling modules, Cat modules and an SPPCSPC module; an ELAN-W module differs from an ELAN module only in the number of selected outputs, the Cat module performs feature fusion, and the SPPCSPC module obtains a larger receptive field through max pooling. The head neural network contains a residual structure.
Further, detecting the position of the docking port in the first RGB image uses the method based on HSV color-space segmentation or the object detection method based on deep learning.
Further, the three-dimensional reconstruction comprises the steps of:
(21) Taking the depth-map set Image formed by the second depth maps of all view angles, the region-of-interest set Region formed by the regions of interest of all view angles, the camera pose set Pose used when acquiring the multiple view angles, and the coordinates P_w(x_w, y_w, z_w) of the docking port in the world coordinate system as the inputs of the modified TSDF algorithm;
(22) With P_w as the center, establishing a cuboid bounding box containing the docking port, whose length, width and height are L, W and H respectively, and dividing the bounding box into a number of voxels;
(23) Traversing all voxels v; for the point P_v in the world coordinate system corresponding to voxel v, obtaining the coordinates P_c of P_v in the camera coordinate system, and using the camera intrinsics to compute the corresponding pixel P_d in the second depth map image_i of the i-th view angle;
(24) Judging whether P_d lies in the region of interest R_d of image_i; if not, setting the signed distance function value sdf(P_v) to 1 and returning to step (23); if yes, proceeding to step (25);
(25) Letting val(d) be the depth value of pixel P_d in image_i and dis(c) the distance from P_c to the origin of the camera coordinate system, then sdf(P_v) = val(d) − dis(c); with the truncation distance set to u, the truncated signed distance function value tsdf(P_v) of point P_v is obtained as:

tsdf(P_v) = max(−1, min(1, sdf(P_v) / u))
(26) Letting θ be the angle between the projection ray from the camera origin and the surface at P_v, the weight of point P_v is w(P_v) = cos(θ)/dis(c); repeating steps (23) to (26) until all voxels have been traversed;
(27) Fusing the tsdf(P_v) and w(P_v) of the current view angle with the global truncated signed distance function value TSDF(P_v) and global weight W(P_v), both initialized to 0:

TSDF(P_v) ← (W(P_v)·TSDF(P_v) + w(P_v)·tsdf(P_v)) / (W(P_v) + w(P_v))
W(P_v) ← W(P_v) + w(P_v)
(28) Switching to the depth map, region of interest and camera pose of the next view angle, until the depth-map set, region-of-interest set and camera pose set have all been traversed.
Further, the step (3) includes:
the mechanical arm moves to the docking pose, and the contact force during the docking process is monitored;
when the docking piece at the mechanical arm end first enters the docking port, controlling it to make an initial probe along the z-axis of the docking pose; if the contact force is less than or equal to the initial-probe maximum contact force, performing the secondary probe; if the contact force is greater than the initial-probe maximum contact force, adjusting the translation of the mechanical arm end; if the contact force after adjustment is still greater than the initial-probe maximum contact force, returning to step (2); if the contact force after adjustment is less than or equal to the initial-probe maximum contact force, performing the secondary probe;
after the initial probe shows no abnormality, continuing with a secondary probe along the z-axis of the docking pose of the mechanical arm end; if the contact force during the secondary probe is less than or equal to the secondary-probe maximum contact force, docking with the accurate pose corresponding to that contact force; if it is greater than the secondary-probe maximum contact force, adjusting the rotation of the mechanical arm end; if the contact force after adjustment is still greater than the secondary-probe maximum contact force, returning to step (2); if it is less than or equal to the secondary-probe maximum contact force, docking with the accurate pose corresponding to that contact force;
when docking with the accurate pose, if the contact force is greater than the maximum docking contact force, returning to step (2); if the step error between the actual step displacement of the mechanical arm end and the commanded step displacement is greater than a threshold, adjusting the step value of the mechanical arm end; docking succeeds when the contact force is within the set range, the step error is less than or equal to the threshold, and the mechanical arm end is connected with the docking port.
Further, the pose of the mechanical arm end is adjusted in the following way:
when the contact force is outside the set range, the pose of the mechanical arm end is adjusted through the action selected by

action = sample(priority adjustment pool), if rand(1) < ε; randomly generated action, otherwise
ε = max(1 − p×t, 0)

wherein t is the number of failed adjustments, p is a tunable hyperparameter, ε is the dynamic adjustment value, the priority adjustment pool holds preset actions, random generation produces a random action, rand(1) generates a random number, and an action is a translation or a rotation of the mechanical arm end.
Further, when the docking port is a charging port, the mechanical arm end is a charging head; when the docking port is a water filling port, the end is a water-filling pipeline outlet; when the docking port is an oil filler, the end is an oil-filling pipeline outlet; and when the docking port is a gas injection port, the end is a gas-filling pipeline outlet.
According to another aspect of the present invention, there is provided a multi-view three-dimensional reconstruction and flexible docking system based on vision and force sense, comprising a visual positioning module and a flexible docking module, wherein the visual positioning module comprises a primary positioning module and a secondary positioning module;
the primary positioning module is used for acquiring a first RGB image and a first depth image with the mechanical arm at an initial position, detecting the position of the docking port in the first RGB image, and calculating the coordinates of the docking port in the world coordinate system by combining the depth value at the center of the docking port in the first depth image;
the secondary positioning module is used for moving the mechanical arm from the initial position to a position whose distance from the world coordinates of the docking port equals a preset value, acquiring second RGB images and second depth images at that position from multiple view angles, detecting the position of the docking port in the second RGB image of each view angle, expanding the detection result of each view angle by a factor to obtain the region of interest of that view angle, mapping each region of interest onto the second depth image of the same view angle and then performing three-dimensional reconstruction to obtain a reconstructed point cloud, extracting the docking-port plane from the reconstructed point cloud, and generating the docking pose;
the flexible docking module is used for moving the mechanical arm to the docking pose, monitoring the contact force during docking, and adjusting the pose of the mechanical arm end when the contact force is outside the set range, so as to realize flexible docking.
According to another aspect of the invention, there is provided an application of the multi-view three-dimensional reconstruction and flexible docking method based on vision and force sense, wherein the method is applied to charging, refueling, water filling and gas filling.
In general, compared with the prior art, the above technical solutions conceived by the present invention achieve the following beneficial effects:
(1) The visual positioning is divided into a primary stage and a secondary stage. During primary positioning, the position of the docking port in the RGB image is detected, and the coordinates of the docking port in the world coordinate system are preliminarily calculated from the depth information at the docking-port center. During secondary positioning, RGB images and depth images are acquired from multiple view angles, which avoids the feature loss caused by lighting interference; after the position of the docking port in each RGB image is detected, the detection result is expanded into a region of interest and mapped into the depth image, and local three-dimensional reconstruction is then performed on the region of interest. This solves the problem of missing depth information when point clouds are acquired directly, filters out the redundant point clouds of non-interest regions, and improves computational efficiency. During docking, the contact force is monitored, and when the contact force is abnormal the pose of the mechanical arm end is adjusted, realizing flexible docking. The multi-view three-dimensional reconstruction and flexible docking method based on vision and force sense can obtain the complete point cloud and accurate pose of the docking port, improve the success rate of the docking task and the compliance of the mechanical arm, and ensure the safety of the docking process.
(2) The invention provides several methods for detecting the position of the docking port in an RGB image. Detecting the pixel position of the docking port with a deep-learning object detection method overcomes the problems that the docking-port features are not distinctive and that manually designed features require heavy computation, and it adapts well to different environments. The method based on HSV color-space segmentation detects well when the color contrast between the docking port and the nearby area is strong.
(3) Although a depth camera can compute point cloud data through its SDK, the point cloud obtained from the SDK covers the entire view and contains much useless information; directly processing the point clouds of multiple view angles seriously wastes computing resources and gives poor real-time performance. In a real environment, influences such as illumination reduce the completeness of the acquired depth information and thus the accuracy of the obtained pose. The invention performs three-dimensional reconstruction on the region of interest and obtains the complete point cloud of the charging port, which solves both the partial loss of depth information when the depth camera captures images and the poor real-time performance caused by the excessive computation of a global point cloud. The region-of-interest three-dimensional reconstruction algorithm mainly applies a pruning improvement in step (24). For the plain TSDF algorithm, once the cuboid bounding box L, W, H is specified, the TSDF values of all voxels in the bounding box are computed, and the reconstruction result contains, besides the point cloud of the region of interest, redundant point clouds of non-interest regions, so the plane obtained when segmenting the charging-port plane may be an invalid plane of a non-interest region. With the region-of-interest reconstruction algorithm, the pruning improvement filters out most of the non-interest point clouds; the reconstructed region-of-interest point cloud retains the complete point cloud of the charging port, invalid computation on non-key points is reduced, and the charging-port plane is segmented from the region-of-interest point cloud.
(4) During flexible docking, collision protection through contact-force detection prevents the equipment from being squeezed and damaged when the docking pose deviates. Fine adjustment corrects the pose of the mechanical arm end, and thereby the docking pose, so that the end enters the docking port accurately. During docking, the accurate docking pose is obtained through the initial probe and the secondary probe, and position and force detection serve as the criterion for judging the success of each step when docking with the accurate pose, which avoids equipment damage and improves the docking success rate.
(5) When probing detects an abnormal contact force, the flexible docking relies less and less on the priority adjustment pool as the number of fine-adjustment failures grows, tending instead to find the accurate docking pose by random exploration. Compared with purely random adjustment, the ε-greedy scheme used when correcting the docking pose reuses the successful adjustment scheme whenever a correction succeeds, improving fine-adjustment efficiency.
(6) The invention has many application scenarios: it can be applied to the automatic charging of new energy automobiles; for vehicles with water filling ports it can automatically dock the water-filling pipeline to realize automatic water filling; for automobiles, aircraft and rail transit that need refueling it can automatically dock the fueling interface to realize automatic refueling; and for equipment that needs gas filling it can automatically dock the gas filling port to realize automatic gas filling.
Drawings
FIG. 1 is a flow chart of a multi-view three-dimensional reconstruction and flexible docking method based on vision and force sense provided by an embodiment of the invention;
FIG. 2 is a functional schematic of a preliminary positioning module according to an embodiment of the present invention;
FIG. 3 is a functional schematic of a secondary positioning module according to an embodiment of the present invention;
FIG. 4 is a flow chart of three-dimensional reconstruction of a region of interest provided by an embodiment of the present invention;
fig. 5 (a) is a three-dimensional reconstruction effect diagram of the TSDF provided by the embodiment of the present invention;
fig. 5 (b) is a three-dimensional reconstruction effect diagram of a region of interest according to an embodiment of the present invention;
FIG. 6 is a flow chart of a flexible docking provided by an embodiment of the present invention;
fig. 7 is a flow chart of flexible fine tuning provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
As shown in fig. 1, a multi-view three-dimensional reconstruction and flexible docking method based on vision and force sense comprises the following steps:
(1) Acquiring a first RGB image and a first depth image with the mechanical arm at an initial position, detecting the position of the docking port in the first RGB image, and calculating the coordinates of the docking port in the world coordinate system by combining the depth value at the center of the docking port in the first depth image;
(2) Moving the mechanical arm from the initial position to a position whose distance from the world coordinates of the docking port equals a preset value; acquiring second RGB images and second depth images at that position from multiple view angles; detecting the position of the docking port in the second RGB image of each view angle; expanding the detection result of each view angle by a factor to obtain the region of interest of that view angle; mapping the region of interest of each view angle onto the second depth image of the same view angle and then performing three-dimensional reconstruction to obtain a reconstructed point cloud; and extracting the docking-port plane from the reconstructed point cloud to generate the docking pose;
(3) Moving the mechanical arm to the docking pose, monitoring the contact force during docking, and adjusting the pose of the mechanical arm end whenever the contact force is outside the set range, thereby realizing flexible docking.
During secondary positioning, the distance from the mechanical arm's starting point to the coordinates of the docking port in the world coordinate system is S; positions offset to the left and right of those coordinates are used as secondary positioning starting points, and during the motion planned between the secondary positioning starting points, the second RGB images and second depth images of the mechanical arm at each position are acquired from multiple view angles.
Example 1
The invention is described in detail below, taking the intelligent charging of a new energy automobile as an example.
The charging port of a new energy automobile is not completely perpendicular to the ground, but forms an included angle with the ground surface; this angle is fixed in the docking task of Embodiment 1 of the present invention.
To describe the docking task of the new energy automobile's charging port more clearly, a coordinate system is established for the charging port and its pose is described. The plane of the upper surface of the charging port is defined as plane_α, and the connection surface between the charging port and its bracket is defined as plane_β; then:

plane_α ∥ plane_β
Defining the charging-port center point P(x, y, z) as the origin, the charging-port surface normal vector as the y-axis direction, and the vector perpendicular to the y-axis and parallel to the ground plane_ground as the x-axis, the z-axis direction follows from the right-hand rule, namely:

ẑ = x̂ × ŷ
for the charging port docking task, a docking pose needs to be generated for docking of the charging head. The three-dimensional coordinates of the docking pose are charging port center points P (x, y, z), and the docking pose is described as:
wherein x, y and z are the coordinates of the center point P of the charging port, and ω=0 in the docking scene, namely the upper and lower edge directions of the charging port are shownPlane parallel to ground playground
Through the above definitions, the charging-port docking problem is transformed as follows: taking the acquired RGB images and depth images as input, computing the charging-port pose Pose_charge with the charging-port pose estimation method, converting it into the docking pose in the world coordinate system, and providing the position basis for docking planning.
For a charging head with depth L = 3.5 cm, the criteria for successful docking are:
1) The head enters a depth L from the charging-port center point P(x, y, z) along the charging-port plane normal direction, with an error smaller than 1 mm;
2) The distances between the actual charging-head target center point Q(x', y', z') and the charging-port center point P(x, y, z) in the x, y and z directions are all smaller than ε_max, where ε_max = 2 mm, i.e.:

|x − x'| < ε_max, |y − y'| < ε_max, |z − z'| < ε_max

3) Because vision cannot judge the docking state once the charging head reaches the charging-port plane pose, i.e., the vision sensor can no longer provide information about the docking-pose error, a force sensor is introduced for the judgment. The third condition for successful docking is that the force along the normal direction lies between F_min and F_max, where F_min = 60 N and F_max = 80 N, i.e.:

F_min ≤ |F_z| ≤ F_max
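These three criteria can be collected into a single check. The sketch below is an illustration only; units are meters and newtons, and the function name and argument layout are assumptions rather than part of the invention:

```python
def docking_success(p, q, f_z, depth_in,
                    head_depth=0.035, eps_max=0.002,
                    f_min=60.0, f_max=80.0):
    """Check the three docking-success criteria.

    p, q     : charging-port center P and actual head center Q, as (x, y, z)
    f_z      : normal-direction force from the force sensor
    depth_in : insertion depth along the charging-port plane normal
    """
    ok_depth = abs(depth_in - head_depth) < 0.001                 # criterion 1
    ok_center = all(abs(a - b) < eps_max for a, b in zip(p, q))   # criterion 2
    ok_force = f_min <= abs(f_z) <= f_max                         # criterion 3
    return ok_depth and ok_center and ok_force
```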
a multi-view three-dimensional reconstruction and flexible docking system based on vision and force sense comprises a vision positioning module and a flexible docking module, wherein the vision positioning module comprises a primary positioning module and a secondary positioning module.
As shown in fig. 2, the preliminary positioning module takes as input an RGB image and a depth image (visualized as a grayscale image) captured with the mechanical arm at its initial position, detects the position of the charging port in the RGB image, and preliminarily calculates the coordinates of the charging port in the world coordinate system by combining the depth value of the depth image at the center point.
As shown in fig. 3, the input of the secondary positioning module is the RGB images and depth maps acquired from multiple view angles. First, the pixel position of the charging port in each RGB image is detected by deep learning; to avoid the RGB detection not fully containing the charging port, the detection result is expanded N times to obtain the region of interest. The region of interest, which covers the charging port and a small part of the nearby plane, is then mapped into the depth map, and three-dimensional reconstruction is performed on that depth-map region. Next, the charging-port point cloud is segmented and a plane fitted to obtain the charging-port plane equation. Finally, the charging-port pose Pose_charge is estimated from the plane equation and the center point, and the docking pose is generated.
Both the primary and secondary positioning modules must detect the pixel position of the charging port in an RGB image; this two-dimensional object detection can be based on a traditional image method or on deep learning. A traditional image method may use HSV color-space segmentation: its core idea is to obtain the HSV value range of the charging port from collected image samples, and then compute the minimum bounding rectangle of that range's region, taking the largest object of the RGB image within the selected HSV interval as the charging port.
The HSV-based detection works well when the color contrast between the charging port and the nearby area in the experimental scene is strong, in which case the largest region is indeed the charging port. In a real scene, however, many objects near the charging port share its HSV range, such as black occluders and black car bodies, and the method is strongly affected by illumination and exposure parameters; it therefore performs poorly, and the largest segmented region may not be the charging port.
It follows that the charging port cannot be segmented simply by color; it must be detected by its features. Feature-based detection divides into object detection algorithms based on traditional features and those based on deep learning. Algorithms based on traditional features usually extract feature points such as SIFT or Haar from a specific image region and then classify them with a machine learning algorithm such as a support vector machine (SVM); such algorithms need extensive manual parameter tuning, have high time complexity, and adapt poorly to complex scenes.
Object detection methods based on deep learning divide into one-stage and two-stage types, represented by the YOLO series and the R-CNN series respectively, according to whether they use region proposals (RP), i.e., candidate boxes that may contain an object. One-stage methods need no candidate-region generation, so the model is simple and detection is fast, but their accuracy falls short of two-stage networks; conversely, two-stage networks sacrifice some real-time performance on candidate-region generation in exchange for higher detection accuracy. Because the docking task needs high real-time performance, and because in the secondary positioning the accurate docking pose ultimately relies on depth information, so that the RGB charging-port detection only plays an auxiliary role and a certain accuracy error can be tolerated at the RGB detection stage, the charging port is detected with a one-stage algorithm, namely the currently latest improved version of the YOLO series, YOLOv7.
In the range of 5 FPS (frames per second) to 160 FPS, YOLOv7 exceeds most known object detectors in both accuracy and speed; it inherits mostly from YOLOv5, with higher accuracy than YOLOv5 and a speed roughly 120% faster.
The YOLOv7 network structure can be divided into four main parts: the input end, the backbone neural network, the neck neural network and the head neural network. The input end is responsible for preprocessing, data enhancement and similar operations on input images; the backbone extracts features from the processed image; the neck fuses the features into large, medium and small feature maps; and the head is the prediction end, outputting predictions from the fused features. The backbone consists mainly of three modules: CBS, ELAN and MP. The CBS module, short for Conv (convolution layer), BN (batch normalization layer) and SiLU (an activation function), is mainly used to change the channel count, extract features and downsample. The ELAN module is composed of several CBS modules; it controls the shortest and longest gradient paths, letting the network learn more features efficiently. The MP module consists of MaxPool (max pooling) and several CBS modules and is mainly used for downsampling. The neck, besides CBS and MP modules like those of the backbone, also contains ELAN-W, UpSampling, Cat and SPPCSPC modules: ELAN-W closely resembles the backbone's ELAN, differing only in the number of selected outputs, and likewise makes feature learning more efficient; UpSampling is an up-sampling module; the Cat module is mainly used for feature fusion; and the SPPCSPC module obtains a larger receptive field through max pooling, adapting better to images of different resolutions. The head neural network contains only the REPConv module, which differs somewhat between training and prediction; the idea comes from RepVGG [9], which designs a special residual structure to assist training. This residual structure is designed so that the complex residual branches are equivalent to a single 3×3 convolution at prediction time, reducing network complexity without losing predictive performance.
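For concreteness, the following is a minimal PyTorch sketch of the CBS building block described above; it is an illustrative reimplementation, not the official YOLOv7 code, and the ELAN and MP modules would stack several such blocks along multiple branches:

```python
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU: the basic YOLOv7 building block, of which
    the ELAN (multi-branch stacking) and MP (MaxPool + CBS) modules are
    composed."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```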
Compared with other algorithms of the YOLO series, the network optimization of YOLOv7 lies mainly in the following aspects:
1) In feature extraction, the proposed multi-branch stacking structure and downsampling structure are used; compared with YOLOv5, the model's skip-connection structure is denser.
2) CSP is introduced into the SPP structure to form an SPPCSPC module, so that the receptive field is enlarged to adapt to images with different resolutions;
3) RepConv, based on the RepVGG structure, is introduced, reducing the parameter count of the network;
4) YOLOv7 introduces an adaptive multi-positive-sample matching mechanism: during training, each ground-truth box can be predicted through several prior (anchor) boxes; predicted boxes are obtained after the anchors are adjusted, the IoU and class scores are computed, and the most suitable anchor is found. This mechanism accelerates model training.
When producing the charging-port data set, the mechanical arm is first controlled to move to different poses to collect RGB images from different angles; several groups of white, red and black backgrounds are set to simulate vehicles of different colors in actual scenes, and different illumination conditions are set to simulate environmental interference. In addition, to reflect the algorithm's application in real scenes, charging-port images of new energy automobiles captured in actual scenes are added to the data set for training, besides the charging ports captured in the laboratory.
Annotation uses the open-source tool labelImg: a rectangular box is drawn around the charging port in each image, and the class, coordinates and other information are stored in an XML file as the label.
Because the object detection used in the invention is supervised learning, a large data set is needed for training. Producing the charging-port data set requires continually moving the mechanical arm to new poses for collection and labeling every generated image, which is inefficient, so it is necessary to expand the data set's images and labels. The expansion modes include rotation, mirroring, scaling, shearing, blurring and so on.
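A minimal sketch of such label-preserving expansion follows, assuming axis-aligned (x, y, w, h) boxes in pixel coordinates; rotation and shearing need the same kind of coordinate remapping and are omitted here:

```python
import cv2

def augment(image, box):
    """Yield (image, box) variants for data set expansion."""
    h, w = image.shape[:2]
    x, y, bw, bh = box

    # Horizontal mirror: flip the image and remap the box's x origin.
    yield cv2.flip(image, 1), (w - x - bw, y, bw, bh)

    # Uniform scaling: image and label shrink by the same factor.
    s = 0.8
    yield (cv2.resize(image, (int(w * s), int(h * s))),
           (int(x * s), int(y * s), int(bw * s), int(bh * s)))

    # Gaussian blur: pixels change, the label does not.
    yield cv2.GaussianBlur(image, (5, 5), 0), box
```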
In the preliminary positioning module, the position of the charging port in the RGB image is obtained first, and the center pixel P_o(u, v) of the detection result is taken as the coordinate point of the preliminary pose; the depth corresponding to this pixel is read from the registered depth map. To guard against missing depth information, the center pixel is expanded into a region of 9 pixels and the mean depth z̄ of those 9 pixels is computed (a pixel is ignored if its depth is missing). Substituting z = z̄, the coordinates P_c(x_c, y_c, z_c) of the charging port in the camera coordinate system are calculated and then converted through the camera extrinsics into the world coordinate system, giving the charging-port coordinates P_w(x_w, y_w, z_w), which provide the position basis for the image acquisition and region-of-interest three-dimensional reconstruction of the secondary positioning.
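A sketch of this preliminary back-projection under a pinhole camera model follows; the function and parameter names are hypothetical, and the 4×4 extrinsic matrix is assumed to map camera coordinates to world coordinates:

```python
import numpy as np

def preliminary_port_position(u, v, depth, K, T_cam2world):
    """Back-project the detected center pixel P_o(u, v) to world coordinates.

    depth       : registered depth map in meters (rows = v, cols = u)
    K           : 3x3 camera intrinsic matrix
    T_cam2world : 4x4 extrinsics, camera frame -> world frame (assumed layout)
    """
    # Mean depth over the 3x3 neighborhood, ignoring missing (zero) pixels.
    patch = depth[v - 1:v + 2, u - 1:u + 2].reshape(-1)
    valid = patch[patch > 0]
    if valid.size == 0:
        return None
    z = float(valid.mean())

    # Pinhole model: P_c = z * K^{-1} [u, v, 1]^T, then to the world frame.
    p_c = z * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    p_w = T_cam2world @ np.append(p_c, 1.0)
    return p_w[:3]   # P_w(x_w, y_w, z_w)
```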
In the secondary positioning module, after the RGB images and depth maps of multiple view angles are input, the pixel position of the charging port in each RGB image is obtained first, and the accurate pose of the charging port is then acquired through the following three steps:
(1) Charging port area point cloud segmentation
When the charging-port position is detected in the RGB images acquired from multiple view angles, the rectangular detection box may not fully contain the charging port. To prevent key parts of the charging port from being missed when the RGB detection result guides the point cloud segmentation, the detection rectangle is expanded N times, with N = 1.2; this region is called the region of interest R_o.
Since the RGB image and depth map are already registered, the region of interest can be mapped into the depth map; the mapped region, defined as the depth-map region of interest R_d, then contains both the full depth information of the charging port and a small amount of depth information of the panel the charging port is mounted on, and its area is about N² = 1.44 times the charging-port area.
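A small sketch of the N-fold expansion and clamping, assuming (x, y, w, h) boxes in pixel coordinates; because RGB and depth are registered, the returned rectangle serves directly as R_d:

```python
def expand_roi(box, img_w, img_h, factor=1.2):
    """Expand a detection box (x, y, w, h) about its center by `factor`,
    clamped to the image; the area grows by about factor**2 = 1.44."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    w2, h2 = w * factor, h * factor
    x2 = max(0, int(cx - w2 / 2.0))
    y2 = max(0, int(cy - h2 / 2.0))
    return (x2, y2, min(img_w - x2, int(w2)), min(img_h - y2, int(h2)))
```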
(2) Charging port point cloud completion based on three-dimensional reconstruction of region of interest
Although the Intel RealSense D435i depth camera can compute point cloud data through its SDK, the point cloud obtained from the SDK covers the entire view and contains much useless information, while this task only needs to process and analyze the region of interest; directly processing the point clouds of multiple view angles would seriously waste computing resources and hurt real-time performance. In addition, in the real environment of new energy automobile charging, influences such as illumination may reduce the completeness of the acquired depth information and thus the accuracy of the obtained pose.
To address these problems, three-dimensional reconstruction is performed on the region of interest to obtain the complete point cloud of the charging port. The common three-dimensional reconstruction method KinectFusion targets large-scale reconstruction, whereas the region of interest faced by the invention is relatively small, so the invention uses a modified TSDF (Truncated Signed Distance Function) algorithm. The basic idea of TSDF is to divide three-dimensional space into a series of voxels, each recording its distance to the object surface; as each depth map arrives, the distance between every voxel and the depth image is computed and fused with the voxel's previous value by weighting, updating the voxel from the current depth image's pixel values and the existing distance value, which yields a fairly accurate three-dimensional model.
The invention proposes the region-of-interest three-dimensional reconstruction method on the basis of the TSDF algorithm. First, the world, end-effector and camera coordinate systems are defined as O_w{x, y, z}, O_e{x, y, z} and O_c{x, y, z} respectively. Because the camera pose in the world coordinate system is needed during reconstruction, a homogeneous transformation matrix T describes the camera pose in the world coordinate system. Define the depth-map set acquired from n view angles as Image = {image_1, image_2, ..., image_n}, the region-of-interest set as Region = {R_1, R_2, ..., R_n}, and the camera pose set as Pose = {pose_1, pose_2, ..., pose_n}, where each pose_i is such a transformation matrix. The elements of Region have the form R(x, y, w, h), where x and y are the pixel coordinates of the region of interest's top-left vertex and w and h are its width and height. The flow of the region-of-interest three-dimensional reconstruction algorithm is shown in fig. 4.
According to fig. 4, the flow of the region-of-interest three-dimensional reconstruction algorithm is as follows:
1) Take the depth-map set Image, the region-of-interest set Region, the camera pose set Pose and the charging-port center coordinates P_w(x_w, y_w, z_w) obtained by the preliminary positioning as the inputs of the modified TSDF algorithm;
2) With P_w as the center, establish a cuboid bounding box with length, width and height L, W, H respectively, which contains the charging port to be reconstructed in space; divide the bounding box into voxels of side length l_v, so that the cuboid is clearly divided into (L/l_v) × (W/l_v) × (H/l_v) voxels;
3) Traverse all voxels v; for the point P_v in the world coordinate system corresponding to voxel v, obtain through the transformation relation the coordinates P_c of P_v in the camera coordinate system, and with the camera intrinsics compute the corresponding pixel P_d in image_i;
4) Judge whether P_d lies in the region of interest R_d of the depth map image_i; if not, the point of voxel v in space is proved not to be part of the charging-port point cloud, so directly set the SDF (Signed Distance Function) value sdf(P_v) to 1 and return to step 3); otherwise continue;
5) Let val(d) be the depth value of pixel P_d in image_i and dis(c) the distance from P_c to the origin of the camera coordinate system, giving sdf(P_v) = val(d) − dis(c); with the truncation distance set to u, the truncated signed distance function value of point P_v is:

tsdf(P_v) = max(−1, min(1, sdf(P_v) / u))
6) Let θ be the angle between the projection ray from the camera origin and the surface at P_v; the weight of point P_v is w(P_v) = cos(θ)/dis(c). Repeat steps 3) to 6) until all voxels have been traversed;
7) Fuse the current frame's tsdf(P_v) and w(P_v) with the global TSDF(P_v) and W(P_v), both initialized to 0, as follows:

TSDF(P_v) ← (W(P_v)·TSDF(P_v) + w(P_v)·tsdf(P_v)) / (W(P_v) + w(P_v))
W(P_v) ← W(P_v) + w(P_v)
8) If the multi-view data have not all been traversed, switch to the next view angle's depth map image_{i+1}, region of interest R_{i+1} and acquisition pose pose_{i+1}, until all image and pose data have been traversed.
As the above steps show, the region-of-interest three-dimensional reconstruction algorithm mainly applies a pruning improvement to TSDF in step 4). For the plain TSDF algorithm, as shown in fig. 5 (a), once the cuboid bounding box L, W, H is specified, the TSDF values of all voxels in the bounding box are computed, and the reconstruction result contains, besides the region-of-interest point cloud, redundant point clouds of non-interest regions, so the plane obtained when segmenting the charging-port plane may be an invalid plane of a non-interest region. For the region-of-interest algorithm, as shown in fig. 5 (b), most non-interest point clouds are filtered out; the reconstructed region-of-interest point cloud retains the complete point cloud of the charging port, invalid computation on non-key points is reduced, and the charging-port plane is segmented from the region-of-interest point cloud.
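The following Python sketch condenses steps 1) to 8) with the step-4 pruning, under simplifying assumptions: the volume size, voxel pitch and truncation distance are placeholders for L, W, H, l_v and u; the per-view weight approximates cos(θ) with the angle to the optical axis (the per-voxel surface normal is not known here); and voxels whose projection falls outside the ROI are simply left un-fused, standing in for the sdf(P_v) = 1 marking:

```python
import numpy as np

def roi_tsdf(images, regions, poses, K, p_w,
             size=(0.3, 0.3, 0.2), voxel=0.002, trunc=0.01):
    """Fuse n registered depth maps into a TSDF volume, pruning voxels whose
    projection falls outside the per-view region of interest.

    images : list of HxW depth maps in meters
    regions: list of (x, y, w, h) ROIs, one per view
    poses  : list of 4x4 camera-to-world transforms
    K      : 3x3 intrinsics; p_w: bounding-box center in world coordinates
    """
    dims = tuple((np.array(size) / voxel).astype(int))
    tsdf = np.zeros(dims)      # global truncated SDF, initialized to 0
    weight = np.zeros(dims)    # global weights, initialized to 0

    # World coordinates of every voxel center in the box around p_w.
    idx = np.stack(np.meshgrid(*[np.arange(d) for d in dims],
                               indexing="ij"), axis=-1)
    pts_w = np.asarray(p_w) - np.array(size) / 2 + (idx + 0.5) * voxel

    for img, (rx, ry, rw, rh), T in zip(images, regions, poses):
        T_inv = np.linalg.inv(T)                         # world -> camera
        pts_c = pts_w @ T_inv[:3, :3].T + T_inv[:3, 3]
        z = np.maximum(pts_c[..., 2], 1e-9)
        uvz = pts_c @ K.T
        u = (uvz[..., 0] / z).round().astype(int)
        v = (uvz[..., 1] / z).round().astype(int)

        # Pruning: only voxels projecting into the ROI are updated.
        ok = ((u >= rx) & (u < rx + rw) & (v >= ry) & (v < ry + rh)
              & (pts_c[..., 2] > 0))
        val = np.where(ok, img[v.clip(0, img.shape[0] - 1),
                               u.clip(0, img.shape[1] - 1)], 0.0)
        dist = np.linalg.norm(pts_c, axis=-1)            # dis(c)
        t = np.clip((val - dist) / trunc, -1.0, 1.0)     # truncated SDF
        # w = cos(theta)/dis(c), with theta measured to the optical axis.
        w_new = np.where(ok & (val > 0), pts_c[..., 2] / dist**2, 0.0)

        tsdf = ((weight * tsdf + w_new * t)
                / np.maximum(weight + w_new, 1e-9))      # weighted fusion
        weight = weight + w_new
    return tsdf, weight
```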
(3) Charging port pose acquisition
After the key point cloud of the charging port has been completed through region-of-interest three-dimensional reconstruction, the excessive number of points would cause a heavy computational load, so downsampling is needed.
In the docking task, after the detection result is obtained at the RGB detection stage, only the point cloud of the region of interest, i.e., the detection result expanded N times, is processed; the region's area is N² = 1.44 times the detection result, so the largest plane in the point cloud is the charging-port plane, and that largest plane can be fitted by the RANSAC method as the charging-port plane. The equation of the fitted plane α in the camera coordinate system is Ax + By + Cz + D = 0, with normal vector n = (A, B, C). When the docking task is executed, the docking pose points inward, perpendicular to the charging-port surface, unlike the 4-DOF pose of a grasping task, which always points vertically downward; and the direction of the computed fitted-plane normal n is ambiguous: the solved normal may point outward or inward perpendicular to the fitted plane α, the inward direction being the docking direction. The charging port imaged in the camera coordinate system should be consistent with the positive z-axis of the camera coordinate system: if C > 0, the normal n is consistent with the z-axis direction and is kept; if C < 0, the normal is opposite to the z-axis direction, and the fitted plane α's normal is corrected to −n.
The charging-port point cloud P segmented by the RANSAC method allows the charging-port center coordinates to be calculated. Assume P consists of n points with coordinates P_i(x_i, y_i, z_i), where P_i ∈ P for any i ∈ [1, n]; the center coordinates P_o(x_o, y_o, z_o) of the charging-port plane are then:

x_o = (1/n) Σ x_i,  y_o = (1/n) Σ y_i,  z_o = (1/n) Σ z_i  (i = 1, ..., n)
After these calculations, n and P_o(x_o, y_o, z_o) form the charging-port pose in the camera coordinate system; converting through the camera extrinsics into the world coordinate system and normalizing the normal vector gives the pose (x_w, y_w, z_w, n̂_w), where n̂_w is the normalized charging-port plane normal.
However, in ROS a surface normal alone cannot be input as a pose; a translation vector plus Euler angles, or a translation vector plus a quaternion, is required, so the normal vector must be converted. Converting a vector to Euler angles usually requires two direction vectors, but currently only the normal direction n̂ is available, so the result would be uncertain. In the docking task, however, the upper-lower edge direction of the charging port is defined to be always parallel to the ground, whose normal is ĝ; the pose can therefore be computed with vector products. First take ŷ = n̂; determine x̂ from ŷ and ĝ by the vector product

x̂ = (ŷ × ĝ) / ||ŷ × ĝ||

and then obtain the exact ẑ from x̂ and ŷ, namely:

ẑ = x̂ × ŷ
The x̂, ŷ and ẑ vectors obtained above give the rotation matrix, with the three unit vectors as its columns:

R = [x̂ ŷ ẑ]
finally, the rotation matrix is converted into Euler angles (phi, theta, phi) and the center coordinates (x) w ,y w ,z w ) Constitute 6-DOF docking Pose Pose d (x w ,y w ,z w ,φ,θ,ψ)。
As shown in fig. 6, the flexible docking module first obtains the docking pose in the world coordinate system through vision, then plans a trajectory with the MoveIt package and moves the charging head at the mechanical arm end to the docking pose. When the charging head first enters the charging port, the arm is controlled to make an initial probe along the z-axis of the docking pose. If the contact force |F_z| > F_fmax at this moment (where F_fmax is the initial-probe maximum contact force, F_fmax = 30 N), the visually obtained docking pose is deviated, and the flexible adjustment module adjusts the translation; if the adjustment fails, the arm stops planning and re-acquires and recomputes the docking pose, otherwise the translation fine-tuning has removed the abnormality and docking can continue. After the initial probe shows no abnormality, a secondary probe continues along the z-axis of the docking pose; if the secondary probe still finds an abnormal contact force (i.e., |F_z| > F_smax, where F_smax is the secondary-probe maximum contact force, F_smax = 50 N), the flexible adjustment module adjusts the rotation; if that adjustment fails, the arm stops planning and re-acquires the docking pose through vision, and if it succeeds, docking with the accurate pose begins. When docking with the accurate pose, the charging head steps by δ each time, and position and force detection is designed as the criterion for judging each step's success. When the force sensor detects a z-direction force |F_z| > F_max = 80 N, docking is abnormal and there is a risk of damaging the equipment, so planning stops directly, the arm is retracted to the vision acquisition pose, and the docking pose is re-acquired. In addition, the end pose in the current state is computed through the arm's forward kinematics to obtain the actual step displacement Δz; if |δ − Δz| > σ, the error between the actual step displacement Δz and the commanded step δ is too large, and the step value must be re-adjusted as compensation, stepping the charging head a distance δ − Δz along the normal so that each step's displacement meets expectation, keeping the error below σ = 0.1 mm. Since each single step's error is controlled within σ = 0.1 mm and the number of steps in the invention is N = 7, the maximum error over the whole process is σ' = N·σ = 0.7 mm, within the allowed docking-error range [0, σ_max]. On successful docking the contact force should satisfy 60 N < F_z ≤ 80 N; furthermore, the final criterion of docking success is a detected change in charging power: if the charging power P > 5 kW, docking has succeeded.
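The stepping phase can be summarized as the following control-loop sketch, in which `arm` and `force_sensor` are hypothetical driver interfaces wrapping the MoveIt-planned motion and the force sensor; the defaults mirror N = 7, F_max = 80 N and σ = 0.1 mm from the text, and δ = 5 mm is an inference (7 steps covering the 3.5 cm head depth):

```python
def step_insertion(arm, force_sensor, delta=0.005, n_steps=7,
                   f_max=80.0, sigma=0.0001):
    """Step the charging head along the docking z-axis with force and
    position checks; returns False if docking must be re-planned."""
    for _ in range(n_steps):
        z_before = arm.insertion_depth()       # from forward kinematics
        arm.step_along_docking_z(delta)

        # Collision protection: abort and re-acquire the pose visually.
        if abs(force_sensor.fz()) > f_max:
            arm.retract_to_vision_pose()
            return False

        # Compensate when the actual step deviates from the commanded one.
        actual = arm.insertion_depth() - z_before
        if abs(delta - actual) > sigma:
            arm.step_along_docking_z(delta - actual)
    return True
```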
When the initial or secondary probe detects an abnormal contact force, it indicates that the visually acquired docking pose has a translational deviation ε_t > ε_tmax = 1 mm or a rotational deviation ε_r > ε_rmax = 1°. In most cases, however, the docking pose deviation is small and only a slight adjustment is needed. Returning to visual acquisition and recalculating the docking pose would make the docking process inefficient, so the docking pose is instead adjusted by the flexible fine-tuning module. As shown in fig. 7, when a probe detects an abnormal contact force, the mechanical arm returns to the initial probe pose and starts fine tuning; the fine-tuning action is obtained by the following formula:

action = {an action from the priority adjustment pool, if rand(1) < ε; a randomly generated action, if rand(1) ≥ ε}
after the corrected pose is obtained, controlling the mechanical arm to reach the pose and move by a linear motion delta along the z axis of the adjusted butt joint pose, if the contact force |F in the z direction after the movement z |<F max If the fine adjustment is effective, the butt joint can be continued; otherwise, the fine tuning is not valid. Since the docking task does not allow an unlimited fine tuning of the robotic arm, an upper limit C of the number of fine tuning is set, the present invention sets c=150. If the fine tuning times exceed C, the butt joint pose deviation generated by the vision at the time cannot be corrected by fine tuning, and the pose needs to be acquired by the vision again; if the upper limit has not been reached, the fine-tuning strategy continues to be performed. The invention sets up dynamic epsilon:
ε=max(1-p×t,0)
where t represents the number of fine-tuning failures and p is an adjustable hyper-parameter; the larger p is, the faster ε decays and the less the priority adjustment pool is trusted. The invention sets p = 0.005. Thus, after the hyper-parameter p is set, the flexible docking model reduces its dependence on the priority adjustment pool as the number of fine-tuning failures increases, and tends more toward finding the accurate docking pose by random exploration.
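A minimal sketch of this ε-greedy fine-tuning selection, with the priority adjustment pool represented as a plain list and the random action generator as an assumed helper:

```python
import random

P = 0.005   # adjustable hyper-parameter from the description
C = 150     # upper limit on the number of fine-tuning attempts

def select_action(priority_pool, t, random_action):
    """epsilon-greedy: trust the priority adjustment pool while epsilon is high,
    drift toward random exploration as the failure count t grows."""
    eps = max(1.0 - P * t, 0.0)               # dynamic epsilon
    if random.random() < eps:
        return random.choice(priority_pool)   # preset adjustment actions
    return random_action()                    # random exploration

# Hypothetical example: small translations on the x/y axes [m].
pool = [("x", 0.5e-3), ("x", -0.5e-3), ("y", 0.5e-3), ("y", -0.5e-3)]
rand_act = lambda: (random.choice("xy"), random.uniform(-1e-3, 1e-3))
action = select_action(pool, t=10, random_action=rand_act)
```

Note that with p = 0.005, even after the full C = 150 failed attempts, ε only decays from 1.0 to 0.25, so the priority pool remains the dominant source of candidate actions within a single docking episode.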
Beyond the automatic charging of a new energy automobile in example 1, the invention can also automatically dock a water-filling pipeline with a vehicle's water-filling port to realize automatic water filling, automatically dock a fueling outlet with the fueling port of automobiles, airplanes and rail-transit vehicles that need fuel to realize automatic fueling, and automatically dock with equipment that needs gas filling. The system of the invention may likewise be used where a particular liquid must be injected into an aircraft.
Example 2
In order to quantify the accuracy of the docking pose obtained by the method of this example based on deep learning and local three-dimensional reconstruction, comparison experiments were carried out on 3 methods: SIFT feature extraction + PnP pose estimation; deep-learning detection + point cloud generated by the camera SDK; and deep-learning detection + local three-dimensional reconstruction point cloud. The overall flow of each method is to obtain the docking pose from an input RGB image and depth information. Each method was compared in 6 experimental scenes: white background (car body), black background (car body), red background (car body), blue background (car body), strong light and weak light, and each group of experiments was repeated 5 times. The main indexes studied in the comparison experiments are the pose translation error and the time consumed to generate the pose; the experimental results are shown in table 1.
Table 1 visual positioning test results of charging port
From the experimental results of table 1, the following analysis is made: 1) The method of obtaining the docking pose by SIFT feature extraction + PnP pose estimation cannot guarantee that the average translational error satisfies ε ≤ ε_max; the docking pose calculated under the black background and under different illumination has a large error, so the method cannot adapt to all scenes. 2) In the method of obtaining the point cloud by deep learning + the camera SDK, the average error in the black-background scene is reduced by 68.7%, the average errors in the strong-light and weak-light scenes are reduced by 43.6% and 84.0% respectively, and the overall average error is reduced by 61.7%; the target detection algorithm used in this method has better real-time performance and can obtain the point cloud from the camera SDK more quickly, so the time to obtain the docking pose is reduced by 26%. However, the docking pose error caused by the missing point cloud of the charging port is still quite large in the strong-light scene. 3) After the point cloud near the charging port is completed by introducing region-of-interest three-dimensional reconstruction, a certain amount of real-time performance is sacrificed, but the average error in the strong-light scene is reduced by 59.1% and the overall average error by 30.4%, which alleviates to a certain extent the problem of an excessive pose error under strong light.
Example 3
In the flexible docking experiment, this example comparatively analyzes 3 docking methods: no fine tuning, random fine tuning, and ε-greedy-based flexible fine tuning. Since the docking pose obtained by vision does not have an error ε > ε_max every time, and the error differs from trial to trial, the error magnitude must be controlled as the experimental variable. The experiments are designed mainly to study the adjustment of translational errors; besides the pose acquired through vision, 3 groups of artificially set error poses (i.e. offsets added on the basis of the accurate pose) are added, so that each method is compared over a total of 4 pose types, designed specifically as follows:
a) Type 1: the butt joint pose obtained by the chapter visual positioning method;
b) Type 2: setting +1.8 mm in the x direction as the error (the maximum of the average translational errors);
c) Type 3: setting −2.8 mm in the y direction as the error (the maximum translational error);
d) Type 4: setting −2.8 mm in the x direction and +2.6 mm in the y direction simultaneously as the error (a combined case).
Each set of comparative experiments was repeated 20 times, and the upper limit C of the number of fine-tuning attempts was 150; the experimental results are shown in table 2.
Table 2 results of flexible docking experiments at the charging port
* Note: the average docking time is an invalid indicator when the docking success rate is 0%.
From the experimental results of table 2, the following analysis is made: 1) The method without fine tuning achieves a 45% success rate on pose type 1 obtained by vision, and its average docking time is short because no pose adjustment occurs; the docking result depends entirely on the precision of the visual positioning algorithm. On pose types 2–4, with artificially set errors larger than ε_max, the deviated pose cannot be adjusted, so the charging port cannot be entered and the docking success rate is 0. 2) The random fine-tuning method improves the docking success rate on the visually obtained pose type 1 by 55.0%, and under errors larger than ε_max its fine tuning raises successful docking from zero; but fine tuning sacrifices a certain amount of real-time performance and requires a high number of fine-tuning attempts. 3) Compared with the method without fine tuning, the ε-greedy-based flexible fine-tuning method sacrifices a certain amount of docking time to handle errors larger than ε_max; compared with the random fine-tuning method, it exploits the experience of successful fine tuning, so the average number of fine-tuning attempts is reduced by 66.2%, the average docking success rate is improved by 6.25%, and the average docking time is reduced by 59.1%. The experimental results show that the ε-greedy-based flexible fine-tuning method improves the success rate and robustness of the mechanical arm in a real environment, though a certain amount of time is still needed for fine tuning.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A multi-view three-dimensional reconstruction and flexible docking method based on vision and force sense, characterized by comprising the following steps:
(1) Acquiring a first RGB image and a first depth image of the mechanical arm at an initial position, detecting the position of the interface in the first RGB image, and calculating the coordinate of the interface under a world coordinate system by combining the depth value of the center of the interface in the first depth image;
(2) Moving the mechanical arm from the initial position to a position whose distance from the coordinates of the docking port in the world coordinate system is a preset value, collecting a second RGB image and a second depth image of the mechanical arm at that position from a plurality of view angles, detecting the position of the docking port in the second RGB image at each view angle, expanding the detection result at each view angle by a multiple to obtain a region of interest at each view angle, mapping the region of interest at each view angle to the second depth image at the same view angle and then performing three-dimensional reconstruction to obtain a reconstructed point cloud, extracting the docking-port plane in the reconstructed point cloud, and generating a docking pose;
(3) And the mechanical arm moves to the docking position, the contact force in the docking process is monitored, and when the contact force is not in the set range, the position and the posture of the tail end of the mechanical arm are adjusted, so that flexible docking is realized.
2. The multi-view three-dimensional reconstruction and flexible docking method based on vision and force sense according to claim 1, wherein detecting the position of the docking port in the second RGB image at each view angle uses a deep-learning-based target detection method, specifically:
inputting the second RGB image under each view angle into a position detection model to obtain the position of the interface in the second RGB image under each view angle;
the position detection model comprises an input layer, a backbone neural network, a neck neural network and a head neural network, and is trained by the following modes:
collecting RGB sample images of different scenes and different visual angles to form a training set, and marking the positions of interfaces in the RGB sample images;
and inputting the training set into a position detection model, performing data enhancement processing on the RGB sample image by an input layer, extracting features of the image output by the input layer by a backbone neural network, fusing the features extracted by the backbone neural network by a neck neural network, outputting the predicted position of the opposite interface by the head neural network by utilizing the fused features, training the position detection model by taking the minimum error between the predicted position of the opposite interface and the marked position of the opposite interface as a target, and training until convergence to obtain a trained position detection model.
3. The multi-view three-dimensional reconstruction and flexible docking method based on vision and force sense according to claim 1, wherein detecting the position of the docking port in the second RGB image at each view angle uses HSV color-space segmentation, specifically:
acquiring an interface image sample, acquiring an HSV space value of the interface, converting a second RGB image from an RGB color space to an HSV color space under each view angle, and calculating a minimum circumscribed rectangle of the HSV space value area of the interface in the second RGB image to obtain the position of the interface.
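By way of illustration, this segmentation can be sketched with OpenCV as follows; the HSV bounds are assumed placeholder values to be measured from the interface image samples:

```python
import cv2
import numpy as np

def locate_port_hsv(bgr, lower=(100, 80, 50), upper=(130, 255, 255)):
    """Segment the docking port by its HSV range and return its minimum
    circumscribed rectangle; the default bounds are example values only."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)    # RGB (BGR) -> HSV color space
    mask = cv2.inRange(hsv, np.array(lower), np.array(upper))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    # Minimum circumscribed (rotated) rectangle of the largest segmented region.
    return cv2.minAreaRect(max(contours, key=cv2.contourArea))
```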
4. A multi-view three-dimensional reconstruction and flexible docking method based on vision and force sense as claimed in any one of claims 1 to 3, wherein detecting the position of the docking port in the first RGB image uses an HSV color-space segmentation method or a deep-learning-based target detection method.
5. A multi-view three-dimensional reconstruction and flexible docking method based on vision and force sense as claimed in any one of claims 1-3, wherein said three-dimensional reconstruction comprises the steps of:
(21) Taking the depth map set Image formed by the second depth maps at each view angle, the region-of-interest set Region formed by the regions of interest at each view angle, the camera pose set Pose used when collecting the plurality of view angles, and the coordinates P_w(x_w, y_w, z_w) of the docking port in the world coordinate system as the input of the modified TSDF algorithm;
(22) With P_w as the center, establishing a cuboid bounding box containing the docking port, the length, width and height of which are L, W, H respectively, and dividing the cuboid bounding box into a plurality of voxels;
(23) Traversing all voxels v, each voxel v corresponding to a point P_v in the world coordinate system; obtaining the coordinates P_c of P_v in the camera coordinate system, and calculating with the camera intrinsic parameters the pixel point P_d in the second depth map image_i at the i-th view angle;
(24) Judging whether P_d is located in the region of interest R_d of image_i; if not, setting the signed distance function value sdf(P_v) to 1 and returning to step (23); if yes, entering step (25);
(25) The depth value of the point P_d in image_i is val(d), and the distance from P_c to the origin of the camera coordinate system is dis(c), so that sdf(P_v) = val(d) − dis(c); with the truncation distance set to u, the truncated signed distance function value tsdf(P_v) of the point P_v is obtained by the following equation:

tsdf(P_v) = max(−1, min(1, sdf(P_v)/u));
(26) Defining θ as the included angle between the projection ray from the camera origin to P_v and the surface, the weight of the point P_v is computed as w(P_v) = cos(θ)/dis(c); steps (23) to (26) are repeated until all voxels have been traversed;
(27) Fusing tsdf(P_v) and w(P_v) at the current view angle with the global truncated signed distance function value TSDF(P_v) and the global weight W(P_v), which are both initialized to 0:

TSDF(P_v) ← (W(P_v)·TSDF(P_v) + w(P_v)·tsdf(P_v)) / (W(P_v) + w(P_v))
W(P_v) ← W(P_v) + w(P_v);
(28) And switching to the depth map, the region of interest and the camera pose of the next view angle until the depth map set, the region of interest set and the camera pose set are traversed.
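For reference, the voxel traversal and fusion in steps (23) to (27) admit a compact vectorized sketch. The array layout, the helper name and the simplification cos θ ≈ 1 in the weight are assumptions of this illustration, not the claimed implementation:

```python
import numpy as np

def integrate_view(TSDF, W, voxels_w, depth, roi, K, T_wc, u):
    """Fuse one view into the global TSDF, following steps (23)-(27).

    TSDF, W  : per-voxel global truncated SDF values and weights, initialized to 0.
    voxels_w : (N, 3) voxel centers P_v in the world coordinate system.
    depth    : second depth map of this view [m]; roi = (x0, y0, x1, y1).
    K, T_wc  : 3x3 intrinsics and 4x4 world-to-camera extrinsics; u: truncation distance.
    """
    P_c = (T_wc[:3, :3] @ voxels_w.T + T_wc[:3, 3:4]).T      # P_v -> camera frame P_c
    uv = (K @ P_c.T).T
    px = np.round(uv[:, :2] / uv[:, 2:3]).astype(int)        # pixel point P_d
    x0, y0, x1, y1 = roi
    inside = ((P_c[:, 2] > 0)                                # in front of the camera
              & (px[:, 0] >= x0) & (px[:, 0] < x1)
              & (px[:, 1] >= y0) & (px[:, 1] < y1))          # step (24): P_d in the ROI
    dis_c = np.linalg.norm(P_c, axis=1)                      # distance to camera origin
    rows = px[:, 1].clip(0, depth.shape[0] - 1)
    cols = px[:, 0].clip(0, depth.shape[1] - 1)
    sdf = np.where(inside, depth[rows, cols] - dis_c, 1.0)   # step (25): val(d) - dis(c)
    tsdf = np.clip(sdf / u, -1.0, 1.0)                       # truncation
    w = np.where(inside, 1.0 / dis_c, 0.0)                   # step (26), with cos(theta) = 1
    W_new = W + w                                            # step (27): weighted fusion
    TSDF[:] = np.where(W_new > 0, (W * TSDF + w * tsdf) / np.maximum(W_new, 1e-12), TSDF)
    W[:] = W_new
```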
6. A multi-view three-dimensional reconstruction and flexible docking method based on vision and force sense according to any one of claims 1-3, wherein said step (3) comprises:
the mechanical arm moves to a docking position, and the contact force in the docking process is monitored;
when the docking end of the mechanical arm first enters the docking port, controlling the docking end of the mechanical arm to perform an initial probe along the z-axis of the docking pose; performing a secondary probe when the contact force is smaller than or equal to the initial-probe maximum contact force; adjusting the end translation of the mechanical arm when the contact force is larger than the initial-probe maximum contact force; entering step (2) when the contact force after adjustment is still larger than the initial-probe maximum contact force, and performing the secondary probe when the contact force after adjustment is smaller than or equal to the initial-probe maximum contact force;
after the initial probe shows no abnormality, continuing the secondary probe along the z-axis of the docking pose at the end of the mechanical arm; if the contact force during the secondary probe is smaller than or equal to the secondary-probe maximum contact force, docking with the accurate pose corresponding to that contact force; if the contact force during the secondary probe is larger than the secondary-probe maximum contact force, adjusting the end rotation of the mechanical arm; if the contact force after adjustment is still larger than the secondary-probe maximum contact force, entering step (2), and if the contact force after adjustment is smaller than or equal to the secondary-probe maximum contact force, docking with the accurate pose corresponding to that contact force;
when docking with the accurate pose, if the contact force is larger than the maximum docking contact force, entering step (2); if the step error between the actual step displacement and the given step displacement of the docking end of the mechanical arm is larger than a threshold value, adjusting the step value of the docking end of the mechanical arm; and when the contact force is within the set range, the step error is smaller than or equal to the threshold value and the end of the mechanical arm is connected with the docking port, the docking succeeds.
7. A multi-view three-dimensional reconstruction and flexible docking method based on vision and force sense according to any one of claims 1-3, wherein the means for adjusting the pose of the end of the mechanical arm is:
when the contact force is not within the set range, the pose of the end of the mechanical arm is adjusted through the following action:

action = {an action from the priority adjustment pool, if rand(1) < ε; a randomly generated action, if rand(1) ≥ ε}
ε=max(1-p×t,0)
where t represents the number of failed adjustments, p is an adjustable hyper-parameter, ε is a dynamic adjustment value, the priority adjustment pool holds preset actions, random generation denotes a randomly produced action, rand(1) denotes the generation of a random number, and the action is the end translation or the end rotation of the mechanical arm.
8. A multi-view three-dimensional reconstruction and flexible docking method based on vision and force sense according to any one of claims 1 to 3, wherein the docking port is a charging port and the end of the mechanical arm is a charging head; or the docking port is a water injection port and the end of the mechanical arm is a water-filling pipeline outlet; or the docking port is a fueling port and the end of the mechanical arm is a fueling pipeline outlet; or the docking port is a gas injection port and the end of the mechanical arm is a gas-filling pipeline outlet.
9. A multi-view three-dimensional reconstruction and flexible docking system based on vision and force sense, characterized by comprising a visual positioning module and a flexible docking module, the visual positioning module comprising a primary positioning module and a secondary positioning module;
the primary positioning module is used for acquiring a first RGB image and a first depth image of the mechanical arm at an initial position, detecting the position of the interface in the first RGB image, and calculating the coordinate of the interface under a world coordinate system by combining the depth value of the center of the interface in the first depth image;
the secondary positioning module is used for moving the mechanical arm from the initial position to a position whose distance from the coordinates of the docking port in the world coordinate system is a preset value, collecting a second RGB image and a second depth image of the mechanical arm at that position from a plurality of view angles, detecting the position of the docking port in the second RGB image at each view angle, expanding the detection result at each view angle by a multiple to obtain a region of interest at each view angle, mapping the region of interest at each view angle to the second depth image at the same view angle and then performing three-dimensional reconstruction to obtain a reconstructed point cloud, extracting the docking-port plane in the reconstructed point cloud, and generating a docking pose;
the flexible docking module is used for moving the mechanical arm to a docking pose, monitoring the contact force in the docking process, and adjusting the pose of the tail end of the mechanical arm when the contact force is not in a set range so as to realize flexible docking.
10. Application of the multi-view three-dimensional reconstruction and flexible docking method based on vision and force sense, characterized in that the method is the flexible docking method according to any one of claims 1 to 8 and is applied to charging, fueling, water filling and gas filling: when applied to charging, the charging head at the end of the mechanical arm is docked with a charging port; when applied to fueling, the fueling pipeline outlet at the end of the mechanical arm is docked with a fueling port; when applied to water filling, the water-filling pipeline outlet at the end of the mechanical arm is docked with a water-filling port; and when applied to gas filling, the gas-filling pipeline outlet at the end of the mechanical arm is docked with a gas injection port.
CN202310503256.3A 2023-04-28 2023-04-28 Multi-view three-dimensional reconstruction and flexible docking method and system based on vision and force sense Pending CN116524128A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310503256.3A CN116524128A (en) 2023-04-28 2023-04-28 Multi-view three-dimensional reconstruction and flexible docking method and system based on vision and force sense

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310503256.3A CN116524128A (en) 2023-04-28 2023-04-28 Multi-view three-dimensional reconstruction and flexible docking method and system based on vision and force sense

Publications (1)

Publication Number Publication Date
CN116524128A true CN116524128A (en) 2023-08-01

Family

ID=87402712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310503256.3A Pending CN116524128A (en) 2023-04-28 2023-04-28 Multi-view three-dimensional reconstruction and flexible docking method and system based on vision and force sense

Country Status (1)

Country Link
CN (1) CN116524128A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117245651A (en) * 2023-09-12 2023-12-19 北京小米机器人技术有限公司 Mechanical arm plug control method, device, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination