CN116468781A - Outdoor remote hierarchical visual positioning measurement method - Google Patents
- Publication number
- CN116468781A (application CN202310276988.3A)
- Authority
- CN
- China
- Prior art keywords
- positioning
- camera
- target material
- visual positioning
- stacked materials
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/70 — Determining position or orientation of objects or cameras
- G01B11/002 — Measuring arrangements characterised by the use of optical techniques for measuring two or more coordinates
- G06T5/80 — Geometric correction (image enhancement or restoration)
- G06T7/20 — Analysis of motion
- G06T7/66 — Analysis of geometric attributes of image moments or centre of gravity
- G06T7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
- G06V10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners; connectivity analysis
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V20/60 — Scenes; scene-specific elements; type of objects
- G06T2207/10012 — Stereo images
- G06T2207/20084 — Artificial neural networks [ANN]
- G06V2201/07 — Target detection
- Y02P90/30 — Computing systems specially adapted for manufacturing
Abstract
The invention belongs to the technical field of visual positioning measurement, and in particular relates to an outdoor remote hierarchical visual positioning measurement method. The method involves an operation platform bearing stacked materials, a displacement device arranged on the operation platform, and a coarse positioning visual positioning system, a fine positioning visual positioning system and an execution unit arranged on the displacement device. The specific steps comprise: step A, the coarse positioning visual positioning system identifies the stacked materials on the operation platform; step B, a target material is designated; step C, the coarse positioning visual positioning system performs preliminary positioning and calculates the spatial position of the target material; step D, the displacement device moves to the target position; step E, the fine positioning visual positioning system accurately positions the target material; step F, the execution unit grabs the target material. By adopting hierarchical visual positioning measurement with coarse and fine positioning visual positioning systems, the invention omits a great amount of calibration work, has a simple and easily maintained system structure, and effectively improves positioning measurement precision.
Description
Technical field:
the invention belongs to the technical field of visual positioning measurement, and particularly relates to an outdoor remote hierarchical visual positioning measurement method.
Background art:
for specific large outdoor scenes such as mineral aggregate storage yards and container port yards, statistics and measurement of operation objects often need to be completed by means of perception, communication and related technologies in order to meet the automatic operation requirements of the corresponding scene. For example, a container port needs to rely on a sensing and positioning system to determine pick-and-place guidance information for loading and unloading containers so as to realize automatic grabbing, stacking and other operations. Besides this positioning requirement, there are also measurement problems for target objects in specific scenes, namely the calculation of the volume of stacked mineral aggregate and the accurate calculation of the positions of container feature points.
The single-line laser radar co-positioning system adopted by existing container ports and docks comprises two single-line laser radars whose laser directions are mutually perpendicular; the two radars determine the positions of the container and the spreader by scanning jointly. However, this laser scanning positioning method cannot determine the stacking condition of the whole container yard in real time and requires a large amount of calibration work, including traversing all positions in a bay with the spreader to grab containers, and repeated calibration caused by ground subsidence of the storage yard. Moreover, a port normally needs to operate continuously for 24 hours, and must be shut down while calibration is performed, which greatly affects port throughput and loading and unloading efficiency.
Summary of the invention:
the invention aims to provide an outdoor remote hierarchical visual positioning measurement method, which adopts a coarse positioning visual positioning system and a fine positioning visual positioning system to perform hierarchical visual positioning measurement, and the coarse positioning visual positioning system and the fine positioning visual positioning system are respectively responsible for respective positioning measurement content and positioning measurement precision, so that a large amount of calibration work can be saved, and the working efficiency can be effectively improved.
The invention is realized in the following way:
the outdoor remote hierarchical visual positioning measurement method comprises an operation platform with stacked materials, wherein a displacement device capable of moving relative to the operation platform is arranged on the operation platform, a coarse positioning visual positioning system, a fine positioning visual positioning system and an execution unit are arranged on the displacement device, and the method comprises the following specific steps of:
step A: identifying stacked materials on the operation platform by a binocular camera of the coarse positioning visual positioning system to determine the stacking column number, the layer number and the space position of the stacked materials;
step B: the target material to be grabbed is specified manually or randomly by the system;
step C: the coarse positioning visual positioning system is used for initially positioning and calculating the spatial position of the target material to be grasped so as to obtain the coarse positioning movement offset;
step D: the displacement device drives the coarse positioning visual positioning system, the fine positioning visual positioning system and the execution unit to move to a target position together according to the coarse positioning motion offset, and a binocular camera of the fine positioning visual positioning system can shoot a target material to be grabbed at the target position;
step E: the precise positioning visual positioning system is used for precisely positioning the target material to be grasped and calculating the precise positioning motion offset between the target material to be grasped and the execution unit in real time;
step F: the displacement device drives the fine positioning visual positioning system and the execution unit to move to a target position together according to the fine positioning motion offset, and the execution unit grabs the target material to be grabbed at the target position according to the fine positioning motion offset.
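The coarse-to-fine loop of steps A-F can be sketched with mock subsystems. All class and function names below are illustrative assumptions, not from the patent; the point is only that the fine system corrects the residual error left by the coarse move:

```python
from dataclasses import dataclass

@dataclass
class Offset:
    dx: float
    dy: float
    dz: float

class MockVision:
    """Stands in for a coarse- or fine-positioning visual system (hypothetical)."""
    def __init__(self, positions, error=0.0):
        self.positions = positions   # target_id -> true (x, y, z)
        self.error = error           # simulated measurement error on x

    def locate(self, target_id, stage_pos):
        x, y, z = self.positions[target_id]
        return Offset(x - stage_pos[0] + self.error,
                      y - stage_pos[1],
                      z - stage_pos[2])

def hierarchical_grab(coarse, fine, stage_pos, target_id):
    """Steps C-F: coarse offset, coarse move, fine offset, final approach."""
    c = coarse.locate(target_id, stage_pos)                     # step C
    stage_pos = (stage_pos[0] + c.dx, stage_pos[1] + c.dy,
                 stage_pos[2] + c.dz)                           # step D
    f = fine.locate(target_id, stage_pos)                       # step E
    stage_pos = (stage_pos[0] + f.dx, stage_pos[1] + f.dy,
                 stage_pos[2] + f.dz)                           # step F
    return stage_pos

positions = {"box7": (12.0, 3.0, 1.5)}
coarse = MockVision(positions, error=0.2)   # coarse system is off by 0.2 m
fine = MockVision(positions, error=0.0)     # fine system removes the residual
final = hierarchical_grab(coarse, fine, (0.0, 0.0, 0.0), "box7")
print(final)  # converges to the true target position, ~(12.0, 3.0, 1.5)
```

The two-stage structure is what lets each system keep its own field of view and accuracy budget, as the patent describes.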
In the above-mentioned outdoor remote hierarchical visual positioning measurement method, the step of the coarse positioning visual positioning system for identifying stacked materials sequentially includes:
step a1, binocular camera distortion correction and binocular camera epipolar correction: calibrating the internal parameters and the external parameters of the binocular camera by adopting a Zhang Zhengyou calibration method, and performing correction operation on an image layer according to the calibrated parameters;
step a2, identifying target detection stacked materials: identifying the stacked materials by adopting a target detection algorithm according to image data acquired by a binocular camera of the coarse positioning visual positioning system to obtain a rectangular envelope frame of an area where the stacked materials are located as candidate areas, wherein the same frame of image comprises a plurality of candidate areas;
step a3, extracting the stacked materials through semantic segmentation: c, segmenting pixels contained in the stacked materials on the candidate areas in the step a2 by adopting a semantic segmentation model, and calculating the gravity centers of the segmented pixels belonging to the stacked materials;
step a4, extracting the edges of the stacked materials: dividing the area of the pixels divided in the step a3 according to the appointed direction, and carrying out gradient calculation by utilizing an edge extraction operator according to the appointed direction so as to realize the top surface edge extraction of the stacked materials;
step a5, positioning key points of stacked materials: performing straight line fitting on the edges extracted in the step a4 by adopting a fitting algorithm to obtain an approximate contour of the stacked materials, and solving intersection points of two adjacent intersecting fitting straight line equations, wherein the intersection points are all angular points of the top surface of the stacked materials and serve as positioning information of the stacked materials; meanwhile, the constraint of the number of pixels occupied in the image when the top surface area and the side length of the stacked materials are at the designated height is added, and the contours which do not belong to the stacked materials are filtered;
step a6, binocular stereo matching: numbering the candidate areas obtained in the step a2 in the same frame of image according to the central coordinate position of the candidate areas and the sequence from top to bottom and from left to right; simultaneously numbering the positions of the intersection points solved in the step a5 in the image according to the sequence from top to bottom and from left to right; the left camera and the right camera of the binocular camera are matched according to numbers, the numbers are the same and can be regarded as the region where the same stacked material is located, and the pixel points with the candidate region numbers and the intersection point numbers are regarded as the homonymous points of the left camera and the right camera of the binocular camera.
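The number-based stereo matching of step a6 can be sketched in a few lines: corner candidates in each view are numbered top-to-bottom, then left-to-right, and equal numbers are taken as homonymous points. The pixel coordinates and the row-grouping tolerance `row_tol` below are invented for illustration:

```python
def number_points(points, row_tol=10):
    """Number points top-to-bottom, left-to-right; points whose vertical
    coordinates differ by less than row_tol pixels share one row."""
    pts = sorted(points, key=lambda p: p[1])        # sort by v (vertical)
    rows, current = [], [pts[0]]
    for p in pts[1:]:
        if abs(p[1] - current[-1][1]) < row_tol:
            current.append(p)
        else:
            rows.append(current)
            current = [p]
    rows.append(current)
    return [p for row in rows for p in sorted(row)]  # left-to-right per row

def match_by_number(left_pts, right_pts):
    """Points with the same number are treated as homonymous points."""
    return list(zip(number_points(left_pts), number_points(right_pts)))

# The same stack corners seen by both cameras; the right view is shifted
# horizontally by the disparity (20 px here).
left = [(120, 50), (220, 52), (118, 150), (222, 149)]
right = [(100, 50), (200, 52), (98, 150), (202, 149)]
pairs = match_by_number(left, right)
print(pairs[0])  # ((120, 50), (100, 50))
```

This ordering trick replaces dense stereo matching: because the stacks are detected as discrete regions, index agreement is enough to pair corners across the rectified views.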
In the above-mentioned outdoor remote hierarchical visual positioning measurement method, the step of using the coarse positioning visual positioning system to initially position and calculate the spatial position of the target material to be grabbed includes:
step c1, 3D space coordinates of a target material: depth calculation is performed according to the principle of triangulation according to a formula I,
equation one: d=bf/D;
wherein D represents the measured depth distance value and B represents the baseline length of the binocular camera; d represents the parallax of the same feature of the object surface in the left and right camera views of the binocular camera; f represents the focal length of the camera;
taking the left camera coordinate system of the binocular camera as the world coordinate system, the depth value Z_W of a top-surface corner point of the target material is calculated; combining the two-dimensional plane coordinates of the homonymous point in the image coordinate system of the left camera with the internal and external parameters of the binocular camera, the 3D spatial coordinates of the top-surface corner point of the target material in the camera coordinate system are calculated by formula II;
formula II: s·p = A·[R|t]·P_w;
wherein P_w represents the 3D space coordinates in the world coordinate system, p represents the two-dimensional plane coordinates in the image coordinate system, A represents the camera intrinsic (internal reference) matrix, [R|t] represents the rotation-translation matrix from the world coordinate system to the camera coordinate system, and s represents the arbitrary scale factor of the projective transformation.
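Formula I and the inversion of the pinhole projection can be checked numerically. With the left camera frame as the world frame, [R|t] = [I|0], so X = (u-cx)·Z/f and Y = (v-cy)·Z/f. The baseline, focal length, principal point and pixel coordinates below are illustrative assumptions:

```python
# Formula I: depth from disparity, D = B*f/d (triangulation).
B = 0.5                 # baseline of the binocular camera, metres (assumed)
f = 800.0               # focal length in pixels (assumed)
cx, cy = 320.0, 240.0   # principal point (assumed)

def depth_from_disparity(d):
    return B * f / d

def backproject(u, v, Z):
    """Invert s*p = A*P_w with [R|t] = [I|0] (left camera = world frame)."""
    return ((u - cx) * Z / f, (v - cy) * Z / f, Z)

u_left, v = 400.0, 300.0   # corner in the left image
u_right = 380.0            # same corner in the right image
d = u_left - u_right       # disparity of the homonymous point
Z = depth_from_disparity(d)        # 0.5 * 800 / 20 = 20.0 m
print(backproject(u_left, v, Z))   # (2.0, 1.5, 20.0)
```

Note how sensitive depth is to disparity at long range: at d = 20 px a one-pixel matching error changes Z by roughly 1 m, which is why the patent adds the fine positioning stage at close range.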
In the above-mentioned outdoor remote hierarchical visual positioning measurement method, the step of precisely positioning the visual positioning system for precisely identifying the target material to be grasped sequentially includes:
step e1, binocular camera distortion correction and binocular camera epipolar correction: calibrating the internal parameters and the external parameters of the binocular camera by adopting a Zhang Zhengyou calibration method, and performing correction operation on an image layer according to the calibrated parameters;
step e2, extracting the stacked materials through semantic segmentation: dividing pixels contained in a target material by adopting a semantic division model on image data acquired by a binocular camera of a fine positioning visual positioning system, and calculating the gravity centers of the pixels which belong to the target material;
step e3, extracting the edges of the stacked materials: dividing the area of the pixels segmented in the step e2 according to the appointed direction, and carrying out gradient calculation by utilizing an edge extraction operator according to the appointed direction so as to realize top surface edge extraction of the target material;
step e4, positioning key points of stacked materials: performing straight line fitting on the edge extracted in the step e3 by adopting a fitting algorithm to obtain an approximate contour of the target material, and solving an intersection point of two adjacent intersecting fitting straight line equations, wherein the intersection point is each angular point of the top surface of the target material and is used as positioning information of the target material; meanwhile, the constraint of the number of pixels occupied in the image when the top surface area and the side length of the target material are at the designated height is added, and the outline which does not belong to the target material is filtered;
step e5, binocular stereo matching: numbering the positions of the intersection points solved in the step e4 in the image according to the sequence from top to bottom and from left to right; the left camera and the right camera of the binocular camera are matched according to the numbers, and pixel points with the same intersection numbers are regarded as homonymous points of the left camera and the right camera of the binocular camera.
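The corner positioning of step e4 (shared with step a5) reduces to fitting straight lines to edge pixels and intersecting adjacent fitted lines. A sketch with invented edge pixels follows; the total-least-squares fit in normal form a·u + b·v = c is one common choice (not specified by the patent) that stays stable for vertical edges:

```python
import math

def fit_line(points):
    """Total least-squares fit: returns (a, b, c) with a*u + b*v = c and
    a^2 + b^2 = 1; the normal is the minor axis of the scatter matrix."""
    n = len(points)
    mu = sum(p[0] for p in points) / n
    mv = sum(p[1] for p in points) / n
    suu = sum((p[0] - mu) ** 2 for p in points)
    svv = sum((p[1] - mv) ** 2 for p in points)
    suv = sum((p[0] - mu) * (p[1] - mv) for p in points)
    theta = 0.5 * math.atan2(2 * suv, suu - svv)   # principal direction
    a, b = -math.sin(theta), math.cos(theta)        # unit normal
    return a, b, a * mu + b * mv

def intersect(l1, l2):
    """Intersection of two fitted lines: a top-surface corner point."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

top = fit_line([(10, 50), (60, 50), (110, 50)])     # horizontal edge, v = 50
side = fit_line([(10, 50), (10, 100), (10, 150)])   # vertical edge, u = 10
corner = intersect(top, side)
print(corner)  # close to (10.0, 50.0)
```

Intersecting fitted lines rather than taking raw edge pixels averages out per-pixel edge noise, which matters at the sub-pixel accuracy the fine stage needs.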
In the above-mentioned outdoor remote hierarchical visual positioning measurement method, the step of the fine positioning visual positioning system for calculating the spatial position of the target material to be grabbed includes:
step e6, 3D space coordinates of the target material and the execution unit: depth calculation is performed according to the principle of triangulation according to a formula I,
equation one: d=bf/D;
wherein D represents the measured depth distance value and B represents the baseline length of the binocular camera; d represents the parallax of the same feature of the object surface in the left and right camera views of the binocular camera; f represents the focal length of the camera;
taking the left camera coordinate system of the binocular camera as the world coordinate system, the depth value Z_W of a top-surface corner point of the target material is calculated; combining the two-dimensional plane coordinates of the homonymous point in the image coordinate system of the left camera with the internal and external parameters of the binocular camera, the 3D space coordinates of the top-surface corner point of the target material in the camera coordinate system are calculated by formula II;
formula II: s·p = A·[R|t]·P_w;
wherein P_w represents the 3D space coordinates in the world coordinate system, p represents the two-dimensional plane coordinates in the image coordinate system, A represents the camera intrinsic (internal reference) matrix, [R|t] represents the rotation-translation matrix from the world coordinate system to the camera coordinate system, and s represents the arbitrary scale factor of the projective transformation.
In the above-mentioned outdoor remote hierarchical visual positioning measurement method, the target detection algorithm in the step a2 is a YOLO series algorithm.
In the above outdoor remote hierarchical visual positioning measurement method, the semantic segmentation model is PaddleSeg or OpenCV ENet.
In the above-mentioned outdoor remote hierarchical visual positioning measurement method, the edge extraction operator is the Sobel operator, which comprises two 3×3 kernel matrices G_u and G_v;
a 3-row, 3-column patch of image data is taken from the image, multiplied element-wise with the operator values at the corresponding positions, and summed, giving G_u in the u direction and G_v in the v direction; G_u and G_v are squared, added, and the arithmetic square root is taken to obtain G_uv. G_uv is then compared with a set threshold: if G_uv is greater than the threshold, the point is a boundary value and a black point is displayed; if G_uv is less than the threshold, a white point is displayed.
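The Sobel computation just described, in plain Python. The kernels are the standard 3×3 Sobel matrices; the threshold value of 100 is an assumption for illustration:

```python
import math

# The two 3x3 Sobel kernels: GU responds to gradients in the u (horizontal)
# direction, GV to gradients in the v (vertical) direction.
GU = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GV = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(img, r, c):
    """Multiply the 3x3 patch centred at (r, c) element-wise with each kernel
    and sum, then combine: G_uv = sqrt(G_u^2 + G_v^2)."""
    gu = gv = 0
    for i in range(3):
        for j in range(3):
            pix = img[r - 1 + i][c - 1 + j]
            gu += GU[i][j] * pix
            gv += GV[i][j] * pix
    return math.sqrt(gu * gu + gv * gv)

def classify(img, r, c, threshold=100):
    """G_uv above the threshold -> boundary point (black); below -> white."""
    return "black" if sobel_magnitude(img, r, c) > threshold else "white"

# A vertical step edge between intensity 0 and 255.
img = [[0, 0, 0, 255, 255, 255]] * 4
print(classify(img, 1, 1))  # flat region -> "white"
print(classify(img, 1, 3))  # step edge  -> "black"
```

Restricting the gradient computation to a designated direction, as the patent does, amounts to using only G_u or only G_v for the top-surface edges of interest.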
In the above-mentioned outdoor remote hierarchical visual positioning measurement method, the stacked materials are containers stacked one above another; the displacement device comprises a gantry support frame longitudinally slidingly connected to the operation platform; the binocular camera of the coarse positioning visual positioning system is connected to the gantry support frame through a bracket; a transverse traction moving seat is transversely slidingly connected to the gantry support frame above the stacked materials; the binocular camera of the fine positioning visual positioning system is connected to the transverse traction moving seat through a bracket; and the execution unit is a lifting appliance which is arranged on the transverse traction moving seat and can move vertically relative to it.
Compared with the prior art, the invention has the outstanding advantages that:
the invention applies the stereo vision and the deep learning technology to the positioning measurement of the outdoor large scene at the same time, and compared with the existing positioning mode of laser scanning, the invention adopts the layered visual positioning measurement of the coarse positioning visual positioning system and the fine positioning visual positioning system, which are respectively responsible for the respective positioning measurement content and the positioning measurement precision, thereby omitting a large amount of calibration work, having simple system structure and easy maintenance, and effectively improving the positioning measurement precision and the whole operation efficiency.
Description of the drawings:
FIG. 1 is a first schematic diagram of an embodiment of the present invention applied to a container port and dock;
FIG. 2 is a second schematic diagram of an embodiment of the present invention applied to a container port and dock;
FIG. 3 is a flow chart of an embodiment of the present invention applied to a container port and dock;
FIG. 4 is a schematic flow chart, from left to right, of the Sobel operator gradient calculation of the present invention.
In the figure: 1. binocular camera of the coarse positioning visual positioning system; 2. binocular camera of the fine positioning visual positioning system; 3. container; 4. gantry support frame; 5. transverse traction moving seat; 6. lifting appliance.
The specific embodiment is as follows:
the invention is further described below with reference to the specific examples, see fig. 1-4:
the embodiment applies the outdoor remote hierarchical visual positioning measurement method to the scene of a container port and a dock, wherein the stacked materials are containers 3 in a stacking area, the target materials are target containers 3 to be grabbed, and an execution unit is a lifting appliance 6, so the main purpose of the visual positioning measurement method in the scene is to perform visual positioning and measurement on the containers 3 in the stacking area, and the lifting appliance 6 is guided to automatically grab the containers 3.
The outdoor remote hierarchical visual positioning measurement method comprises an operating platform with stacked materials, wherein a displacement device capable of moving relative to the operating platform is arranged on the operating platform, a coarse positioning visual positioning system, a fine positioning visual positioning system and an execution unit are arranged on the displacement device, the stacked materials are containers 3 stacked up and down, the displacement device comprises a gantry support frame 4 longitudinally and slidingly connected to the operating platform, a binocular camera 1 of the coarse positioning visual positioning system is connected to the gantry support frame 4 through a bracket, a transverse traction moving seat 5 is transversely and slidingly connected to the gantry support frame 4 above the containers 3, a binocular camera 2 of the fine positioning visual positioning system is connected to the transverse traction moving seat 5 through a bracket, and the execution unit is a lifting appliance 6 which is arranged on the transverse traction moving seat 5 and can vertically move relative to the transverse traction moving seat 5; it should be noted that the field of view of the binocular camera 1 of the coarse positioning visual positioning system includes the container 3 on the whole working platform, whereas the field of view of the binocular camera 2 of the fine positioning visual positioning system can only photograph one container 3 directly under the spreader 6.
The method comprises the following specific steps:
step A: the binocular camera 1 of the coarse positioning visual positioning system is used for identifying the containers 3 on the operation platform in the whole visual field range so as to determine the stacking column number, the layer number and the space position of the containers 3;
step a1, binocular camera distortion correction and binocular camera epipolar correction: calibrating the internal parameters and the external parameters of the binocular camera by adopting a Zhang Zhengyou calibration method, and performing correction operation on an image layer according to the calibrated parameters;
step a2, target detection container identification: according to the image data acquired by the binocular camera 1 of the coarse positioning visual positioning system, the container 3 is identified with a YOLO-series target detection algorithm to obtain the rectangular envelope frame of the area where the container 3 is located as a candidate area, wherein the same frame of image comprises a plurality of candidate areas;
step a3, semantic segmentation container extraction: for the candidate areas of step a2, the pixels contained in the container 3 are segmented by a semantic segmentation model (PaddleSeg or OpenCV ENet), and the centers of gravity of the segmented pixels belonging to the container 3 are calculated;
step a4, container edge extraction: dividing the area of the pixels divided in the step a3 according to the appointed direction, and carrying out gradient calculation by utilizing an edge extraction operator according to the appointed direction so as to realize the top surface edge extraction of the container 3;
step a5, positioning key points of the container: performing straight line fitting on the edges extracted in the step a4 by adopting a fitting algorithm to obtain an approximate contour of the container 3, and solving intersection points of two adjacent intersecting fitting straight line equations, wherein the intersection points are all angular points of the top surface of the container 3 and serve as positioning information of the container 3; meanwhile, the constraint of the number of pixels occupied in the image when the top surface area and the side length of the container 3 are at the designated height is added, and the outline which does not belong to the container 3 is filtered;
step a6, binocular stereo matching: numbering the candidate areas obtained in the step a2 in the same frame of image according to the central coordinate position of the candidate areas and the sequence from top to bottom and from left to right; simultaneously numbering the positions of the intersection points solved in the step a5 in the image according to the sequence from top to bottom and from left to right; the left camera and the right camera of the binocular camera are matched according to numbers, the numbers are the same and can be regarded as the region where the same container 3 is located, and the pixel points with the candidate region numbers and the intersection point numbers are regarded as the homonymous points of the left camera and the right camera of the binocular camera.
step B: the target container 3 to be grabbed is specified manually or randomly by the system; that is, the command for grabbing the target container 3 can be sent to the coarse positioning or fine positioning visual positioning system either manually through the motion control system or automatically by the motion control system itself.
Step C: the rough positioning visual positioning system is used for initially positioning and calculating the space position of the target container 3 to be grabbed so as to obtain rough positioning movement offset;
step c1, 3D space coordinates of the stacked containers: depth calculation is performed by formula I according to the principle of triangulation;

Formula I: D = B·f/d;

where D represents the measured depth distance value, B represents the baseline length of the binocular camera, d represents the disparity of the same object-surface feature between the left and right camera views of the binocular camera, and f represents the focal length of the camera;
Taking the left camera coordinate system of the binocular camera as the world coordinate system, the depth value Z_W of the top-surface corner points of the target container 3 is calculated; combining the two-dimensional plane coordinates of the homonymous points in the image coordinate system of the left camera with the intrinsic and extrinsic parameters of the binocular camera, the 3D space coordinates of the top-surface corner points of the target container 3 in the camera coordinate system are calculated by formula II;

Formula II: s·p = A·[R|t]·P_W,

where P_W represents the 3D space coordinates in the world coordinate system, p represents the two-dimensional plane coordinates in the image coordinate system, A represents the intrinsic parameter matrix, [R|t] represents the rotation-translation matrix from the world coordinate system to the camera coordinate system, and s represents the arbitrary scale factor of the projective transformation.
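Formulas I and II can be sketched numerically. With the left camera taken as the world frame, [R|t] = [I|0] and the projective scale s equals the depth Z, so formula II can be inverted directly; the function names and example intrinsics below are illustrative, not from the patent:

```python
import numpy as np

def depth_from_disparity(B, f, d):
    """Formula I: D = B*f/d. B in metres, f and d in pixels -> D in metres."""
    return B * f / d

def backproject(u, v, Z, A):
    """Invert formula II, s*p = A[R|t]P_W, for the left-camera world frame:
    [R|t] = [I|0] and the scale s equals the depth Z."""
    p = np.array([u, v, 1.0])
    return Z * np.linalg.inv(A) @ p   # (X, Y, Z) in the left camera frame
```

For example, a 0.5 m baseline, 1000 px focal length and 25 px disparity give a depth of 20 m, and back-projecting the principal point at that depth returns a point on the optical axis.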
Of course, the coarse positioning visual positioning system may calculate the corresponding 3D space coordinates of the space positions of the containers 3 in the field of view in advance, and in this case, as long as the coarse positioning visual positioning system receives the target container 3 instruction in the step B, the coarse positioning visual positioning system may directly perform data matching extraction to enter the step D.
Step D: the displacement device, according to the coarse positioning motion offset, drives the coarse positioning visual positioning system, the fine positioning visual positioning system and the lifting appliance 6 to move together to a target position at which the binocular camera 2 of the fine positioning visual positioning system can photograph the target container 3 to be grabbed; that is, the field of view of the binocular camera 2 of the fine positioning visual positioning system covers only the target container 3 directly below the lifting appliance 6.
Step E: the accurate positioning visual positioning system is used for accurately positioning the target container 3 to be grabbed, and the accurate positioning motion offset between the target container 3 to be grabbed and the lifting appliance 6 is calculated in real time:
step e1, binocular camera distortion correction and binocular camera epipolar correction: calibrating the internal parameters and the external parameters of the binocular camera by adopting a Zhang Zhengyou calibration method, and performing correction operation on an image layer according to the calibrated parameters;
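In practice the calibration and rectification of step e1 would use OpenCV routines such as `calibrateCamera`, `stereoCalibrate` and `stereoRectify`. As a self-contained sketch of the underlying lens model used by Zhang's method, the radial part of the distortion and its fixed-point inversion can be written as follows (the two-coefficient model and all names are illustrative assumptions):

```python
def distort_point(x, y, k1, k2):
    """Apply the radial part of the Brown-Conrady model used in Zhang's
    calibration: x_d = x*(1 + k1*r^2 + k2*r^4), in normalized coordinates."""
    r2 = x * x + y * y
    s = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * s, y * s

def undistort_point(xd, yd, k1, k2, iters=10):
    """Invert the radial model by fixed-point iteration, which is how
    correction at the image level is typically implemented."""
    x, y = xd, yd
    for _ in range(iters):
        r2 = x * x + y * y
        s = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / s, yd / s
    return x, y
```

Applying `undistort_point` to every pixel (or precomputing a remap table) yields the corrected image on which the later edge and corner steps operate.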
step e2, semantic segmentation container extraction: a semantic segmentation model, PaddleSeg or OpenCV ENet, is applied to the image data acquired by the binocular camera 2 of the fine positioning visual positioning system to segment the pixels belonging to the target container 3, and the center of gravity of the segmented pixels belonging to the target container 3 is calculated;
step e3, container edge extraction: dividing the region of the pixels segmented in the step e2 according to the appointed direction, and carrying out gradient calculation by utilizing an edge extraction operator according to the appointed direction so as to realize top surface edge extraction of the target container 3;
step e4, positioning key points of the container: performing straight line fitting on the edge extracted in the step e3 by adopting a fitting algorithm to obtain an approximate contour of the target container 3, and solving an intersection point of two adjacent intersecting fitting straight line equations, wherein the intersection point is each corner point of the top surface of the target container 3 and is used as positioning information of the target container 3; meanwhile, the constraint of the number of pixels occupied in the image when the top surface area and the side length of the target container 3 are at the designated height is added, and the outline which does not belong to the target container 3 is filtered;
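The line fitting and intersection solving of step e4 can be sketched as a total-least-squares fit followed by a homogeneous-coordinate intersection. The patent does not fix a particular fitting algorithm, so the method and names below are illustrative:

```python
import numpy as np

def fit_line(points):
    """Total-least-squares line fit; returns (a, b, c) with a*u + b*v + c = 0."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # line direction = principal axis of the centered edge points
    _, _, vt = np.linalg.svd(pts - centroid)
    direction = vt[0]
    normal = np.array([-direction[1], direction[0]])
    c = -normal @ centroid
    return normal[0], normal[1], c

def line_intersection(l1, l2):
    """Corner point of two fitted lines (a, b, c): homogeneous cross product."""
    p = np.cross(np.array(l1, dtype=float), np.array(l2, dtype=float))
    return p[:2] / p[2]   # (u, v) pixel coordinates of the corner
```

Each pair of adjacent fitted top-surface edges is intersected this way to recover the corner points used as positioning information.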
step e5, binocular stereo matching: numbering the positions of the intersection points solved in the step e4 in the image according to the sequence from top to bottom and from left to right; the left camera and the right camera of the binocular camera are matched according to the numbers, and pixel points with the same intersection numbers are regarded as homonymous points of the left camera and the right camera of the binocular camera;
step e6, 3D space coordinates of the target container and the lifting appliance: depth calculation is performed by formula I according to the principle of triangulation;

Formula I: D = B·f/d;

where D represents the measured depth distance value, B represents the baseline length of the binocular camera, d represents the disparity of the same object-surface feature between the left and right camera views of the binocular camera, and f represents the focal length of the camera;
Taking the left camera coordinate system of the binocular camera as the world coordinate system, the depth value Z_W of the top-surface corner points of the target container 3 is calculated; combining the two-dimensional plane coordinates of the homonymous points in the image coordinate system of the left camera with the intrinsic and extrinsic parameters of the binocular camera, the 3D space coordinates of the top-surface corner points of the target container 3 in the camera coordinate system are calculated by formula II;

Formula II: s·p = A·[R|t]·P_W,

where P_W represents the 3D space coordinates in the world coordinate system, p represents the two-dimensional plane coordinates in the image coordinate system, A represents the intrinsic parameter matrix, [R|t] represents the rotation-translation matrix from the world coordinate system to the camera coordinate system, and s represents the arbitrary scale factor of the projective transformation.
Step F: the displacement device drives the fine positioning visual positioning system to move to a target position together with the lifting appliance 6 according to the fine positioning motion offset, and the lifting appliance 6 grabs the target container 3 to be grabbed at the target position according to the fine positioning motion offset.
Meanwhile, the edge extraction operator is the Sobel operator, which comprises two 3×3 kernel matrices G_u and G_v.

A 3×3 window of image data is taken from the image, multiplied element-wise with the operator values at the corresponding positions, and the products are summed to obtain G_u in the u direction and G_v in the v direction; G_u and G_v are each squared, the squares are added, and the arithmetic square root is taken to obtain G_uv. G_uv is then compared with a set threshold: if G_uv is greater than the threshold, the point is a boundary point and is displayed as a black point; if G_uv is less than the threshold, the point is displayed as a white point.
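A direct transcription of this Sobel thresholding rule, assuming the standard 3×3 Sobel kernel values (the patent does not print the kernels themselves):

```python
import numpy as np

G_u = np.array([[-1, 0, 1],
                [-2, 0, 2],
                [-1, 0, 1]])           # u-direction (horizontal gradient) kernel
G_v = G_u.T                            # v-direction (vertical gradient) kernel

def sobel_edge(img, threshold):
    """Per pixel: G_uv = sqrt(G_u^2 + G_v^2) over a 3x3 window, then binarize.
    Edge pixels (G_uv > threshold) are shown black (0), others white (255),
    matching the black/white convention in the text."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    out = np.full((h, w), 255, dtype=np.uint8)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            gu = float((patch * G_u).sum())
            gv = float((patch * G_v).sum())
            if (gu * gu + gv * gv) ** 0.5 > threshold:
                out[i, j] = 0
    return out
```

Running it on a vertical intensity step produces black points along the step and white points in the flat regions, which is exactly the top-surface edge map the fitting step consumes.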
In the description, fig. 4 of the accompanying drawings illustrates how the "3 rows and 3 columns of image data" are taken when the Sobel operator computes gradients: after semantic segmentation and calculation of the center of gravity of the pixels belonging to the container 3, three-row-by-three-column windows are selected and computed starting from the center of gravity, proceeding along the u direction and the reverse u direction (from the middle outward to both sides) until the longitudinal end of the image is reached; the v direction is handled in the same way, computing only along the v-axis direction and the reverse v-axis direction starting from the center of gravity (from the middle outward to both sides).
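The center-outward scan order described for fig. 4 can be sketched as a small index generator (the name `center_out` and the bounds are illustrative assumptions):

```python
def center_out(start, lo, hi):
    """Yield indices from start outward to both ends: start, start-1, start+1, ...
    A sketch of the 'from the middle to both sides' scan described for fig. 4,
    with start at the center-of-gravity coordinate and [lo, hi] the image range."""
    yield start
    step = 1
    while start - step >= lo or start + step <= hi:
        if start - step >= lo:
            yield start - step
        if start + step <= hi:
            yield start + step
        step += 1
```

Iterating window positions in this order along u (and likewise along v) reproduces the middle-to-both-sides traversal starting from the segmented region's center of gravity.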
The above embodiment is only one of the preferred embodiments of the present invention and is not intended to limit the scope of protection of the present invention; therefore, all equivalent changes made according to the shape, structure and principle of the present invention shall be covered by the scope of protection of the present invention.
Claims (9)
1. An outdoor remote hierarchical visual positioning measurement method, characterized in that: it is based on a working platform bearing stacked materials, a displacement device movable relative to the working platform is arranged on the working platform, and a coarse positioning visual positioning system, a fine positioning visual positioning system and an execution unit are arranged on the displacement device; the specific steps comprise:
step A: identifying stacked materials on the operation platform by a binocular camera (1) of the coarse positioning visual positioning system to determine the stacking column number, the layer number and the space position of the stacked materials;
and (B) step (B): the method comprises the steps that target materials to be grabbed are specified manually or randomly by a system;
step C: the coarse positioning visual positioning system is used for initially positioning and calculating the spatial position of the target material to be grasped so as to obtain the coarse positioning movement offset;
step D: the displacement device drives the coarse positioning visual positioning system, the fine positioning visual positioning system and the execution unit to move to a target position together according to the coarse positioning motion offset, and a binocular camera (2) of the fine positioning visual positioning system can shoot a target material to be grabbed at the target position;
step E: the precise positioning visual positioning system is used for precisely positioning the target material to be grasped and calculating the precise positioning motion offset between the target material to be grasped and the execution unit in real time;
step F: the displacement device drives the fine positioning visual positioning system and the execution unit to move to a target position together according to the fine positioning motion offset, and the execution unit grabs the target material to be grabbed at the target position according to the fine positioning motion offset.
2. The outdoor remote hierarchical visual positioning measurement method according to claim 1, wherein: the step in which the coarse positioning visual positioning system identifies the stacked materials comprises, in order:
step a1, binocular camera distortion correction and binocular camera epipolar correction: calibrating the internal parameters and the external parameters of the binocular camera by adopting a Zhang Zhengyou calibration method, and performing correction operation on an image layer according to the calibrated parameters;
step a2, identifying target detection stacked materials: identifying the stacked materials by adopting a target detection algorithm according to image data acquired by a binocular camera (1) of the coarse positioning visual positioning system to obtain a rectangular envelope frame of an area where the stacked materials are located as candidate areas, wherein the same frame of image comprises a plurality of candidate areas;
step a3, extracting the stacked materials through semantic segmentation: c, segmenting pixels contained in the stacked materials on the candidate areas in the step a2 by adopting a semantic segmentation model, and calculating the gravity centers of the segmented pixels belonging to the stacked materials;
step a4, extracting the edges of the stacked materials: dividing the area of the pixels divided in the step a3 according to the appointed direction, and carrying out gradient calculation by utilizing an edge extraction operator according to the appointed direction so as to realize the top surface edge extraction of the stacked materials;
step a5, positioning key points of stacked materials: performing straight line fitting on the edges extracted in the step a4 by adopting a fitting algorithm to obtain an approximate contour of the stacked materials, and solving intersection points of two adjacent intersecting fitting straight line equations, wherein the intersection points are all angular points of the top surface of the stacked materials and serve as positioning information of the stacked materials; meanwhile, the constraint of the number of pixels occupied in the image when the top surface area and the side length of the stacked materials are at the designated height is added, and the contours which do not belong to the stacked materials are filtered;
step a6, binocular stereo matching: numbering the candidate areas obtained in the step a2 in the same frame of image according to the central coordinate position of the candidate areas and the sequence from top to bottom and from left to right; simultaneously numbering the positions of the intersection points solved in the step a5 in the image according to the sequence from top to bottom and from left to right; the left camera and the right camera of the binocular camera are matched according to numbers, the numbers are the same and can be regarded as the region where the same stacked material is located, and the pixel points with the candidate region numbers and the intersection point numbers are regarded as the homonymous points of the left camera and the right camera of the binocular camera.
3. The outdoor remote hierarchical visual positioning measurement method according to claim 2, wherein: the step in which the coarse positioning visual positioning system initially positions and calculates the spatial position of the target material to be grasped comprises:
step c1, 3D space coordinates of the target material: depth calculation is performed by formula I according to the principle of triangulation;

Formula I: D = B·f/d;

where D represents the measured depth distance value, B represents the baseline length of the binocular camera, d represents the disparity of the same object-surface feature between the left and right camera views of the binocular camera, and f represents the focal length of the camera;
taking the left camera coordinate system of the binocular camera as the world coordinate system, the depth value Z_W of the top-surface corner points of the target material is calculated; combining the two-dimensional plane coordinates of the homonymous points in the image coordinate system of the left camera with the intrinsic and extrinsic parameters of the binocular camera, the 3D space coordinates of the top-surface corner points of the target material in the camera coordinate system are calculated by formula II;

Formula II: s·p = A·[R|t]·P_W,

where P_W represents the 3D space coordinates in the world coordinate system, p represents the two-dimensional plane coordinates in the image coordinate system, A represents the intrinsic parameter matrix, [R|t] represents the rotation-translation matrix from the world coordinate system to the camera coordinate system, and s represents the arbitrary scale factor of the projective transformation.
4. The outdoor remote hierarchical visual positioning measurement method according to claim 1, wherein: the step in which the fine positioning visual positioning system accurately identifies the target material to be grabbed comprises, in order:
step e1, binocular camera distortion correction and binocular camera epipolar correction: calibrating the internal parameters and the external parameters of the binocular camera by adopting a Zhang Zhengyou calibration method, and performing correction operation on an image layer according to the calibrated parameters;
step e2, extracting the stacked materials through semantic segmentation: dividing pixels contained in a target material by adopting a semantic division model on image data acquired by a binocular camera (2) of the fine positioning visual positioning system, and calculating the gravity centers of the pixels which belong to the target material;
step e3, extracting the edges of the stacked materials: dividing the area of the pixels segmented in the step e2 according to the appointed direction, and carrying out gradient calculation by utilizing an edge extraction operator according to the appointed direction so as to realize top surface edge extraction of the target material;
step e4, positioning key points of stacked materials: performing straight line fitting on the edge extracted in the step e3 by adopting a fitting algorithm to obtain an approximate contour of the target material, and solving an intersection point of two adjacent intersecting fitting straight line equations, wherein the intersection point is each angular point of the top surface of the target material and is used as positioning information of the target material; meanwhile, the constraint of the number of pixels occupied in the image when the top surface area and the side length of the target material are at the designated height is added, and the outline which does not belong to the target material is filtered;
step e5, binocular stereo matching: numbering the positions of the intersection points solved in the step e4 in the image according to the sequence from top to bottom and from left to right; the left camera and the right camera of the binocular camera are matched according to the numbers, and pixel points with the same intersection numbers are regarded as homonymous points of the left camera and the right camera of the binocular camera.
5. The outdoor remote hierarchical visual positioning measurement method according to claim 4, wherein: the step in which the fine positioning visual positioning system calculates the spatial position of the target material to be grabbed comprises:
step e6, 3D space coordinates of the target material and the execution unit: depth calculation is performed by formula I according to the principle of triangulation;

Formula I: D = B·f/d;

where D represents the measured depth distance value, B represents the baseline length of the binocular camera, d represents the disparity of the same object-surface feature between the left and right camera views of the binocular camera, and f represents the focal length of the camera;
taking the left camera coordinate system of the binocular camera as the world coordinate system, the depth value Z_W of the top-surface corner points of the target material is calculated; combining the two-dimensional plane coordinates of the homonymous points in the image coordinate system of the left camera with the intrinsic and extrinsic parameters of the binocular camera, the 3D space coordinates of the top-surface corner points of the target material in the camera coordinate system are calculated by formula II;

Formula II: s·p = A·[R|t]·P_W,

where P_W represents the 3D space coordinates in the world coordinate system, p represents the two-dimensional plane coordinates in the image coordinate system, A represents the intrinsic parameter matrix, [R|t] represents the rotation-translation matrix from the world coordinate system to the camera coordinate system, and s represents the arbitrary scale factor of the projective transformation.
6. The outdoor remote hierarchical visual positioning measurement method according to claim 2, wherein: the target detection algorithm in the step a2 is a YOLO series algorithm.
7. The outdoor remote hierarchical visual positioning measurement method according to claim 2 or 4, wherein: the semantic segmentation model is PaddleSeg or OpenCV ENet.
8. The outdoor remote hierarchical visual positioning measurement method according to claim 2 or 4, wherein: the edge extraction operator is the Sobel operator, which comprises two 3×3 kernel matrices G_u and G_v;

a 3×3 window of image data is taken from the image, multiplied element-wise with the operator values at the corresponding positions, and the products are summed to obtain G_u in the u direction and G_v in the v direction; G_u and G_v are each squared, the squares are added, and the arithmetic square root is taken to obtain G_uv; G_uv is compared with a set threshold: if G_uv is greater than the threshold, the point is a boundary point and is displayed as a black point; if G_uv is less than the threshold, the point is displayed as a white point.
9. The outdoor remote hierarchical visual positioning measurement method according to any one of claims 1-5, wherein: the device is characterized in that the stacked materials are stacked up and down and placed in a container (3), the displacement device comprises a gantry supporting frame (4) longitudinally connected to an operation platform in a sliding mode, a binocular camera (1) of the coarse positioning visual positioning system is connected to the gantry supporting frame (4) through a support, a transverse traction moving seat (5) is transversely connected to the gantry supporting frame (4) above the stacked materials in a sliding mode, the binocular camera (2) of the fine positioning visual positioning system is connected to the transverse traction moving seat (5) through a support, and the execution unit is a lifting appliance (6) which is arranged on the transverse traction moving seat (5) and can vertically move relative to the transverse traction moving seat (5).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310276988.3A CN116468781A (en) | 2023-03-16 | 2023-03-16 | Outdoor remote hierarchical visual positioning measurement method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310276988.3A CN116468781A (en) | 2023-03-16 | 2023-03-16 | Outdoor remote hierarchical visual positioning measurement method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116468781A true CN116468781A (en) | 2023-07-21 |
Family
ID=87172553
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310276988.3A Pending CN116468781A (en) | 2023-03-16 | 2023-03-16 | Outdoor remote hierarchical visual positioning measurement method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116468781A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117103276A (en) * | 2023-10-07 | 2023-11-24 | 无锡斯帝尔科技有限公司 | Precise grabbing method and system for robot |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2017255410B2 (en) | Pallet detection using units of physical length | |
CN105627992B (en) | A kind of method that ancient building is surveyed and drawn in quick high accuracy noncontact | |
US10321116B2 (en) | Method and system for volume determination using a structure from motion algorithm | |
CN103837869B (en) | Based on single line laser radar and the CCD camera scaling method of vector relations | |
CN104677305B (en) | A kind of body surface three-dimensional method for reconstructing and system based on cross structure light | |
CN109801333B (en) | Volume measurement method, device and system and computing equipment | |
CN110322457A (en) | A kind of de-stacking method of 2D in conjunction with 3D vision | |
CN106709947A (en) | RGBD camera-based three-dimensional human body rapid modeling system | |
CN101915573B (en) | Positioning measurement method based on key point detection of marker | |
CN116468781A (en) | Outdoor remote hierarchical visual positioning measurement method | |
JP2006250889A (en) | Calibration method for 3-dimensional measurement, and 3-dimensional measurement system using the same | |
CN106650701A (en) | Binocular vision-based method and apparatus for detecting barrier in indoor shadow environment | |
CN104976950B (en) | Object space information measuring device and method and image capturing path calculating method | |
CN108489383A (en) | A kind of measuring device and method of H-type cross dimensions | |
CN108765495B (en) | Rapid calibration method and system based on binocular vision detection technology | |
CN107123147B (en) | Calibration method and device of binocular camera and binocular camera system | |
CN106952262B (en) | Ship plate machining precision analysis method based on stereoscopic vision | |
CN108897246B (en) | Stack box control method, device, system and medium | |
CN111640156A (en) | Three-dimensional reconstruction method, equipment and storage equipment for outdoor weak texture target | |
CN113052910A (en) | Calibration guiding method and camera device | |
CN102750698B (en) | Texture camera calibration device, texture camera calibration method and geometry correction method of texture image of texture camera | |
CN113610933A (en) | Log stacking dynamic scale detecting system and method based on binocular region parallax | |
CN116342718A (en) | Calibration method, device, storage medium and equipment of line laser 3D camera | |
CN111932625A (en) | Bagged cargo stack unstacking method based on PointNet model | |
CN115330684A (en) | Underwater structure apparent defect detection method based on binocular vision and line structured light |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||