CN116433856B - Three-dimensional reconstruction method and system for lower scene of tower crane based on monocular camera - Google Patents


Info

Publication number
CN116433856B
Authority
CN
China
Prior art keywords
tower crane
monocular camera
relative pose
semantic
reconstruction
Prior art date
Legal status
Active
Application number
CN202310148891.4A
Other languages
Chinese (zh)
Other versions
CN116433856A (en)
Inventor
安民洙
米文忠
房新奥
郭振威
Current Assignee
Guangdong Light Speed Intelligent Equipment Co ltd
Tenghui Technology Building Intelligence Shenzhen Co ltd
Original Assignee
Guangdong Light Speed Intelligent Equipment Co ltd
Tenghui Technology Building Intelligence Shenzhen Co ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Light Speed Intelligent Equipment Co ltd, Tenghui Technology Building Intelligence Shenzhen Co ltd filed Critical Guangdong Light Speed Intelligent Equipment Co ltd
Priority to CN202310148891.4A
Publication of CN116433856A
Application granted
Publication of CN116433856B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05: Geographic models
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74: Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems


Abstract

The invention provides a monocular-camera-based three-dimensional reconstruction method and system for the scene below a tower crane. The method comprises the following steps: acquiring target images of adjacent frames on a time axis and performing deep-learning-based target semantic segmentation to obtain several segmentation instance results, where each segmentation instance result comprises feature information and semantic information; adopting a multi-stage solving mode: first making a two-dimensional plane assumption on the relative motion of the monocular camera according to the tower crane motion parameters and calculating an initial rotation and translation matrix R₀ and T₀ between the two target images, then calculating the relative pose R and T between the two frames by feature-information matching to obtain the relative pose of the monocular camera; performing error redistribution on the inter-frame relative-pose results of the long time sequence in combination with the semantic segmentation loss, homogenizing the accumulated error over each frame to obtain the error-optimized relative pose; and performing dense reconstruction according to the error-optimized relative pose and the semantic information. The invention has the advantages of small computation, high efficiency, high stability, high precision and low cost.

Description

Three-dimensional reconstruction method and system for lower scene of tower crane based on monocular camera
Technical Field
The invention belongs to the technical field of map reconstruction, and particularly relates to a method and a system for reconstructing a scene under a tower crane based on a monocular camera.
Background
In the field of tower crane construction, the topography and scene information below the large arm of the tower crane has important significance for the operation safety of the tower crane. How to perform stable reconstruction of the topography and the scene below the tower crane is an important problem in the aspects of active safety and automatic driving of the tower crane.
At present, three-dimensional reconstruction of a scene below a tower crane mainly comprises three technical approaches: 1) A binocular camera-based method; 2) A lidar-based method; 3) Methods based on monocular cameras and sequence data analysis.
The binocular-camera-based method is the most classical vision-based terrain reconstruction approach, but a binocular camera requires strict relative-pose calibration before use, and its baseline length must be set according to the height of the camera above the scene, which imposes significant limitations.
The lidar-based method has the best stability and precision of the three, but a lidar that can cover a tower crane operation scene is very expensive, costing 100 or even 1000 times as much as a camera. Moreover, the field of view of a lidar is generally small, so several lidars must work cooperatively, which further increases the cost and hinders large-scale popularization and application.
The method based on a monocular camera is the cheapest of the three and imposes the fewest restrictions on installation and use. However, it places high demands on the reconstruction algorithm: accurate sequence-data analysis and the solution of a large-scale nonlinear optimization problem are required, so its stability is poor, and the reconstruction error grows and the accuracy worsens as the image sequence lengthens. The many moving targets and rapid scene changes of a tower crane construction site make sequence-image analysis even more challenging, further increasing the difficulty of applying monocular-camera sequence data to the reconstruction of tower crane construction scenes.
Therefore, how to complete the three-dimensional reconstruction of the scene below the tower crane efficiently and with small error using a monocular camera is a technical problem that remains to be solved in the art.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art by providing a monocular-camera-based three-dimensional reconstruction method and system for the scene below a tower crane, mainly to solve the problem that the prior art cannot complete such reconstruction with a monocular camera both efficiently and with small error.
In order to achieve the above object, in a first aspect, the present invention provides a monocular-camera-based three-dimensional reconstruction method for the scene below a tower crane, where the monocular camera is installed below the tower crane trolley and used for capturing images of the scene beneath it, the method comprising the following steps:
s10, acquiring target images of adjacent frames on a time axis, and performing target semantic segmentation based on deep learning to obtain a plurality of segmentation example results, wherein the segmentation example results comprise characteristic information and semantic information;
s20, adopting a multistage solving mode, firstly carrying out two-dimensional plane assumption on the relative motion of the monocular camera according to tower crane motion parameters, and calculating a rotation and translation matrix R between two target images 0 And T 0 Calculating relative positions R and T between two frames of target images by utilizing characteristic information matching to obtain the relative pose of the monocular camera;
s30, combining semantic segmentation loss, carrying out error redistribution on the inter-frame relative pose results of the long-time sequence, homogenizing the accumulated errors to each frame, and obtaining the relative pose after error optimization;
and S40, performing dense reconstruction according to the relative pose and semantic information after error optimization to obtain a scene reconstruction result below the tower crane.
In some embodiments, the tower crane motion parameters comprise at least one of the radial distance of the tower crane trolley, the hook height and the boom slew angle.
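The first-stage planar pose prediction from these motion parameters can be illustrated with a short sketch. The geometry below (the camera yawing with the boom, the translation split into the trolley's radial travel and the slew arc) is an illustrative assumption, not a formula given in the patent:

```python
import numpy as np

def planar_pose_guess(delta_theta, delta_r, trolley_radius):
    """Predict inter-frame camera motion under the 2-D plane assumption.

    delta_theta   : boom slew angle between the two frames (rad)
    delta_r       : radial travel of the trolley along the boom (m)
    trolley_radius: current radial distance of the trolley (m)

    Returns a rotation matrix R0 (yaw about the vertical axis only) and a
    translation T0 (radial plus tangential motion, no vertical component).
    """
    c, s = np.cos(delta_theta), np.sin(delta_theta)
    # rotation about the vertical axis only: the planar assumption
    R0 = np.array([[c, -s, 0.0],
                   [s,  c, 0.0],
                   [0.0, 0.0, 1.0]])
    # small-angle arc length approximates the tangential displacement
    T0 = np.array([delta_r, trolley_radius * delta_theta, 0.0])
    return R0, T0
```

These metre-level R₀ and T₀ then serve only as the starting constraint for the feature-based correction stages.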
In some embodiments, while the computation of step S20 is performed, the alignment of feature information and semantic information across the several segmentation instance results is maintained to help constrain the relative-pose computation.
In some embodiments, in step S20, the multi-stage solution includes a first-stage solution and a second-stage solution;
at the first stage of solvingWhen the tower crane motion parameter is used as an initial value, two-dimensional plane assumption is made on the relative motion of the monocular camera, and a rotation and translation matrix R between two target images is calculated according to the initial value 0 And T 0 Downsampling the image with magnification, extracting single-scale feature points, selecting a local window, and performing R 0 、T 0 Under the constraint of the segmentation example result, matching the characteristic points, constructing homonymy point pairs, calculating affine transformation parameters, and completing the relative pose R under the planar assumption 0 And T 0 Is corrected to obtain R 1 And T 1
in the second-stage solution, feature points are extracted at the original-resolution layer of the target image to obtain multi-scale feature points; under the constraint of R₁ and T₁ the feature points are matched, and the relative pose R and T between the two frames of target images is calculated by feature-information matching to obtain the relative pose of the monocular camera.
In some embodiments, no two-dimensional plane assumption is made when solving for R and T; when the relative pose R and T is calculated, the three-dimensional position and rotation are solved according to the interior orientation parameters of the monocular camera to obtain the relative pose of the monocular camera.
In some embodiments, in step S30, a modified bundle adjustment algorithm is used in which a semantic segmentation loss is added to the objective function, the semantic segmentation loss being the Euclidean distance of the static-target semantic boundary and the Euclidean distance of the dynamic-target semantic boundary, as shown in the following formula:
g(c,x,instance)=∑‖q−p(c,x)‖+λ∑‖ins−ins′‖
In the above formula, ∑‖q−p(c,x)‖ is the standard bundle adjustment term computing the re-projection error of the key points; ∑‖ins−ins′‖ is the re-projection error of the static-target semantic boundary points between adjacent frames; λ is a penalty factor.
In some embodiments, the Euclidean distance of the dynamic-target semantic boundary is required to be less than a dynamic boundary threshold, and the penalty factor λ is 0.2.
In some embodiments, step S40 includes the following steps:
acquiring homonymous (epipolar) lines according to the error-optimized relative pose and the basic principles of photogrammetric epipolar geometry;
matching a first processing window for each pixel on the homonymous line; semantic information is introduced into the matching relation so that only first processing windows with corresponding semantic information on the homonymous line are matched, and the optimal homonymous point is searched within the first processing window using the normalized gray-scale correlation coefficient as the similarity measure;
and performing dense reconstruction to obtain the reconstruction result of the scene below the tower crane.
In some embodiments, in step S40, semantic information with planar characteristics is identified; for such instances only boundary matching of the segmentation instance result is performed, the interior of the instance is handled under a plane hypothesis, and per-pixel homonymous-point matching is not performed.
In a second aspect, the present invention provides a system for the above monocular-camera-based three-dimensional reconstruction method of the scene below a tower crane, comprising:
the monocular camera is arranged below the tower crane trolley and is used for acquiring a lower image;
the semantic segmentation module, used for acquiring target images of adjacent frames on a time axis and performing deep-learning-based target semantic segmentation to obtain several segmentation instance results, wherein the segmentation instance results comprise feature information and semantic information;
the relative pose solving module, used for adopting the multi-stage solving mode: first making a two-dimensional plane assumption on the relative motion of the monocular camera according to the tower crane motion parameters and calculating the initial rotation and translation matrix R₀ and T₀ between the two target images, then calculating the relative pose R and T between the two frames of target images by feature-information matching to obtain the relative pose of the monocular camera;
the error optimization processing module is used for carrying out error redistribution on the inter-frame relative pose results of the long-time sequence by combining semantic segmentation loss, homogenizing the accumulated errors to each frame, and obtaining the relative pose after error optimization;
and the three-dimensional reconstruction module is used for carrying out dense reconstruction according to the relative pose and semantic information after error optimization to obtain a scene reconstruction result below the tower crane.
Compared with the prior art, the invention has the beneficial effects that at least:
image acquisition is carried out by using a monocular camera, target semantic segmentation is carried out based on deep learning, constraints are provided for relative pose calculation and dense reconstruction of continuous frames, stability is improved, calculated amount is reduced, cost is low, and popularization and application are facilitated;
the method is characterized in that a multistage solving mode is adopted, the relative pose of a camera in an image sequence is calculated based on semantic information and image characteristic information, tower crane information and semantic information are used for restraining, the method is more reliable than the method which uses image characteristic points alone, the relative pose is used as an initial value of dense reconstruction, and the method has a decisive effect on the final dense reconstruction effect;
aiming at the problem of error accumulation in the relative pose calculation of a long-time sequence, the error redistribution is carried out by combining the semantic segmentation loss, and the accumulated errors are homogenized to each frame, so that the accuracy of the relative pose calculation result is improved, and the accuracy of scene reconstruction is ensured.
Drawings
The invention will be further described with reference to the accompanying drawings, in which embodiments do not constitute any limitation of the invention, and other drawings can be obtained by one of ordinary skill in the art without inventive effort from the following drawings.
Fig. 1 is a flow chart of a three-dimensional reconstruction method of a scene under a tower crane based on a monocular camera according to an embodiment.
FIG. 2 is a schematic diagram of a deployment location of a monocular camera under an embodiment.
Fig. 3 is a flow chart of a three-dimensional reconstruction method of a scene under a tower crane based on a monocular camera according to another embodiment.
FIG. 4 is a schematic diagram of a process for object semantic segmentation of an object image in one embodiment.
Fig. 5 is a schematic diagram of a result of three-dimensional reconstruction of a scene under a tower crane in one embodiment.
Fig. 6 is a schematic diagram of a three-dimensional reconstruction system of a scene under a tower crane based on a monocular camera according to an embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Referring to fig. 1 to 2, in a first aspect, the present invention provides a three-dimensional reconstruction method of a scene under a tower crane based on a monocular camera, wherein the monocular camera is installed under a tower crane trolley and is used for acquiring an image of the lower part, and the method comprises the following steps:
s10, acquiring target images of adjacent frames on a time axis, performing target semantic segmentation based on deep learning, wherein each frame of target image is segmented to obtain a plurality of segmentation example results, and the target images comprise static targets and dynamic targets, so that the segmentation example results comprise the static segmentation results and the dynamic segmentation results, the segmentation example results comprise characteristic information and semantic information, and boundaries of each segmentation result are obtained after the target semantic segmentation;
s20, adopting a multi-stage solving mode, firstly according to the movement of the tower craneParameters two-dimensional plane assumption of relative motion of monocular camera, calculating rotation and translation matrix R between two target images 0 And T 0 In this stage of solution, a two-dimensional plane assumption is made, and the rotation and translation matrix R between two target images is easily calculated due to the addition of tower crane motion parameters 0 And T 0 The method comprises the steps of carrying out a first treatment on the surface of the Still further, in the next level of solution, using feature information matching, at R 0 And T 0 Under the constraint of the (2), calculating the relative positions R and T between two frames of target images to obtain the relative pose of the monocular camera;
s30, combining semantic segmentation loss, carrying out error redistribution on the inter-frame relative pose results of the long-time sequence, homogenizing the accumulated errors to each frame, and obtaining the relative pose after error optimization; because the three-dimensional reconstruction is a long-term continuous dynamic process, when the acquired target image sequence is longer and longer, error accumulation exists for the relative pose result calculated according to the adjacent images, the longer the sequence is, the larger the accumulated error is, if error redistribution is not carried out, the front data reconstruction precision is good, and the rear data reconstruction precision is poor, so that the semantic segmentation loss is combined in the step, the error is redistributed, and each frame is uniformly distributed as far as possible;
and S40, performing dense reconstruction according to the relative pose and semantic information after error optimization to obtain a scene reconstruction result below the tower crane.
According to the method, while the tower crane trolley moves and the boom slews, an image sequence of the scene below the tower crane is acquired with the monocular camera, from which target images of adjacent frames on the time axis are obtained. Note that, to optimize computational efficiency, the monocular camera can be set to capture images at a fixed frequency, for example once every 3 seconds, so that the time interval between two adjacent target frames is 3 seconds; each three-dimensional reconstruction step then also takes 3 seconds and overlaps the capture of the next pair of images, which effectively improves efficiency and reduces redundant computation. After target semantic segmentation of the target images, the boundaries of dynamic and static targets are identified; the relative pose of the camera is then solved based on the image feature information and the segmentation results; the relative-pose results of the long image sequence undergo error redistribution and global adjustment optimization; and finally dense reconstruction is performed according to the relative pose and the semantic information. Three-dimensional reconstruction of the scene below the tower crane during dynamic operation can thus be completed with only one monocular camera, without complicated sensor calibration and at extremely low cost, which favors large-scale popularization and application. Because tower crane motion parameters, semantic-information constraints and error redistribution are all taken into account during computation, reconstruction stability and accuracy are improved while the amount of computation is reduced and efficiency is increased.
As one implementation, the tower crane motion parameters include at least one of the radial distance of the tower crane trolley, the hook height and the boom slew angle. The image below the camera changes as the trolley moves, the boom slews and the hook rises or falls: either the terrain of the scene below changes or the height of the lifted object changes. To improve efficiency and precision when calculating the relative pose, the rotation and translation matrix R₀ and T₀ between two images can therefore be calculated rapidly under the two-dimensional plane assumption by combining the tower crane motion parameters. Note that the acquisition of the tower crane motion parameters and the first-stage solution with the two-dimensional plane assumption form an integral technique: the first-stage solution uses the tower crane motion parameters directly to calculate R₀ and T₀ quickly, constituting the top-level calculation of the frame pose and providing the basis for the subsequent accurate calculation.
In one embodiment, during the calculation in step S20, the alignment of feature information and semantic information across the segmentation instance results is maintained, that is, corresponding features and semantics in different frames are kept consistent and aligned, to help constrain the relative-pose computation.
Further, in step S10, a U-Net-style segmentation network with cross-layer (skip) connections is adopted, with MobileNetV3 as the backbone. Ablation experiments are performed on the network for the actual scene, and the best balance between computational load and application effect is reached by adjusting the resolution factor, the width factor and the number of network layers; preferably, the width factor is set to 0.75, the resolution factor to 0.5, and the depth to 0.8 of the standard MobileNetV3. Under this configuration, and given the large quantities of loose building materials in a tower crane operation scene (steel bars, steel pipes, wooden battens and the like), a spatially clustered group or stack of loose building materials is labeled as one segmentation instance, and the U-shaped convolutional neural network with cross-layer connections segments the image to obtain the final semantic instance segmentation result.
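For illustration, the width-factor and resolution-factor adjustments can be sketched as follows; the channel-rounding rule is the usual MobileNet "make divisible" convention, assumed here rather than specified by the patent:

```python
def scaled_channels(base_channels, width_factor=0.75, divisor=8):
    """Apply a MobileNet-style width multiplier and round the channel
    count to a multiple of `divisor`, never dropping more than 10%
    below the unrounded value (the standard MobileNet convention)."""
    v = int(base_channels * width_factor + divisor / 2) // divisor * divisor
    v = max(divisor, v)
    if v < 0.9 * base_channels * width_factor:
        v += divisor
    return v

def scaled_resolution(base_hw, resolution_factor=0.5):
    """Apply the resolution factor to an input size (H, W)."""
    return tuple(int(round(d * resolution_factor)) for d in base_hw)
```

With the preferred factors above, a 32-channel layer shrinks to 24 channels and a 1080x1920 frame is fed to the network at 540x960, which is where the computational saving comes from.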
Referring to fig. 3, in the present embodiment, in step S20, the multi-stage solving manner includes a first-stage solving and a second-stage solving;
when the first-stage solution is carried out, the tower crane motion parameter is taken as an initial value, a two-dimensional plane assumption is carried out on the relative motion of the monocular camera, and a rotation and translation matrix R between two target images is calculated according to the initial value 0 And T 0 Because the error of the tower crane motion parameter is larger, the R is generally calculated on the meter level through the tower crane motion parameter and the two-dimensional plane assumption 0 And T 0 In this stage, the image is further downsampled by a magnification factor, for example, 3-5 times, to extract single-scale feature points, the feature point extraction algorithm selects Harris operator, selects a larger local window, and uses the Harris operator to extract the feature points in R 0 、T 0 Under the constraint of a segmentation example result, matching the characteristic points, calculating HOG in a local window by using the corner positions extracted by Harris in the matching, constructing homonymous point pairs by using the Euclidean distance of the HOG as a matching measure, and eliminating that R is not satisfied 0 And T 0 Constrained homonymy point pairs are constructed so far, affine transformation parameters are calculated according to all homonymy point pairs, and relative pose R under plane assumption is completed 0 And T 0 Is corrected to obtain R 1 And T 1
In the second-stage solution, feature points are extracted at the original-resolution layer of the target image, obtaining multi-scale feature points with SIFT. Under the constraint of R₁ and T₁ the feature points are matched, and the relative pose R and T between the two frames of target images is calculated by feature-information matching to obtain the relative pose of the monocular camera. In this second-stage solution no two-dimensional plane assumption is made: when R and T are solved, the three-dimensional position and rotation are solved according to the interior orientation parameters of the monocular camera.
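The patent does not name a solver for R and T; one conventional possibility is to estimate the essential matrix from the matched multi-scale feature points and decompose it into R and T. The sketch below covers only the estimation of E with the normalized 8-point algorithm (decomposition and cheirality checks omitted), and is an assumption, not the patent's stated method:

```python
import numpy as np

def essential_from_matches(pts_a, pts_b, K):
    """Estimate the essential matrix E from matched pixel coordinates.

    pts_a, pts_b : (N, 2) arrays of matched points, N >= 8
    K            : (3, 3) intrinsic matrix (interior orientation parameters)
    The result satisfies x_b^T E x_a ~ 0 for normalized x = K^-1 [u v 1]^T.
    """
    Kinv = np.linalg.inv(K)

    def normalize(p):
        h = np.hstack([p, np.ones((len(p), 1))])
        return (Kinv @ h.T).T

    xa, xb = normalize(pts_a), normalize(pts_b)
    # each correspondence gives one linear constraint on the 9 entries of E
    A = np.column_stack([
        xb[:, 0] * xa[:, 0], xb[:, 0] * xa[:, 1], xb[:, 0],
        xb[:, 1] * xa[:, 0], xb[:, 1] * xa[:, 1], xb[:, 1],
        xa[:, 0], xa[:, 1], np.ones(len(xa)),
    ])
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    # project onto the essential-matrix manifold: singular values (s, s, 0)
    U, S, Vt = np.linalg.svd(E)
    s = (S[0] + S[1]) / 2.0
    return U @ np.diag([s, s, 0.0]) @ Vt
```

In practice the R₁/T₁ constraint from the first stage would prune outlier matches before this linear solve, keeping the estimation stable.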
The first-stage solution is the top-level calculation of the frame pose, and the second-stage solution is its accurate calculation. Using this hierarchical method, constrained by tower crane information and semantic information, pose estimation gains both stability and precision while the amount of computation is reduced.
In this embodiment, in step S30, a modified bundle adjustment algorithm is adopted to perform global adjustment optimization on the relative-pose results. In more detail, a semantic segmentation loss is added to the bundle adjustment objective function; the semantic segmentation loss is defined as the Euclidean distance of the static-target semantic boundary and the Euclidean distance of the dynamic-target semantic boundary, and the final objective function is shown in the following formula:
g(c,x,instance)=∑‖q-p(c,x)‖+λ∑‖ins-ins′‖
In the above formula, ∑‖q−p(c,x)‖ is the standard bundle adjustment term computing the re-projection error of the key points; ∑‖ins−ins′‖ is the re-projection error of the static-target semantic boundary points between temporally adjacent frames; λ is a penalty factor.
In the definition of the semantic segmentation loss, the Euclidean distance of the static-target semantic boundary is required to be as small as possible, achieving spatial alignment as far as possible, while the Euclidean distance of the dynamic-target semantic boundary is only required to be smaller than a dynamic boundary threshold, since a dynamic target need not be spatially aligned but should not undergo drastic spatial jumps; the penalty factor λ is 0.2.
Using the Euclidean distance of the semantic boundary as a regularization term with a penalty factor in the bundle adjustment objective function, error redistribution can be achieved over the relative-pose results of the long image sequence, spreading the accumulated error as evenly as possible over each frame, thereby reducing the error and improving accuracy.
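The objective g(c, x, instance) maps directly onto code. In this minimal sketch the re-projected key points and the corresponding semantic boundary points are assumed to be pre-computed arrays (the boundary-point correspondence step is not shown):

```python
import numpy as np

def semantic_ba_objective(q, p_cx, ins, ins_prime, lam=0.2):
    """g(c, x, instance) = sum ||q - p(c, x)|| + lambda * sum ||ins - ins'||

    q, p_cx        : (N, 2) observed key points and their re-projections
    ins, ins_prime : (M, 2) static-target semantic boundary points in one
                     frame and re-projected from the adjacent frame
    lam            : penalty factor (0.2 in the embodiment above)
    """
    reproj = np.linalg.norm(q - p_cx, axis=1).sum()       # standard BA term
    semantic = np.linalg.norm(ins - ins_prime, axis=1).sum()
    return reproj + lam * semantic
```

A nonlinear least-squares solver would minimize this value over the camera poses c and structure x of the whole sequence, which is what spreads the accumulated error over every frame.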
Referring to fig. 5, in the present embodiment, step S40 includes the following steps:
acquiring homonymous (epipolar) lines according to the error-optimized relative pose and the basic principles of photogrammetric epipolar geometry;
matching a first processing window for each pixel on the homonymous line; semantic information is introduced into the matching relation so that only first processing windows with corresponding semantic information on the homonymous line are matched, and the optimal homonymous point is searched within the first processing window using the normalized gray-scale correlation coefficient as the similarity measure;
and performing dense reconstruction to obtain the reconstruction result of the scene below the tower crane.
Further, since semantic information is introduced into the matching relation, semantic information with planar characteristics, such as roofs and terraces, is identified; for these instances boundary matching of the segmentation instance result is performed directly, the interior of the instance is handled under a plane hypothesis without homonymous-point matching, and jumps in the dense reconstruction of planar areas are thereby suppressed.
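The homonymous-point search with the normalized gray-scale correlation coefficient can be sketched as follows; the window half-size and the representation of the homonymous line as a list of candidate pixel centres are illustrative assumptions:

```python
import numpy as np

def ncc(a, b):
    """Normalized gray-scale correlation coefficient of two equal-size patches."""
    a = a - a.mean()
    b = b - b.mean()
    d = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / d) if d > 0 else 0.0

def best_homonymous_point(ref_patch, img, line_pts, win=5):
    """Scan candidate centres along the homonymous (epipolar) line and keep
    the one whose window maximizes NCC against the reference window."""
    best, best_score = None, -2.0
    for x, y in line_pts:
        patch = img[y - win:y + win + 1, x - win:x + win + 1]
        if patch.shape != ref_patch.shape:
            continue  # window falls off the image border
        score = ncc(ref_patch, patch)
        if score > best_score:
            best, best_score = (x, y), score
    return best, best_score
```

The semantic constraint of the text would simply filter `line_pts` to pixels carrying the same instance label before this scan, shrinking the search and avoiding mismatches across instance boundaries.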
As shown in fig. 5, the left side is a diagram of the computation during reconstruction; after dense reconstruction, the reconstruction result shown on the right is obtained.
As one embodiment, since various terrains and building materials exist in the scene below the tower crane, including both horizontal and vertical construction surfaces, the static targets are subdivided during the deep-learning-based target semantic segmentation: the targets closest to the tower crane target are identified and the identified segmentation instance results are labeled, for example, targets less than 10 m from the tower crane target are labeled class 1, targets between 10 m and 30 m are labeled class 2, and targets more than 30 m away are labeled class 3. The label belongs to the semantic information, and different semantic information is distinguished by the priority of its label; during dense reconstruction, regions whose semantic information has higher priority are reconstructed first, forming a reconstruction order from the center to the periphery, i.e. from high priority to low.
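The distance-based labeling and the resulting reconstruction order can be sketched as follows; the 10 m / 30 m thresholds follow the example above, while the instance list and function names are illustrative assumptions:

```python
def distance_label(dist_m):
    """Label a segmentation instance by its distance to the tower crane
    target: < 10 m -> class 1, 10-30 m -> class 2, > 30 m -> class 3."""
    return 1 if dist_m < 10 else (2 if dist_m < 30 else 3)

def reconstruction_order(instances):
    """Sort instances so that higher-priority (lower class number) regions
    are densely reconstructed first; 'instances' is a hypothetical list of
    (name, distance_m) pairs."""
    return [name for name, d in
            sorted(instances, key=lambda item: distance_label(item[1]))]
```

This yields the center-outward schedule described above: the area directly under the hook is reconstructed before the far periphery.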
Referring to fig. 6, in a second aspect, the present invention provides a system applying the above monocular-camera-based three-dimensional reconstruction method for the scene below a tower crane, comprising:
the monocular camera, which is arranged below the tower crane trolley and is used for continuously acquiring images of the scene below;
the semantic segmentation module, which is used for acquiring target images of adjacent frames on the time axis and performing deep-learning-based target semantic segmentation to obtain a plurality of segmentation instance results, wherein the segmentation instance results comprise feature information and semantic information;
the relative pose solving module, which adopts a multi-stage solving mode: first, a two-dimensional plane assumption is made on the relative motion of the monocular camera according to the tower crane motion parameters and the rotation and translation matrices R0 and T0 between the two target images are calculated; then the relative pose R and T between the two frames of target images is calculated using feature information matching, obtaining the relative pose of the monocular camera;
the error optimization processing module, which is used for performing error redistribution on the inter-frame relative pose results of the long sequence in combination with the semantic segmentation loss, homogenizing the accumulated error over every frame and obtaining the error-optimized relative pose;
and the three-dimensional reconstruction module, which is used for performing dense reconstruction according to the error-optimized relative pose and the semantic information to obtain the scene reconstruction result below the tower crane.
All of the above modules are used to implement the three-dimensional reconstruction method for the scene below the tower crane in the above embodiments; the specific implementations are not described here again.
Compared with the prior art, the present invention provides a monocular-camera-based three-dimensional reconstruction method and system for the scene below a tower crane. A monocular camera is used for image acquisition, target semantic segmentation is performed based on deep learning, and constraints are provided for the relative pose calculation of consecutive frames and for dense reconstruction, which improves stability, reduces the amount of computation, keeps cost low, and facilitates popularization and application;
a multi-stage solving mode is adopted, the relative pose of the camera over the image sequence is calculated based on semantic information and image feature information, and the tower crane information and semantic information serve as constraints, which is more reliable than using image feature points alone; the relative pose serves as the initial value of dense reconstruction and has a decisive effect on the final dense reconstruction result;
aiming at the problem of error accumulation in relative pose calculation over long sequences, error redistribution is performed in combination with the semantic segmentation loss and the accumulated error is spread evenly over every frame, which improves the accuracy of the relative pose results and ensures the accuracy of the scene reconstruction.
Finally, it should be emphasized that the above-described embodiments are merely preferred embodiments of the invention and do not limit it; any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the invention are intended to be included within its scope.
The above description presents the main flow steps of the invention; other functional steps may be inserted into this flow, and the logical order of the steps may be changed. If the data processing manner is similar to that of the flow steps, or the core idea of the data processing is the same or similar, it should likewise be protected.

Claims (9)

1. A three-dimensional reconstruction method for the scene below a tower crane based on a monocular camera, characterized by comprising the following steps:
S10, acquiring target images of adjacent frames on the time axis, and performing deep-learning-based target semantic segmentation to obtain a plurality of segmentation instance results, wherein the segmentation instance results comprise feature information and semantic information;
S20, adopting a multi-stage solving mode: first making a two-dimensional plane assumption on the relative motion of the monocular camera according to the tower crane motion parameters and calculating the rotation and translation matrices R0 and T0 between the two target images; then calculating the relative pose R and T between the two frames of target images using feature information matching, to obtain the relative pose of the monocular camera;
S30, in combination with the semantic segmentation loss, performing error redistribution on the inter-frame relative pose results of the long sequence, homogenizing the accumulated error over every frame, and obtaining the error-optimized relative pose;
S40, performing dense reconstruction according to the error-optimized relative pose and the semantic information to obtain the scene reconstruction result below the tower crane;
wherein in step S20, the multi-stage solving mode includes a first-stage solution and a second-stage solution;
in the first-stage solution, the tower crane motion parameters are taken as initial values, a two-dimensional plane assumption is made on the relative motion of the monocular camera, and the rotation and translation matrices R0 and T0 between the two target images are calculated from these initial values; the image is downsampled by a magnification factor, single-scale feature points are extracted, a local window is selected, and under the constraints of R0, T0 and the segmentation instance results, the feature points are matched, homonymous point pairs are constructed and affine transformation parameters are calculated, thereby correcting the relative pose R0 and T0 obtained under the plane assumption to yield R1 and T1;
in the second-stage solution, feature points are extracted at the original resolution layer of the target images to obtain multi-scale feature points; under the constraint of R1 and T1, the feature points are matched, and the relative pose R and T between the two frames of target images is calculated using feature information matching, to obtain the relative pose of the monocular camera.
2. The three-dimensional reconstruction method for the scene below a tower crane based on a monocular camera according to claim 1, wherein the tower crane motion parameters comprise at least one of the radial distance of the tower crane trolley, the height of the lifting hook and the slewing angle of the jib.
3. The three-dimensional reconstruction method for the scene below a tower crane based on a monocular camera according to claim 1, wherein during the calculation of step S20, the alignment of the feature information and the semantic information in the plurality of segmentation instance results is maintained to assist in constraining the relative pose calculation.
4. The three-dimensional reconstruction method for the scene below a tower crane based on a monocular camera according to claim 3, wherein the relative pose R and T is solved without a two-dimensional plane assumption, the three-dimensional position and rotation being solved according to the interior orientation parameters of the monocular camera to obtain the relative pose of the monocular camera.
5. The three-dimensional reconstruction method for the scene below a tower crane based on a monocular camera according to claim 4, wherein in step S30 an improved bundle adjustment algorithm is adopted and the semantic segmentation loss is added into the objective function, the semantic segmentation loss being the Euclidean distance of the static-target semantic boundaries and the Euclidean distance of the dynamic-target semantic boundaries, represented by the following formula:
g(c, x, instance) = Σ‖q − p(c, x)‖ + λ Σ‖ins − ins′‖
in the above formula, Σ‖q − p(c, x)‖ is the standard bundle adjustment term computing the reprojection error of the key points; Σ‖ins − ins′‖ is the reprojection error of the static-target semantic boundary points between adjacent frames; λ is the penalty factor.
6. The three-dimensional reconstruction method for the scene below a tower crane based on a monocular camera according to claim 5, wherein the Euclidean distance of the dynamic-target semantic boundaries is smaller than a dynamic boundary threshold, and the penalty factor λ is 0.2.
7. The three-dimensional reconstruction method for the scene below a tower crane based on a monocular camera according to claim 5, wherein step S40 comprises the following steps:
acquiring the corresponding epipolar lines (homonymous lines) according to the error-optimized relative pose and the basic principles of epipolar geometry;
assigning a first processing window to each pixel on the homonymous line, introducing semantic information into the matching relation so that only first processing windows with consistent semantic information on the homonymous line are matched, and searching for the optimal homonymous point within the first processing window using the normalized gray-level correlation coefficient as the similarity measure;
performing dense reconstruction to obtain the scene reconstruction result below the tower crane.
8. The three-dimensional reconstruction method for the scene below a tower crane based on a monocular camera according to claim 7, wherein in step S40, semantic information with planar characteristics is identified, boundary matching of the segmentation instance result is performed, the interior of the segmentation instance result is processed under a plane hypothesis, and homonymous-point matching is not performed therein.
9. A system applying the monocular-camera-based three-dimensional reconstruction method for the scene below a tower crane according to any one of claims 1 to 8, comprising:
the monocular camera, which is arranged below the tower crane trolley and is used for acquiring images of the scene below;
the semantic segmentation module, which is used for acquiring target images of adjacent frames on the time axis and performing deep-learning-based target semantic segmentation to obtain a plurality of segmentation instance results, wherein the segmentation instance results comprise feature information and semantic information;
the relative pose solving module, which adopts a multi-stage solving mode: first, a two-dimensional plane assumption is made on the relative motion of the monocular camera according to the tower crane motion parameters and the rotation and translation matrices R0 and T0 between the two target images are calculated; then the relative pose R and T between the two frames of target images is calculated using feature information matching, obtaining the relative pose of the monocular camera;
the error optimization processing module, which is used for performing error redistribution on the inter-frame relative pose results of the long sequence in combination with the semantic segmentation loss, homogenizing the accumulated error over every frame and obtaining the error-optimized relative pose;
and the three-dimensional reconstruction module, which is used for performing dense reconstruction according to the error-optimized relative pose and the semantic information to obtain the scene reconstruction result below the tower crane.
CN202310148891.4A 2023-02-14 2023-02-14 Three-dimensional reconstruction method and system for lower scene of tower crane based on monocular camera Active CN116433856B (en)


Publications (2)

Publication Number Publication Date
CN116433856A (en) 2023-07-14
CN116433856B (en) 2023-12-05 (grant)

Family

ID=87080447


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416840A (en) * 2018-03-14 2018-08-17 大连理工大学 A kind of dense method for reconstructing of three-dimensional scenic based on monocular camera
CN110827305A (en) * 2019-10-30 2020-02-21 中山大学 Semantic segmentation and visual SLAM tight coupling method oriented to dynamic environment
CN111968129A (en) * 2020-07-15 2020-11-20 上海交通大学 Instant positioning and map construction system and method with semantic perception
CN112509115A (en) * 2020-11-26 2021-03-16 中国人民解放军战略支援部队信息工程大学 Three-dimensional time-varying unconstrained reconstruction method and system for dynamic scene of sequence image
CN113674416A (en) * 2021-08-26 2021-11-19 中国电子科技集团公司信息科学研究院 Three-dimensional map construction method and device, electronic equipment and storage medium
WO2022179690A1 (en) * 2021-02-25 2022-09-01 Telefonaktiebolaget Lm Ericsson (Publ) Map processing device and method thereof
CN115661453A (en) * 2022-10-25 2023-01-31 腾晖科技建筑智能(深圳)有限公司 Tower crane hanging object detection and segmentation method and system based on downward viewing angle camera

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11748913B2 (en) * 2021-03-01 2023-09-05 Qualcomm Incorporated Modeling objects from monocular camera outputs


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A Fast 3D Reconstruction Method for UAV Images Based on SLAM"; Song Zhiyong et al.; Technology Innovation and Application; full text *
"3D Scene Reconstruction Based on Monocular Multi-View Images"; Wu Zhengzheng, Kou Zhan; Optics & Optoelectronic Technology (No. 05); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant