CN115019208A - Road surface three-dimensional reconstruction method and system for dynamic traffic scene - Google Patents

Road surface three-dimensional reconstruction method and system for dynamic traffic scene

Info

Publication number
CN115019208A
CN115019208A (application CN202210676065.2A)
Authority
CN
China
Prior art keywords
image
images
vehicle
road surface
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210676065.2A
Other languages
Chinese (zh)
Inventor
杨旭
管进超
李毅
洪翰
丁玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changan University
Original Assignee
Changan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changan University filed Critical Changan University
Priority to CN202210676065.2A priority Critical patent/CN115019208A/en
Publication of CN115019208A publication Critical patent/CN115019208A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/005General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Remote Sensing (AREA)
  • Computer Graphics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a road surface three-dimensional reconstruction method and system for dynamic traffic scenes, which realize road surface three-dimensional reconstruction under the influence of dynamic traffic based on unmanned aerial vehicle stereo photography and deep learning. Vehicle noise on aerial images is removed with a lightweight deep learning framework, and the image overlap rate in areas of different traffic density is automatically adjusted through spatial optimization of the image sequence, thereby improving both the quality and the speed of road surface three-dimensional modeling.

Description

Road surface three-dimensional reconstruction method and system for dynamic traffic scene
Technical Field
The invention relates to the field of pavement health monitoring, and in particular to a road surface three-dimensional reconstruction method and system for dynamic traffic scenes.
Background
Road surface condition has a great influence on the safety, comfort and economy of transportation, and road surface health monitoring is the basis for ensuring normal road operation. Most current automatic detection of pavement distress is based on two-dimensional images; a three-dimensional model of the pavement enables more accurate distress identification and at the same time yields the three-dimensional dimensions of each distress. With the development of three-dimensional measurement technology, the collection of road surface three-dimensional data has become a research hotspot.
Current road surface three-dimensional measurement techniques fall mainly into laser imaging methods and stereoscopic vision methods. Most three-dimensional measuring devices fitted to multifunctional road inspection vehicles are based on laser imaging, using either the time-of-flight (ToF) principle or the structured-light principle. Although laser three-dimensional imaging can generate a pavement model quickly, it is susceptible to strong light and vibration, and the equipment cost is high.
Stereoscopic three-dimensional imaging based on color images has therefore become an alternative for low-cost, high-precision pavement three-dimensional reconstruction. By the number of cameras used, stereoscopic reconstruction divides into binocular and monocular stereo imaging. Binocular stereo imaging restores scene depth from the parallax between two cameras fixed in position; it images quickly, but its modeling resolution is poor, and deviations in the camera mounting positions make the three-dimensional reconstruction inaccurate.
Monocular stereo imaging generates a three-dimensional model from images taken by a single camera during movement. For three-dimensional reconstruction of traffic infrastructure, monocular stereo imaging mainly uses unmanned aerial vehicle (UAV) photography: from multi-view road surface images, a road surface three-dimensional point cloud model is generated through image feature matching and epipolar geometry. Although UAV monocular stereo imaging can reconstruct large-scale road scenes, two difficulties remain in its application: (1) UAV stereo photography can only be carried out on closed roads, because vehicles running on an open road occlude the road surface and generate a large amount of noise during modeling; (2) to observe road surface areas occluded by vehicles, the image acquisition density must be greatly increased, causing a sharp rise in modeling time.
To recognize vehicles on aerial images automatically, existing research has attempted feature recognition of infrastructure such as roads and bridges using image processing or deep learning. However, existing object recognition methods are computationally heavy, and the change in object size across UAV images taken at different heights strongly affects recognition accuracy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a road surface three-dimensional reconstruction method and system for dynamic traffic scenes, so as to solve two problems of the prior art: loss of the reconstructed model and a large amount of surface noise caused by occlusion from moving vehicles in an operating road environment, and low efficiency of road surface three-dimensional reconstruction over a large field of view.
To achieve this purpose, the invention adopts the following technical scheme:
A road surface three-dimensional reconstruction method for dynamic traffic scenes comprises the following steps:
step 1, collecting images shot by an unmanned aerial vehicle;
step 2, inputting the shot images, as prediction set images, into an improved YOLO vehicle detector to obtain coordinate files of the vehicles in the prediction set images; taking the vehicle coordinate files as reference, rendering the area inside each vehicle bounding box in the prediction set images single-valued;
step 3, obtaining the spatial relationship between adjacent prediction set images by a feature extraction method, calculating the actual occlusion rate and the ground overlap rate between adjacent images from this spatial relationship and the vehicle bounding boxes on each image, identifying invalid images through spatial optimization of the image sequence, and removing the invalid images to obtain preprocessed images;
and step 4, reconstructing the road surface three-dimensional topography through stereoscopic vision by combining the preprocessed images with their coordinate files.
The invention is further improved in that:
preferably, before step 1, the boundary coordinates of the planned shooting road surface area are extracted from the electronic map, a flight area file is generated, the flight area file is imported into the unmanned aerial vehicle, and the unmanned aerial vehicle shoots the planned shooting road surface area.
Preferably, the process of obtaining the coordinate files of the vehicles in the prediction set images with the improved YOLO vehicle detector in step 2 is as follows:
(1) inputting the shot images into the improved YOLO vehicle detector at the initial size of (512, 512, 3), preliminarily predicting the vehicle bounding boxes, and counting the average pixel width of the vehicles;
(2) calculating a scaling factor k as the ratio of the average vehicle pixel width on the training set images to that on the prediction set images;
(3) adjusting the input size of the prediction set images to (512k, 512k, 3) with the scaling factor, inputting them into the deep learning framework again, and predicting the final accurate vehicle bounding boxes.
Preferably, the improved YOLO vehicle detector is obtained as follows:
the image features extracted at layers 6, 12 and 14 of the backbone feature extraction network are respectively up-sampled, concatenated and further feature-extracted; the class, position and confidence of each bounding box are then predicted by standard convolution on the basis of 9 predefined anchor boxes, yielding a dataset of vehicle bounding-box corner coordinate files for the predicted images;
the dataset of annotated vehicle bounding-box corner coordinate files is compared with the predicted dataset to obtain the error, and the YOLO vehicle detector is trained until the error is smaller than a set value, giving the improved YOLO vehicle detector;
the backbone feature extraction network comprises 1 CBL module and 13 DBR modules; each CBL module comprises a standard convolution layer, a batch normalization layer and a Leaky ReLU activation layer; each DBR module comprises a depthwise separable convolution layer, a batch normalization layer and a ReLU activation layer.
Preferably, in step 3, the spatial relationship between adjacent images is calculated by the following formula:

p'_i = R · p_i + T

where (p_i, p'_i) are matched feature-point pairs on adjacent images, R is the rotation matrix and T is the translation matrix; the actual values of R and T are solved from multiple feature-point pairs to obtain the spatial relationship between adjacent images.
Preferably, in step 3, the process of identifying invalid images is:
(1) let the overlap rate of the original image sequence be IOR_max; the lowest image overlap rate after optimization is then IOR_min = 1 − 2·(1 − IOR_max);
(2) starting from the 1st image, searching with a step of 2, calculating the occlusion rate OCR_1 obtained by combining the i-th and (i+2)-th images, i.e. the proportion of the vehicle overlap area of the two images within the image overlap area, and at the same time obtaining the actual ground overlap rate GOR_1 of the area outside the vehicle-occluded positions;
(3) calculating the occlusion rate OCR_2 obtained by combining the i-th, (i+1)-th and (i+2)-th images, i.e. the proportion of the vehicle overlap area of the three images within the image overlap area, and obtaining the actual ground overlap rate GOR_2 of the area outside the vehicle-occluded positions;
(4) if OCR_2 > OCR_1 and GOR_2 > GOR_1, retaining the (i+1)-th image; otherwise, deleting the (i+1)-th image.
Preferably, the occlusion rate OCR is calculated as:
OCR = (area still occluded by vehicles after the two images are overlapped) / (original image area);
and the ground overlap rate GOR is calculated as:
GOR = (overlap area of the two images excluding the vehicle-occluded parts) / (original image area).
Preferably, in step 4, the coordinates of the preprocessed images are matched with the image coordinates from step 1, and the relative spatial positions of the camera across different pictures are preliminarily estimated; feature points are extracted on each image and matched between adjacent images; the camera spatial coordinates are back-calculated from the feature-point pairs and spatial epipolar geometry; and the spatial three-dimensional coordinates of all feature points on the images are solved from the camera spatial coordinates to obtain a point cloud model, from which the road surface three-dimensional topography is reconstructed through stereoscopic vision.
A road surface three-dimensional reconstruction system for dynamic traffic scenes comprises:
the image acquisition module is used for acquiring images shot by the unmanned aerial vehicle;
the coordinate acquisition module is used for inputting the shot images, as prediction set images, into the improved YOLO vehicle detector to obtain coordinate files of the vehicles in the prediction set images, and, taking the vehicle coordinate files as reference, rendering the area inside each vehicle bounding box in the prediction set images single-valued;
the image sequence optimization module is used for obtaining the spatial relationship between adjacent prediction set images by a feature extraction method, calculating the actual occlusion rate and the ground overlap rate between adjacent images from this spatial relationship and the vehicle bounding boxes on each image, identifying invalid images through spatial optimization of the image sequence, and removing the invalid images to obtain preprocessed images;
and the reconstruction module is used for reconstructing the road surface three-dimensional topography through stereoscopic vision by combining the preprocessed images with their coordinate files.
Compared with the prior art, the invention has the following beneficial effects:
the invention discloses a road surface three-dimensional reconstruction method and a road surface three-dimensional reconstruction system facing to a dynamic traffic scene, which are used for realizing the road surface three-dimensional reconstruction under the influence of dynamic traffic based on unmanned aerial vehicle stereo photography and deep learning, removing vehicle noise on aerial images by adopting a lightweight deep learning framework, and automatically adjusting the image overlapping rate in different traffic density areas through image sequence space optimization so as to improve the quality and speed of road surface three-dimensional modeling. The invention has the advantages that:
(1) the image three-dimensional reconstruction framework can reconstruct the road surface in an open road environment without being affected by running vehicles; (2) vehicle noise on aerial images is quickly identified and eliminated, and changes in vehicle size across UAV images taken at different heights do not affect recognition accuracy; (3) the number of images required under different traffic flow densities is identified, the spatial distribution of the aerial image sequence is optimized, and invalid images are reduced; (4) the completeness and spatial accuracy of the reconstructed road surface point cloud model are guaranteed while the three-dimensional modeling speed for large-scale scenes is increased.
Drawings
FIG. 1 is an overall frame diagram of the present invention;
FIG. 2 is a framework diagram of deep learning vehicle recognition for unmanned aerial vehicle images;
FIG. 3 shows the vehicle identification and noise elimination effects on unmanned aerial vehicle images;
FIG. 4 illustrates spatial relationship estimation between unmanned aerial vehicle images;
FIG. 5 is a schematic diagram illustrating the calculation of the vehicle occlusion rate and the ground overlap rate;
FIG. 6 is a framework diagram of unmanned aerial vehicle image sequence spatial optimization;
FIG. 7 shows the spatial distribution of an unmanned aerial vehicle image sequence before and after optimization;
wherein (a) is the original image sequence of road section 1; (b) is the spatially optimized image sequence of road section 1; (c) is the original image sequence of road section 2; (d) is the spatially optimized image sequence of road section 2;
FIG. 8 shows the three-dimensional road surface reconstruction effect under traffic influence according to the present invention;
wherein (a) is the reconstructed three-dimensional model of road section 1; (b) is the reconstructed three-dimensional model of road section 2;
FIG. 9 compares the modeling speeds of the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings:
in the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplification of description, but do not indicate or imply that the device or element referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention; the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance; furthermore, unless expressly stated or limited otherwise, the terms "mounted," "connected," and "coupled" are to be construed broadly and encompass, for example, both fixed and removable coupling arrangements; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Referring to fig. 1, the invention provides a rapid road surface three-dimensional reconstruction method based on unmanned aerial vehicle stereo photography and deep learning, which specifically comprises the following steps:
step 1, automatically acquiring road images on an open road through a multi-rotor unmanned aerial vehicle, and recording space coordinates when the images are shot.
Step 1.1, extracting the boundary coordinates of the road surface area planned for shooting from an electronic map and generating a flight area file in kml format, wherein the boundary coordinates of the preselected aerial-survey road surface area form a closed loop, i.e. {(x_1, y_1), (x_2, y_2), …, (x_n, y_n), (x_1, y_1)}.
Step 1.2, importing the generated flight area file into the unmanned aerial vehicle and setting the shooting parameters: camera inclination angle, camera resolution, shooting height, longitudinal overlap rate and lateral overlap rate. The shooting parameters should be set according to the traffic environment and the modeling accuracy requirement. On a road open to traffic, the UAV flying height should be no less than 10 meters, the camera inclination angle between 60 and 90 degrees, the longitudinal overlap rate of adjacent shot images no less than 90%, and the lateral overlap rate no less than 60%.
Step 1.3, executing the aerial photography task, acquiring aerial images at different road positions, recording the position information at the time each image is shot, and generating an xml coordinate file matched with the image name.
Through step 1, a kml-format flight area file of the planned road surface area and an xml coordinate file of the shot images are obtained.
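By way of illustration, step 1.1 can be sketched in Python with the standard library only; the boundary coordinates and file name below are hypothetical placeholders, not values from the disclosure.

```python
# Minimal sketch of step 1.1: write a closed-loop road boundary to a KML
# flight-area file. The coordinates below are hypothetical placeholders.
boundary = [(108.9401, 34.3412), (108.9410, 34.3412),
            (108.9410, 34.3405), (108.9401, 34.3405)]

def write_flight_area_kml(coords, path="flight_area.kml"):
    ring = list(coords) + [coords[0]]          # close the loop: (x1, y1) repeated
    coord_str = " ".join(f"{x},{y},0" for x, y in ring)
    kml = f"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark><name>flight area</name>
    <Polygon><outerBoundaryIs><LinearRing>
      <coordinates>{coord_str}</coordinates>
    </LinearRing></outerBoundaryIs></Polygon>
  </Placemark>
</kml>"""
    with open(path, "w", encoding="utf-8") as f:
        f.write(kml)

write_flight_area_kml(boundary)
```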
Step 2, constructing an improved YOLO (You Only Look Once) deep learning framework to detect vehicle positions and remove the areas containing vehicles.
This part comprises an offline training stage and a use stage in actual application.
(I) Training stage
Step 2.1, selecting part of the aerial images as training set images for annotation: the vehicle positions on the images are marked with bounding boxes, and a bounding-box corner coordinate file is generated. Specific vehicle types need not be distinguished during annotation; all vehicle bounding boxes are placed in a single class. These are the annotated images.
Step 2.2, with reference to fig. 2, constructing the improved YOLO deep learning framework, i.e. the YOLO vehicle detector, which fuses depthwise separable convolution with an image-resolution active adjustment unit and locates all vehicles on an aerial image with bounding boxes. The embedded depthwise separable convolution comprises channel-by-channel convolution and point-by-point convolution. The embedded image-resolution active adjustment unit serves to enhance the generalization performance of the model. The standard convolutions in the original YOLO framework are replaced by depthwise separable convolutions to reduce the computational load and improve feature extraction performance.
Meanwhile, the image-resolution active adjustment unit adjusts the resolution of the network input image according to the size of vehicles on images taken at different aerial heights. The overall framework is as follows: the input to the improved YOLO network is a three-channel color image, with a training-stage input size of (512, 512, 3); the original image is converted into an input image of size (512k, 512k, 3), where k is the scaling factor and 512·k must be a multiple of 32. The pixel size of vehicles on the image is preliminarily predicted by the image-resolution active adjustment unit, and an appropriate scaling factor k is selected accordingly to generate the input image at the optimal resolution.
Step 2.3, inputting the annotated image dataset at size (512, 512, 3) into the model built in step 2.2 and training the constructed YOLO vehicle detector.
The YOLO vehicle detector predicts the bounding-box corner coordinate file of an image as follows. First, features of the resolution-adjusted input image are extracted by the backbone network, whose basic components are the CBL module and the DBR module. The CBL module comprises 1 standard convolution layer (C), 1 batch normalization layer (B) and 1 Leaky ReLU activation layer (L). The DBR module consists of 1 depthwise separable convolution layer (D), 1 batch normalization layer (B) and 1 ReLU activation layer (R). The backbone feature extraction network consists of 1 CBL module and 13 DBR modules, and its tail end performs feature fusion through a spatial pyramid pooling module. The image features extracted at layers 6, 12 and 14 (the tail end) of the backbone are respectively up-sampled, concatenated and further feature-extracted. Finally, the class, position and confidence of each bounding box are predicted by standard convolution on the basis of 9 predefined anchor boxes, giving the bounding-box corner coordinate file of each image and forming the dataset of vehicle bounding-box corner coordinate files.
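By way of illustration, the CBL and DBR building blocks described above can be sketched in PyTorch as follows; the kernel sizes, strides and channel widths are assumptions made for the sketch, not values fixed by the disclosure.

```python
import torch.nn as nn

class CBL(nn.Module):
    """Standard convolution + batch normalization + Leaky ReLU."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1, inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class DBR(nn.Module):
    """Depthwise separable convolution + batch normalization + ReLU."""
    def __init__(self, c_in, c_out, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, s, 1, groups=c_in, bias=False),  # channel-by-channel
            nn.Conv2d(c_in, c_out, 1, bias=False),                    # point-by-point
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(x)
```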
The error between the predicted and annotated images is estimated, with the model loss defined by CIoU, until the training loss is stable and the prediction error is below a preset value. Training techniques such as transfer learning, layer freezing, dynamic learning rate and early stopping may be adopted.
(II) Use stage
Step 2.4, using the trained improved YOLO vehicle detector to identify and locate vehicles on different aerial images, predicting the bounding-box positions with high precision by applying the image-resolution active adjustment unit. Prediction on other datasets proceeds mainly as follows:
(1) inputting a prediction set image into the improved YOLO vehicle detector at the initial size (512, 512, 3), preliminarily predicting the vehicle bounding boxes, and counting the average pixel width of the vehicles;
(2) calculating the scaling factor k as the ratio of the average vehicle pixel width on the training set images to that on the prediction set images;
(3) adjusting the input size of the prediction set image to (512k, 512k, 3), inputting it into the improved YOLO vehicle detector again, and predicting the final accurate vehicle bounding boxes, i.e. the coordinate file of the prediction set image.
The RGB values of the pixels inside each predicted bounding box are then reset to (255, 255, 255) to eliminate the vehicle noise on the image.
FIG. 3 illustrates the vehicle identification results of the improved YOLO deep learning framework on various aerial image datasets. After the bounding-box positions are predicted with high precision, the image pixels inside each box are reset to (255, 255, 255), i.e. the region containing the vehicle is rendered single-valued.
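The use-stage resolution adjustment and vehicle-noise removal can be sketched as follows; `detector` is a hypothetical callable standing in for the trained YOLO model, and the average-width statistics are assumed inputs.

```python
import cv2
import numpy as np

def input_side(train_mean_px_width, pred_mean_px_width):
    """Scaling factor k from the ratio of average vehicle pixel widths;
    512*k is rounded to a multiple of 32 as the network requires."""
    k = train_mean_px_width / pred_mean_px_width
    return max(32, int(round(512 * k / 32)) * 32)

def remove_vehicle_noise(image, boxes):
    """Set pixels inside each predicted bounding box to (255, 255, 255)."""
    out = image.copy()
    for x1, y1, x2, y2 in boxes:
        out[int(y1):int(y2), int(x1):int(x2)] = 255
    return out

# Hypothetical usage: `detector(img)` returns [(x1, y1, x2, y2), ...]
# img = cv2.imread("aerial_0001.jpg")
# side = input_side(train_mean_px_width=40.0, pred_mean_px_width=25.0)
# resized = cv2.resize(img, (side, side))
# clean = remove_vehicle_noise(resized, detector(resized))
```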
Step 3, re-matching the image positions, calculating the ground overlap rate and occlusion rate between different images, dynamically optimizing the spatial distribution of the images, and reducing the number of aerial photos.
Step 3.1, extracting the feature points on all images with a feature extraction algorithm, and matching the feature points on adjacent images according to their similarity to obtain the actual overlap area and relative positional relationship of adjacent images. The feature extraction algorithm may be SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), FAST (Features from Accelerated Segment Test), or similar. The positional relationship between adjacent images is solved from a number of known point pairs, giving a rotation matrix and a translation matrix.
Referring to fig. 4, to estimate the actual spatial relationship between aerial images, feature points {p_i} are extracted on all images with the feature extraction algorithm, and feature points on adjacent images are matched according to similarity, generating feature-point pairs {(p_i, p'_i)}. The actual overlap area and relative positional relationship of adjacent images can be described by a rotation matrix R and a translation matrix T:

p'_i = R · p_i + T

The actual values of R and T are solved from a number of feature-point pairs, giving the spatial relationship between the images.
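One plausible realization of this estimation is sketched below with OpenCV (SIFT matching, essential-matrix estimation, pose recovery); the camera intrinsic matrix K is an assumed input, and the patent does not prescribe these particular calls.

```python
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Estimate rotation R and translation t between two adjacent aerial
    images from matched feature points (a sketch, assuming calibrated K)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Match descriptors and keep the best matches by Lowe's ratio test
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Essential matrix with RANSAC, then recover R and t
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```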
Step 3.2, referring to fig. 5, calculating the actual occlusion rate OCR and ground overlap rate GOR between images from the spatial positions of adjacent images and the vehicle-occluded positions on each image. The vehicle occlusion rate is the ratio of the vehicle overlap area to the image overlap area across multiple images, and the ground overlap rate is the overlap ratio of the area outside the vehicle-occluded positions. The overlap rate of the original image sequence is no less than 90%, and the overlap rate of the optimized images lies between 80% and 90%. The ground overlap rate should exceed 70% to guarantee alignment accuracy in the three-dimensional road surface reconstruction.
Specifically:

OCR = (area still occluded by vehicles after the two images are overlapped) / (original image area);

GOR = (overlap area of the two images excluding the vehicle-occluded parts) / (original image area).
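Under one plausible reading of the OCR and GOR definitions above (OCR counts ground hidden in both images, GOR counts overlap visible in both), the two ratios can be sketched with polygon operations; `shapely` and the ground-frame rectangles are assumptions of the sketch, not part of the original disclosure.

```python
from shapely.geometry import box
from shapely.ops import unary_union

def occlusion_and_ground_overlap(img1_rect, img2_rect, veh1_boxes, veh2_boxes):
    """Compute (OCR, GOR) for a pair of images.

    All rectangles are (minx, miny, maxx, maxy) in a common ground frame.
    OCR: ground inside the two-image overlap hidden in BOTH images,
         divided by the original image area.
    GOR: overlap area excluding ground hidden in EITHER image,
         divided by the original image area.
    """
    im1, im2 = box(*img1_rect), box(*img2_rect)
    overlap = im1.intersection(im2)
    veh1 = unary_union([box(*b) for b in veh1_boxes])
    veh2 = unary_union([box(*b) for b in veh2_boxes])

    still_occluded = overlap.intersection(veh1).intersection(veh2)
    visible_overlap = overlap.difference(veh1.union(veh2))

    image_area = im1.area
    return still_occluded.area / image_area, visible_overlap.area / image_area
```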
Step 3.3, identifying invalid photos through spatial optimization of the image sequence according to the actual occlusion rate and ground overlap rate between images, thereby reducing the number of images used for three-dimensional reconstruction.
Referring to fig. 6, based on the actual occlusion rate OCR and ground overlap rate GOR between images, the spatial distribution of the image sequence is optimized as follows (a code sketch is given after the list):
(1) let the overlap ratio of the original image sequence be IOR max Then the lowest image overlap rate after optimization is IOR min =1-2*(1-IOR max );
(2) Searching at 2-step intervals from the 1 st image, and calculating the occlusion rate OCR obtained by combining the ith image and the (i + 2) th image 1 I.e. the proportion of the overlapping area of the two image vehicles to the overlapping area of the images. Meanwhile, the actual ground overlap ratio GOR of the area outside the shielding position of the vehicle can be obtained 1
(3) Calculating the occlusion rate OCR obtained by combining the ith image, the (i + 1) th image and the (i + 2) th image 2 That is, the ratio of the vehicle overlapping area of the three images to the image overlapping area, and the actual ground overlapping ratio GOR of the area other than the vehicle shielding position can be obtained 2
(4) If OCR 2 >OCR 1 And GOR 2 >GOR 1 And if not, deleting the (i + 1) th image.
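A condensed sketch of the search loop in steps (1)–(4); `pair_metrics` and `triple_metrics` are hypothetical helpers returning (OCR, GOR) for the given image combination, e.g. built on the polygon sketch above.

```python
def optimize_sequence(images, pair_metrics, triple_metrics):
    """Spatial optimization of the image sequence, following steps (1)-(4).

    pair_metrics(i, j)      -> (OCR, GOR) for images i and j combined
    triple_metrics(i, j, k) -> (OCR, GOR) for images i, j and k combined
    Searching with step 2 bounds the optimized overlap from below by
    IOR_min = 1 - 2 * (1 - IOR_max).
    """
    keep = [True] * len(images)
    i = 0
    while i + 2 < len(images):
        ocr1, gor1 = pair_metrics(i, i + 2)            # without the middle image
        ocr2, gor2 = triple_metrics(i, i + 1, i + 2)   # with the middle image
        # Step (4): retain the middle image only if it improves both metrics
        if not (ocr2 > ocr1 and gor2 > gor1):
            keep[i + 1] = False
        i += 2
    return [img for img, kept in zip(images, keep) if kept]
```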
Referring to fig. 7, which shows the image distribution of two road sections before and after spatial optimization of the aerial image sequence: after optimization, the number of images participating in modeling is greatly reduced, the image density in low-traffic areas is effectively decreased, and redundant images are deleted.
Step 4, reconstructing the road surface three-dimensional topography through stereoscopic vision based on the preprocessed aerial images.
Step 4.1, re-matching the preprocessed images generated in step 3 with the image coordinates extracted in step 1 and preliminarily estimating the relative spatial positions of the camera; extracting the feature points on each picture and matching those of adjacent images; and back-calculating the camera spatial coordinates from the feature-point pairs and spatial epipolar geometry.
Step 4.2, solving the spatial three-dimensional coordinates of all feature points on the images from the obtained camera spatial coordinates to obtain a point cloud model.
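Once camera poses are recovered, the feature points can be triangulated; a minimal sketch with OpenCV follows, assuming the intrinsic matrix K and the per-view (R, t) are available from the previous step.

```python
import cv2
import numpy as np

def triangulate(K, R1, t1, R2, t2, pts1, pts2):
    """Solve the 3-D coordinates of matched feature points from two views.

    K is the 3x3 intrinsic matrix; (R1, t1) and (R2, t2) are the poses of
    the two views; pts1/pts2 are Nx2 arrays of matched pixel coordinates.
    """
    P1 = K @ np.hstack([R1, t1.reshape(3, 1)])   # 3x4 projection matrix, view 1
    P2 = K @ np.hstack([R2, t2.reshape(3, 1)])   # 3x4 projection matrix, view 2
    pts4d = cv2.triangulatePoints(P1, P2,
                                  pts1.T.astype(float), pts2.T.astype(float))
    return (pts4d[:3] / pts4d[3]).T              # homogeneous -> Euclidean, Nx3
```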
As a preferable scheme, the aerial image preprocessing operations of steps 2 and 3 may run in parallel with the image three-dimensional reconstruction of step 4, processing in batches.
Based on the preprocessed image sequence, the exact camera positions are back-calculated through stereo vision and the spatial coordinates of the feature points on the images are estimated; the generated point cloud model is shown in fig. 8. The spatial accuracy of the road surface point cloud model generated by this method was also verified.
The invention also discloses a road surface three-dimensional reconstruction system for dynamic traffic scenes, comprising:
the image acquisition module is used for acquiring images shot by the unmanned aerial vehicle;
the coordinate acquisition module, used for inputting the shot images, as prediction set images, into the improved YOLO vehicle detector to obtain coordinate files of the vehicles in the prediction set images, and, taking the vehicle coordinate files as reference, rendering the area inside each vehicle bounding box in the prediction set images single-valued;
the image sequence optimization module, used for obtaining the spatial relationship between adjacent prediction set images by a feature extraction method, calculating the actual occlusion rate and ground overlap rate between adjacent images from this spatial relationship and the vehicle bounding boxes on each image, identifying invalid images through spatial optimization of the image sequence, and removing the invalid images to obtain preprocessed images;
and the reconstruction module, used for reconstructing the road surface three-dimensional topography through stereoscopic vision by combining the preprocessed images with their coordinate files.
Table 1 below compares the digital model measurements with manual measurements. The average relative errors of the three-dimensionally reconstructed 7 m, 12 m and 20 m models of this embodiment are 0.23%, 0.23% and 0.29% respectively, showing that road surface point cloud models generated under different camera heights and traffic volumes achieve a high level of spatial accuracy.
Table 1: Digital model measurement accuracy
(The contents of Table 1 appear only as an image in the original publication.)
Referring to fig. 9, which compares the processing time of the proposed modeling method with that of the traditional method for three-dimensional road surface reconstruction: the average modeling time is reduced by 38.28% compared with the traditional method. Depending on the traffic flow conditions of different road sections, the optimization efficiency lies between 30% and 50%.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (9)

1. A road surface three-dimensional reconstruction method for dynamic traffic scenes, characterized by comprising the following steps:
step 1, collecting images shot by an unmanned aerial vehicle;
step 2, inputting the shot images, as prediction set images, into an improved YOLO vehicle detector to obtain coordinate files of the vehicles in the prediction set images; and, taking the vehicle coordinate files as reference, rendering the area inside each vehicle bounding box in the prediction set images single-valued;
step 3, obtaining the spatial relationship between adjacent prediction set images by a feature extraction method, calculating the actual occlusion rate and the ground overlap rate between adjacent images from this spatial relationship and the vehicle bounding boxes on each image, identifying invalid images through spatial optimization of the image sequence, and removing the invalid images to obtain preprocessed images;
and step 4, reconstructing the road surface three-dimensional topography through stereoscopic vision by combining the preprocessed images with their coordinate files.
2. The road surface three-dimensional reconstruction method for dynamic traffic scenes according to claim 1, characterized in that, before step 1, the boundary coordinates of the road surface area planned for shooting are extracted from an electronic map and a flight area file is generated; the flight area file is imported into the unmanned aerial vehicle, which then shoots the planned road surface area.
3. The road surface three-dimensional reconstruction method for dynamic traffic scenes according to claim 1, characterized in that the process of obtaining the coordinate files of the vehicles in the prediction set images with the improved YOLO vehicle detector in step 2 is as follows:
(1) inputting the shot images into the improved YOLO vehicle detector at the initial size of (512, 512, 3), preliminarily predicting the vehicle bounding boxes, and counting the average pixel width of the vehicles;
(2) calculating a scaling factor k as the ratio of the average vehicle pixel width on the training set images to that on the prediction set images;
(3) adjusting the input size of the prediction set images to (512k, 512k, 3) with the scaling factor, inputting them into the deep learning framework again, and predicting the final accurate vehicle bounding boxes.
4. The road surface three-dimensional reconstruction method for dynamic traffic scenes according to claim 1, characterized in that the improved YOLO vehicle detector is obtained as follows:
the image features extracted at layers 6, 12 and 14 of the backbone feature extraction network are respectively up-sampled, concatenated and further feature-extracted; the class, position and confidence of each bounding box are then predicted by standard convolution on the basis of 9 predefined anchor boxes, yielding a dataset of vehicle bounding-box corner coordinate files for the predicted images;
the dataset of annotated vehicle bounding-box corner coordinate files is compared with the predicted dataset to obtain the error, and the YOLO vehicle detector is trained until the error is smaller than a set value, giving the improved YOLO vehicle detector;
the backbone feature extraction network comprises 1 CBL module and 13 DBR modules; each CBL module comprises a standard convolution layer, a batch normalization layer and a Leaky ReLU activation layer; each DBR module comprises a depthwise separable convolution layer, a batch normalization layer and a ReLU activation layer.
5. The road surface three-dimensional reconstruction method for dynamic traffic scenes according to claim 1, characterized in that, in step 3, the spatial relationship between adjacent images is calculated by the following formula:

p'_i = R · p_i + T

wherein (p_i, p'_i) are matched feature-point pairs on adjacent images, R is the rotation matrix and T is the translation matrix; the actual values of R and T are solved from multiple feature-point pairs to obtain the spatial relationship between adjacent images.
6. The road surface three-dimensional reconstruction method for dynamic traffic scenes according to claim 1, characterized in that, in step 3, the process of identifying invalid images is:
(1) let the overlap rate of the original image sequence be IOR_max; the lowest image overlap rate after optimization is then IOR_min = 1 − 2·(1 − IOR_max);
(2) starting from the 1st image, searching with a step of 2, calculating the occlusion rate OCR_1 obtained by combining the i-th and (i+2)-th images, i.e. the proportion of the vehicle overlap area of the two images within the image overlap area, and at the same time obtaining the actual ground overlap rate GOR_1 of the area outside the vehicle-occluded positions;
(3) calculating the occlusion rate OCR_2 obtained by combining the i-th, (i+1)-th and (i+2)-th images, i.e. the proportion of the vehicle overlap area of the three images within the image overlap area, and obtaining the actual ground overlap rate GOR_2 of the area outside the vehicle-occluded positions;
(4) if OCR_2 > OCR_1 and GOR_2 > GOR_1, retaining the (i+1)-th image; otherwise, deleting the (i+1)-th image.
7. The road surface three-dimensional reconstruction method for dynamic traffic scenes according to claim 6, characterized in that the occlusion rate OCR is calculated as:
OCR = (area still occluded by vehicles after the two images are overlapped) / (original image area);
and the ground overlap rate GOR is calculated as:
GOR = (overlap area of the two images excluding the vehicle-occluded parts) / (original image area).
8. The road surface three-dimensional reconstruction method for dynamic traffic scenes according to claim 1, characterized in that, in step 4, the coordinates of the preprocessed images are matched with the image coordinates from step 1, and the relative spatial positions of the camera across different pictures are preliminarily estimated; feature points are extracted on each image and matched between adjacent images; the camera spatial coordinates are back-calculated from the feature-point pairs and spatial epipolar geometry; and the spatial three-dimensional coordinates of all feature points on the images are solved from the camera spatial coordinates to obtain a point cloud model, from which the road surface three-dimensional topography is reconstructed through stereoscopic vision.
9. A road surface three-dimensional reconstruction system for dynamic traffic scenes, characterized by comprising:
the image acquisition module, used for collecting images shot by an unmanned aerial vehicle;
the coordinate acquisition module, used for inputting the shot images, as prediction set images, into the improved YOLO vehicle detector to obtain coordinate files of the vehicles in the prediction set images, and, taking the vehicle coordinate files as reference, rendering the area inside each vehicle bounding box in the prediction set images single-valued;
the image sequence optimization module, used for obtaining the spatial relationship between adjacent prediction set images by a feature extraction method, calculating the actual occlusion rate and ground overlap rate between adjacent images from this spatial relationship and the vehicle bounding boxes on each image, identifying invalid images through spatial optimization of the image sequence, and removing the invalid images to obtain preprocessed images;
and the reconstruction module, used for reconstructing the road surface three-dimensional topography through stereoscopic vision by combining the preprocessed images with their coordinate files.
CN202210676065.2A 2022-06-15 2022-06-15 Road surface three-dimensional reconstruction method and system for dynamic traffic scene Pending CN115019208A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210676065.2A CN115019208A (en) 2022-06-15 2022-06-15 Road surface three-dimensional reconstruction method and system for dynamic traffic scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210676065.2A CN115019208A (en) 2022-06-15 2022-06-15 Road surface three-dimensional reconstruction method and system for dynamic traffic scene

Publications (1)

Publication Number Publication Date
CN115019208A true CN115019208A (en) 2022-09-06

Family

ID=83075463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210676065.2A Pending CN115019208A (en) 2022-06-15 2022-06-15 Road surface three-dimensional reconstruction method and system for dynamic traffic scene

Country Status (1)

Country Link
CN (1) CN115019208A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117218244A (en) * 2023-11-07 2023-12-12 武汉博润通文化科技股份有限公司 Intelligent 3D animation model generation method based on image recognition
CN117218244B (en) * 2023-11-07 2024-02-13 武汉博润通文化科技股份有限公司 Intelligent 3D animation model generation method based on image recognition
CN117437368A (en) * 2023-12-22 2024-01-23 深圳大学 Unmanned plane-based pavement evenness measuring method, system, terminal and medium
CN117437368B (en) * 2023-12-22 2024-04-26 深圳大学 Unmanned plane-based pavement evenness measuring method, system, terminal and medium

Similar Documents

Publication Publication Date Title
CN110285793B (en) Intelligent vehicle track measuring method based on binocular stereo vision system
CN111144388B (en) Monocular image-based road sign line updating method
KR101105795B1 (en) Automatic processing of aerial images
US7509241B2 (en) Method and apparatus for automatically generating a site model
CN103971404B (en) 3D real-scene copying device having high cost performance
CN104574393B (en) A kind of three-dimensional pavement crack pattern picture generates system and method
Hoppe et al. Online Feedback for Structure-from-Motion Image Acquisition.
CN115019208A (en) Road surface three-dimensional reconstruction method and system for dynamic traffic scene
US11682170B2 (en) Generating three-dimensional geo-registered maps from image data
JPH0554128A (en) Formation of automatic video image database using photograph ic measurement
CN112509125A (en) Three-dimensional reconstruction method based on artificial markers and stereoscopic vision
CN112800524A (en) Pavement disease three-dimensional reconstruction method based on deep learning
CN116258817B (en) Automatic driving digital twin scene construction method and system based on multi-view three-dimensional reconstruction
CN108663026A (en) A kind of vibration measurement method
CN114648669A (en) Motor train unit fault detection method and system based on domain-adaptive binocular parallax calculation
Fei et al. Ossim: An object-based multiview stereo algorithm using ssim index matching cost
CN113345084B (en) Three-dimensional modeling system and three-dimensional modeling method
CN111325828A (en) Three-dimensional face acquisition method and device based on three-eye camera
Ebrahimikia et al. True orthophoto generation based on unmanned aerial vehicle images using reconstructed edge points
CN117456114A (en) Multi-view-based three-dimensional image reconstruction method and system
CN105352482A (en) Bionic compound eye microlens technology-based 3-3-2 dimension object detection method and system
CN117115359A (en) Multi-view power grid three-dimensional space data reconstruction method based on depth map fusion
CN116129318A (en) Unsupervised monocular three-dimensional target detection method based on video sequence and pre-training instance segmentation
CN114387532A (en) Boundary identification method and device, terminal, electronic equipment and unmanned equipment
CN113554102A (en) Aviation image DSM matching method for cost calculation dynamic programming

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination