CN115019208A - Road surface three-dimensional reconstruction method and system for dynamic traffic scene - Google Patents

Road surface three-dimensional reconstruction method and system for dynamic traffic scene

Info

Publication number
CN115019208A
CN115019208A (application CN202210676065.2A)
Authority
CN
China
Prior art keywords
image
images
vehicle
road surface
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210676065.2A
Other languages
Chinese (zh)
Inventor
杨旭
管进超
李毅
洪翰
丁玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changan University
Original Assignee
Changan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changan University filed Critical Changan University
Priority to CN202210676065.2A priority Critical patent/CN115019208A/en
Publication of CN115019208A publication Critical patent/CN115019208A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/005General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Remote Sensing (AREA)
  • Computer Graphics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a road surface three-dimensional reconstruction method and system for dynamic traffic scenes, which realize road surface three-dimensional reconstruction under the influence of dynamic traffic based on unmanned aerial vehicle stereo photography and deep learning. Vehicle noise on aerial images is removed with a lightweight deep learning framework, and the image overlap rate in areas of different traffic density is automatically adjusted through spatial optimization of the image sequence, thereby improving both the quality and the speed of road surface three-dimensional modeling.

Description

Road surface three-dimensional reconstruction method and system for dynamic traffic scene
Technical Field
The invention relates to the field of pavement health monitoring, and in particular to a road surface three-dimensional reconstruction method and system for dynamic traffic scenes.
Background
Road surface condition has a great influence on the safety, comfort and economy of transportation, and road surface health monitoring is the basis for ensuring normal road operation. Most current automatic detection of pavement distress is based on two-dimensional images; a three-dimensional model of the pavement enables more accurate distress identification and at the same time yields the three-dimensional dimensions of each distress. With the development of three-dimensional measurement technology, the collection of road surface three-dimensional data has become a research hotspot.
Current road surface three-dimensional measurement techniques fall mainly into laser imaging methods and stereoscopic vision methods. Most three-dimensional measuring devices fitted to multifunctional road inspection vehicles are based on laser imaging, using either the time-of-flight (ToF) principle or the structured-light principle. Although laser three-dimensional imaging can generate a pavement model quickly, it is susceptible to strong light and vibration, and the equipment cost is high.
Stereoscopic three-dimensional imaging based on color images has therefore become an alternative for low-cost, high-precision pavement three-dimensional reconstruction. By the number of cameras used, stereoscopic reconstruction divides into binocular and monocular stereo imaging. Binocular stereo imaging restores scene depth from the parallax between two cameras fixed in position; it images quickly, but its modeling resolution is poor, and deviations in the camera mounting positions make the three-dimensional reconstruction inaccurate.
Monocular stereo imaging generates a three-dimensional model from images taken by a single camera during movement. For three-dimensional reconstruction of traffic infrastructure, monocular stereo imaging mainly uses unmanned aerial vehicle (UAV) photography: from multi-view road surface images, a road surface three-dimensional point cloud model is generated through image feature matching and epipolar geometry. Although UAV monocular stereo imaging can reconstruct large-scale road scenes, two difficulties remain in its application: (1) UAV stereo photography can only be carried out on closed roads, because vehicles running on an open road occlude the road surface and generate a large amount of noise during modeling; (2) to observe road surface areas occluded by vehicles, the image acquisition density must be greatly increased, causing a sharp rise in modeling time.
To recognize vehicles on aerial images automatically, existing research has attempted feature recognition of infrastructure such as roads and bridges using image processing or deep learning. However, existing object recognition methods are computationally heavy, and the change in object size across UAV images taken at different heights strongly affects recognition accuracy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a road surface three-dimensional reconstruction method and system for dynamic traffic scenes, so as to solve two problems of the prior art: loss of the reconstructed model and a large amount of surface noise caused by occlusion from moving vehicles in an operating road environment, and low efficiency of road surface three-dimensional reconstruction over a large field of view.
To achieve this purpose, the invention adopts the following technical scheme:
A road surface three-dimensional reconstruction method for dynamic traffic scenes comprises the following steps:
step 1, collecting images shot by an unmanned aerial vehicle;
step 2, inputting the shot images, as prediction set images, into an improved YOLO vehicle detector to obtain coordinate files of the vehicles in the prediction set images; taking the vehicle coordinate files as reference, rendering the area inside each vehicle bounding box in the prediction set images single-valued;
step 3, obtaining the spatial relationship between adjacent prediction set images by a feature extraction method, calculating the actual occlusion rate and the ground overlap rate between adjacent images from this spatial relationship and the vehicle bounding boxes on each image, identifying invalid images through spatial optimization of the image sequence, and removing the invalid images to obtain preprocessed images;
and step 4, reconstructing the road surface three-dimensional topography through stereoscopic vision by combining the preprocessed images with their coordinate files.
The invention is further improved in that:
preferably, before step 1, the boundary coordinates of the planned shooting road surface area are extracted from the electronic map, a flight area file is generated, the flight area file is imported into the unmanned aerial vehicle, and the unmanned aerial vehicle shoots the planned shooting road surface area.
Preferably, the process of obtaining the coordinate files of the vehicles in the prediction set images with the improved YOLO vehicle detector in step 2 is as follows:
(1) inputting the shot images into the improved YOLO vehicle detector at the initial size of (512, 512, 3), preliminarily predicting the vehicle bounding boxes, and counting the average pixel width of the vehicles;
(2) calculating a scaling factor k as the ratio of the average vehicle pixel width on the training set images to that on the prediction set images;
(3) adjusting the input size of the prediction set images to (512k, 512k, 3) with the scaling factor, inputting them into the deep learning framework again, and predicting the final accurate vehicle bounding boxes.
Preferably, the improved YOLO vehicle detector is obtained as follows:
the image features extracted at layers 6, 12 and 14 of the backbone feature extraction network are respectively up-sampled, concatenated and further feature-extracted; the class, position and confidence of each bounding box are then predicted by standard convolution on the basis of 9 predefined anchor boxes, yielding a dataset of vehicle bounding-box corner coordinate files for the predicted images;
the dataset of annotated vehicle bounding-box corner coordinate files is compared with the predicted dataset to obtain the error, and the YOLO vehicle detector is trained until the error is smaller than a set value, giving the improved YOLO vehicle detector;
the backbone feature extraction network comprises 1 CBL module and 13 DBR modules; each CBL module comprises a standard convolution layer, a batch normalization layer and a Leaky ReLU activation layer; each DBR module comprises a depthwise separable convolution layer, a batch normalization layer and a ReLU activation layer.
Preferably, in step 3, the spatial relationship between adjacent images is calculated by the following formula:

p'_i = R · p_i + T

where (p_i, p'_i) are matched feature-point pairs on adjacent images, R is the rotation matrix and T is the translation matrix; the actual values of R and T are solved from multiple feature-point pairs to obtain the spatial relationship between adjacent images.
Preferably, in step 3, the process of identifying invalid images is:
(1) let the overlap rate of the original image sequence be IOR_max; the lowest image overlap rate after optimization is then IOR_min = 1 − 2·(1 − IOR_max);
(2) starting from the 1st image, searching with a step of 2, calculating the occlusion rate OCR_1 obtained by combining the i-th and (i+2)-th images, i.e. the proportion of the vehicle overlap area of the two images within the image overlap area, and at the same time obtaining the actual ground overlap rate GOR_1 of the area outside the vehicle-occluded positions;
(3) calculating the occlusion rate OCR_2 obtained by combining the i-th, (i+1)-th and (i+2)-th images, i.e. the proportion of the vehicle overlap area of the three images within the image overlap area, and obtaining the actual ground overlap rate GOR_2 of the area outside the vehicle-occluded positions;
(4) if OCR_2 > OCR_1 and GOR_2 > GOR_1, retaining the (i+1)-th image; otherwise, deleting the (i+1)-th image.
Preferably, the occlusion rate OCR is calculated as:
OCR = (area still occluded by vehicles after the two images are overlapped) / (original image area);
and the ground overlap rate GOR is calculated as:
GOR = (overlap area of the two images excluding the vehicle-occluded parts) / (original image area).
Preferably, in step 4, the coordinates of the preprocessed images are matched with the image coordinates from step 1, and the relative spatial positions of the camera across different pictures are preliminarily estimated; feature points are extracted on each image and matched between adjacent images; the camera spatial coordinates are back-calculated from the feature-point pairs and spatial epipolar geometry; and the spatial three-dimensional coordinates of all feature points on the images are solved from the camera spatial coordinates to obtain a point cloud model, from which the road surface three-dimensional topography is reconstructed through stereoscopic vision.
A road surface three-dimensional reconstruction system for dynamic traffic scenes comprises:
the image acquisition module is used for acquiring images shot by the unmanned aerial vehicle;
the coordinate acquisition module is used for inputting the shot images, as prediction set images, into the improved YOLO vehicle detector to obtain coordinate files of the vehicles in the prediction set images, and, taking the vehicle coordinate files as reference, rendering the area inside each vehicle bounding box in the prediction set images single-valued;
the image sequence optimization module is used for obtaining the spatial relationship between adjacent prediction set images by a feature extraction method, calculating the actual occlusion rate and the ground overlap rate between adjacent images from this spatial relationship and the vehicle bounding boxes on each image, identifying invalid images through spatial optimization of the image sequence, and removing the invalid images to obtain preprocessed images;
and the reconstruction module is used for reconstructing the road surface three-dimensional topography through stereoscopic vision by combining the preprocessed images with their coordinate files.
Compared with the prior art, the invention has the following beneficial effects:
the invention discloses a road surface three-dimensional reconstruction method and a road surface three-dimensional reconstruction system facing to a dynamic traffic scene, which are used for realizing the road surface three-dimensional reconstruction under the influence of dynamic traffic based on unmanned aerial vehicle stereo photography and deep learning, removing vehicle noise on aerial images by adopting a lightweight deep learning framework, and automatically adjusting the image overlapping rate in different traffic density areas through image sequence space optimization so as to improve the quality and speed of road surface three-dimensional modeling. The invention has the advantages that:
(1) the image three-dimensional reconstruction framework can reconstruct the road surface in an open road environment without being affected by running vehicles; (2) vehicle noise on aerial images is quickly identified and eliminated, and changes in vehicle size across UAV images taken at different heights do not affect recognition accuracy; (3) the number of images required under different traffic flow densities is identified, the spatial distribution of the aerial image sequence is optimized, and invalid images are reduced; (4) the completeness and spatial accuracy of the reconstructed road surface point cloud model are guaranteed while the three-dimensional modeling speed for large-scale scenes is increased.
Drawings
FIG. 1 is an overall frame diagram of the present invention;
FIG. 2 is a framework diagram of deep learning vehicle recognition for unmanned aerial vehicle images;
FIG. 3 shows the vehicle identification and noise elimination effects on unmanned aerial vehicle images;
FIG. 4 illustrates spatial relationship estimation between unmanned aerial vehicle images;
FIG. 5 is a schematic diagram illustrating the calculation of the vehicle occlusion rate and the ground overlap rate;
FIG. 6 is a framework diagram of unmanned aerial vehicle image sequence spatial optimization;
FIG. 7 shows the spatial distribution of an unmanned aerial vehicle image sequence before and after optimization;
wherein (a) is the original image sequence of road section 1; (b) is the spatially optimized image sequence of road section 1; (c) is the original image sequence of road section 2; (d) is the spatially optimized image sequence of road section 2;
FIG. 8 shows the three-dimensional road surface reconstruction effect under traffic influence according to the present invention;
wherein (a) is the reconstructed three-dimensional model of road section 1; (b) is the reconstructed three-dimensional model of road section 2;
FIG. 9 compares the modeling speeds of the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings:
in the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplification of description, but do not indicate or imply that the device or element referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention; the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance; furthermore, unless expressly stated or limited otherwise, the terms "mounted," "connected," and "coupled" are to be construed broadly and encompass, for example, both fixed and removable coupling arrangements; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Referring to fig. 1, the invention provides a rapid road surface three-dimensional reconstruction method based on unmanned aerial vehicle stereo photography and deep learning, which specifically comprises the following steps:
step 1, automatically acquiring road images on an open road through a multi-rotor unmanned aerial vehicle, and recording space coordinates when the images are shot.
Step 1.1, extracting the boundary coordinates of the road surface area planned for shooting from an electronic map and generating a flight area file in kml format, wherein the boundary coordinates of the preselected aerial-survey road surface area form a closed loop, i.e. {(x_1, y_1), (x_2, y_2), …, (x_n, y_n), (x_1, y_1)}.
Step 1.2, importing the generated flight area file into the unmanned aerial vehicle and setting the shooting parameters: camera inclination angle, camera resolution, shooting height, longitudinal overlap rate and lateral overlap rate. The shooting parameters should be set according to the traffic environment and the modeling accuracy requirement. On a road open to traffic, the UAV flying height should be no less than 10 meters, the camera inclination angle between 60 and 90 degrees, the longitudinal overlap rate of adjacent shot images no less than 90%, and the lateral overlap rate no less than 60%.
Step 1.3, executing the aerial photography task, acquiring aerial images at different road positions, recording the position information at the time each image is shot, and generating an xml coordinate file matched with the image name.
Through step 1, a kml-format flight area file of the planned road surface area and an xml coordinate file of the shot images are obtained.
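By way of illustration, step 1.1 can be sketched in Python with the standard library only; the boundary coordinates and file name below are hypothetical placeholders, not values from the disclosure.

```python
# Minimal sketch of step 1.1: write a closed-loop road boundary to a KML
# flight-area file. The coordinates below are hypothetical placeholders.
boundary = [(108.9401, 34.3412), (108.9410, 34.3412),
            (108.9410, 34.3405), (108.9401, 34.3405)]

def write_flight_area_kml(coords, path="flight_area.kml"):
    ring = list(coords) + [coords[0]]          # close the loop: (x1, y1) repeated
    coord_str = " ".join(f"{x},{y},0" for x, y in ring)
    kml = f"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark><name>flight area</name>
    <Polygon><outerBoundaryIs><LinearRing>
      <coordinates>{coord_str}</coordinates>
    </LinearRing></outerBoundaryIs></Polygon>
  </Placemark>
</kml>"""
    with open(path, "w", encoding="utf-8") as f:
        f.write(kml)

write_flight_area_kml(boundary)
```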
Step 2, constructing an improved YOLO (You Only Look Once) deep learning framework to detect vehicle positions and remove the areas containing vehicles.
This part comprises an offline training stage and a use stage in actual application.
(I) Training stage
Step 2.1, selecting part of the aerial images as training set images for annotation: the vehicle positions on the images are marked with bounding boxes, and a bounding-box corner coordinate file is generated. Specific vehicle types need not be distinguished during annotation; all vehicle bounding boxes are placed in a single class. These are the annotated images.
Step 2.2, with reference to fig. 2, constructing the improved YOLO deep learning framework, i.e. the YOLO vehicle detector, which fuses depthwise separable convolution with an image-resolution active adjustment unit and locates all vehicles on an aerial image with bounding boxes. The embedded depthwise separable convolution comprises channel-by-channel convolution and point-by-point convolution. The embedded image-resolution active adjustment unit serves to enhance the generalization performance of the model. The standard convolutions in the original YOLO framework are replaced by depthwise separable convolutions to reduce the computational load and improve feature extraction performance.
Meanwhile, the image-resolution active adjustment unit adjusts the resolution of the network input image according to the size of vehicles on images taken at different aerial heights. The overall framework is as follows: the input to the improved YOLO network is a three-channel color image, with a training-stage input size of (512, 512, 3); the original image is converted into an input image of size (512k, 512k, 3), where k is the scaling factor and 512·k must be a multiple of 32. The pixel size of vehicles on the image is preliminarily predicted by the image-resolution active adjustment unit, and an appropriate scaling factor k is selected accordingly to generate the input image at the optimal resolution.
Step 2.3, inputting the annotated image dataset at size (512, 512, 3) into the model built in step 2.2 and training the constructed YOLO vehicle detector.
The YOLO vehicle detector predicts the bounding-box corner coordinate file of an image as follows. First, features of the resolution-adjusted input image are extracted by the backbone network, whose basic components are the CBL module and the DBR module. The CBL module comprises 1 standard convolution layer (C), 1 batch normalization layer (B) and 1 Leaky ReLU activation layer (L). The DBR module consists of 1 depthwise separable convolution layer (D), 1 batch normalization layer (B) and 1 ReLU activation layer (R). The backbone feature extraction network consists of 1 CBL module and 13 DBR modules, and its tail end performs feature fusion through a spatial pyramid pooling module. The image features extracted at layers 6, 12 and 14 (the tail end) of the backbone are respectively up-sampled, concatenated and further feature-extracted. Finally, the class, position and confidence of each bounding box are predicted by standard convolution on the basis of 9 predefined anchor boxes, giving the bounding-box corner coordinate file of each image and forming the dataset of vehicle bounding-box corner coordinate files.
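By way of illustration, the CBL and DBR building blocks described above can be sketched in PyTorch as follows; the kernel sizes, strides and channel widths are assumptions made for the sketch, not values fixed by the disclosure.

```python
import torch.nn as nn

class CBL(nn.Module):
    """Standard convolution + batch normalization + Leaky ReLU."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1, inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class DBR(nn.Module):
    """Depthwise separable convolution + batch normalization + ReLU."""
    def __init__(self, c_in, c_out, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, s, 1, groups=c_in, bias=False),  # channel-by-channel
            nn.Conv2d(c_in, c_out, 1, bias=False),                    # point-by-point
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(x)
```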
The error between the predicted and annotated images is estimated, with the model loss defined by CIoU, until the training loss is stable and the prediction error is below a preset value. Training techniques such as transfer learning, layer freezing, dynamic learning rate and early stopping may be adopted.
(II) Use stage
Step 2.4, using the trained improved YOLO vehicle detector to identify and locate vehicles on different aerial images, predicting the bounding-box positions with high precision by applying the image-resolution active adjustment unit. Prediction on other datasets proceeds mainly as follows:
(1) inputting a prediction set image into the improved YOLO vehicle detector at the initial size (512, 512, 3), preliminarily predicting the vehicle bounding boxes, and counting the average pixel width of the vehicles;
(2) calculating the scaling factor k as the ratio of the average vehicle pixel width on the training set images to that on the prediction set images;
(3) adjusting the input size of the prediction set image to (512k, 512k, 3), inputting it into the improved YOLO vehicle detector again, and predicting the final accurate vehicle bounding boxes, i.e. the coordinate file of the prediction set image.
The RGB values of the pixels inside each predicted bounding box are then reset to (255, 255, 255) to eliminate the vehicle noise on the image.
FIG. 3 illustrates the vehicle identification results of the improved YOLO deep learning framework on various aerial image datasets. After the bounding-box positions are predicted with high precision, the image pixels inside each box are reset to (255, 255, 255), i.e. the region containing the vehicle is rendered single-valued.
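The use-stage resolution adjustment and vehicle-noise removal can be sketched as follows; `detector` is a hypothetical callable standing in for the trained YOLO model, and the average-width statistics are assumed inputs.

```python
import cv2
import numpy as np

def input_side(train_mean_px_width, pred_mean_px_width):
    """Scaling factor k from the ratio of average vehicle pixel widths;
    512*k is rounded to a multiple of 32 as the network requires."""
    k = train_mean_px_width / pred_mean_px_width
    return max(32, int(round(512 * k / 32)) * 32)

def remove_vehicle_noise(image, boxes):
    """Set pixels inside each predicted bounding box to (255, 255, 255)."""
    out = image.copy()
    for x1, y1, x2, y2 in boxes:
        out[int(y1):int(y2), int(x1):int(x2)] = 255
    return out

# Hypothetical usage: `detector(img)` returns [(x1, y1, x2, y2), ...]
# img = cv2.imread("aerial_0001.jpg")
# side = input_side(train_mean_px_width=40.0, pred_mean_px_width=25.0)
# resized = cv2.resize(img, (side, side))
# clean = remove_vehicle_noise(resized, detector(resized))
```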
Step 3, re-matching the image positions, calculating the ground overlap rate and occlusion rate between different images, dynamically optimizing the spatial distribution of the images, and reducing the number of aerial photos.
Step 3.1, extracting the feature points on all images with a feature extraction algorithm, and matching the feature points on adjacent images according to their similarity to obtain the actual overlap area and relative positional relationship of adjacent images. The feature extraction algorithm may be SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), FAST (Features from Accelerated Segment Test), or similar. The positional relationship between adjacent images is solved from a number of known point pairs, giving a rotation matrix and a translation matrix.
Referring to fig. 4, to estimate the actual spatial relationship between aerial images, feature points {p_i} are extracted on all images with the feature extraction algorithm, and feature points on adjacent images are matched according to similarity, generating feature-point pairs {(p_i, p'_i)}. The actual overlap area and relative positional relationship of adjacent images can be described by a rotation matrix R and a translation matrix T:

p'_i = R · p_i + T

The actual values of R and T are solved from a number of feature-point pairs, giving the spatial relationship between the images.
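One plausible realization of this estimation is sketched below with OpenCV (SIFT matching, essential-matrix estimation, pose recovery); the camera intrinsic matrix K is an assumed input, and the patent does not prescribe these particular calls.

```python
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Estimate rotation R and translation t between two adjacent aerial
    images from matched feature points (a sketch, assuming calibrated K)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Match descriptors and keep the best matches by Lowe's ratio test
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Essential matrix with RANSAC, then recover R and t
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```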
Step 3.2, referring to fig. 5, calculating the actual occlusion rate OCR and ground overlap rate GOR between images from the spatial positions of adjacent images and the vehicle-occluded positions on each image. The vehicle occlusion rate is the ratio of the vehicle overlap area to the image overlap area across multiple images, and the ground overlap rate is the overlap ratio of the area outside the vehicle-occluded positions. The overlap rate of the original image sequence is no less than 90%, and the overlap rate of the optimized images lies between 80% and 90%. The ground overlap rate should exceed 70% to guarantee alignment accuracy in the three-dimensional road surface reconstruction.
Specifically:

OCR = (area still occluded by vehicles after the two images are overlapped) / (original image area);

GOR = (overlap area of the two images excluding the vehicle-occluded parts) / (original image area).
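Under one plausible reading of the OCR and GOR definitions above (OCR counts ground hidden in both images, GOR counts overlap visible in both), the two ratios can be sketched with polygon operations; `shapely` and the ground-frame rectangles are assumptions of the sketch, not part of the original disclosure.

```python
from shapely.geometry import box
from shapely.ops import unary_union

def occlusion_and_ground_overlap(img1_rect, img2_rect, veh1_boxes, veh2_boxes):
    """Compute (OCR, GOR) for a pair of images.

    All rectangles are (minx, miny, maxx, maxy) in a common ground frame.
    OCR: ground inside the two-image overlap hidden in BOTH images,
         divided by the original image area.
    GOR: overlap area excluding ground hidden in EITHER image,
         divided by the original image area.
    """
    im1, im2 = box(*img1_rect), box(*img2_rect)
    overlap = im1.intersection(im2)
    veh1 = unary_union([box(*b) for b in veh1_boxes])
    veh2 = unary_union([box(*b) for b in veh2_boxes])

    still_occluded = overlap.intersection(veh1).intersection(veh2)
    visible_overlap = overlap.difference(veh1.union(veh2))

    image_area = im1.area
    return still_occluded.area / image_area, visible_overlap.area / image_area
```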
Step 3.3, identifying invalid photos through spatial optimization of the image sequence according to the actual occlusion rate and ground overlap rate between images, thereby reducing the number of images used for three-dimensional reconstruction.
Referring to fig. 6, based on the actual occlusion rate OCR and ground overlap rate GOR between images, the spatial distribution of the image sequence is optimized as follows (a code sketch is given after the list):
(1) let the overlap ratio of the original image sequence be IOR max Then the lowest image overlap rate after optimization is IOR min =1-2*(1-IOR max );
(2) Searching at 2-step intervals from the 1 st image, and calculating the occlusion rate OCR obtained by combining the ith image and the (i + 2) th image 1 I.e. the proportion of the overlapping area of the two image vehicles to the overlapping area of the images. Meanwhile, the actual ground overlap ratio GOR of the area outside the shielding position of the vehicle can be obtained 1
(3) Calculating the occlusion rate OCR obtained by combining the ith image, the (i + 1) th image and the (i + 2) th image 2 That is, the ratio of the vehicle overlapping area of the three images to the image overlapping area, and the actual ground overlapping ratio GOR of the area other than the vehicle shielding position can be obtained 2
(4) If OCR 2 >OCR 1 And GOR 2 >GOR 1 And if not, deleting the (i + 1) th image.
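A condensed sketch of the search loop in steps (1)–(4); `pair_metrics` and `triple_metrics` are hypothetical helpers returning (OCR, GOR) for the given image combination, e.g. built on the polygon sketch above.

```python
def optimize_sequence(images, pair_metrics, triple_metrics):
    """Spatial optimization of the image sequence, following steps (1)-(4).

    pair_metrics(i, j)      -> (OCR, GOR) for images i and j combined
    triple_metrics(i, j, k) -> (OCR, GOR) for images i, j and k combined
    Searching with step 2 bounds the optimized overlap from below by
    IOR_min = 1 - 2 * (1 - IOR_max).
    """
    keep = [True] * len(images)
    i = 0
    while i + 2 < len(images):
        ocr1, gor1 = pair_metrics(i, i + 2)            # without the middle image
        ocr2, gor2 = triple_metrics(i, i + 1, i + 2)   # with the middle image
        # Step (4): retain the middle image only if it improves both metrics
        if not (ocr2 > ocr1 and gor2 > gor1):
            keep[i + 1] = False
        i += 2
    return [img for img, kept in zip(images, keep) if kept]
```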
Referring to fig. 7, which shows the image distribution of two road sections before and after spatial optimization of the aerial image sequence: after optimization, the number of images participating in modeling is greatly reduced, the image density in low-traffic areas is effectively decreased, and redundant images are deleted.
Step 4, reconstructing the road surface three-dimensional topography through stereoscopic vision based on the preprocessed aerial images.
Step 4.1, re-matching the preprocessed images generated in step 3 with the image coordinates extracted in step 1 and preliminarily estimating the relative spatial positions of the camera; extracting the feature points on each picture and matching those of adjacent images; and back-calculating the camera spatial coordinates from the feature-point pairs and spatial epipolar geometry.
Step 4.2, solving the spatial three-dimensional coordinates of all feature points on the images from the obtained camera spatial coordinates to obtain a point cloud model.
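Once camera poses are recovered, the feature points can be triangulated; a minimal sketch with OpenCV follows, assuming the intrinsic matrix K and the per-view (R, t) are available from the previous step.

```python
import cv2
import numpy as np

def triangulate(K, R1, t1, R2, t2, pts1, pts2):
    """Solve the 3-D coordinates of matched feature points from two views.

    K is the 3x3 intrinsic matrix; (R1, t1) and (R2, t2) are the poses of
    the two views; pts1/pts2 are Nx2 arrays of matched pixel coordinates.
    """
    P1 = K @ np.hstack([R1, t1.reshape(3, 1)])   # 3x4 projection matrix, view 1
    P2 = K @ np.hstack([R2, t2.reshape(3, 1)])   # 3x4 projection matrix, view 2
    pts4d = cv2.triangulatePoints(P1, P2,
                                  pts1.T.astype(float), pts2.T.astype(float))
    return (pts4d[:3] / pts4d[3]).T              # homogeneous -> Euclidean, Nx3
```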
As a preferable scheme, the aerial image preprocessing operations of steps 2 and 3 may run in parallel with the image three-dimensional reconstruction of step 4, processing in batches.
Based on the preprocessed image sequence, the exact camera positions are back-calculated through stereo vision and the spatial coordinates of the feature points on the images are estimated; the generated point cloud model is shown in fig. 8. The spatial accuracy of the road surface point cloud model generated by this method was also verified.
The invention also discloses a road surface three-dimensional reconstruction system for dynamic traffic scenes, comprising:
the image acquisition module is used for acquiring images shot by the unmanned aerial vehicle;
the coordinate acquisition module, used for inputting the shot images, as prediction set images, into the improved YOLO vehicle detector to obtain coordinate files of the vehicles in the prediction set images, and, taking the vehicle coordinate files as reference, rendering the area inside each vehicle bounding box in the prediction set images single-valued;
the image sequence optimization module, used for obtaining the spatial relationship between adjacent prediction set images by a feature extraction method, calculating the actual occlusion rate and ground overlap rate between adjacent images from this spatial relationship and the vehicle bounding boxes on each image, identifying invalid images through spatial optimization of the image sequence, and removing the invalid images to obtain preprocessed images;
and the reconstruction module, used for reconstructing the road surface three-dimensional topography through stereoscopic vision by combining the preprocessed images with their coordinate files.
Table 1 below compares the digital model measurements with manual measurements. The average relative errors of the three-dimensionally reconstructed 7 m, 12 m and 20 m models of this embodiment are 0.23%, 0.23% and 0.29% respectively, showing that road surface point cloud models generated under different camera heights and traffic volumes achieve a high level of spatial accuracy.
Table 1: Digital model measurement accuracy
(The contents of Table 1 appear only as an image in the original publication.)
Referring to fig. 9, which compares the processing time of the proposed modeling method with that of the traditional method for three-dimensional road surface reconstruction: the average modeling time is reduced by 38.28% compared with the traditional method. Depending on the traffic flow conditions of different road sections, the optimization efficiency lies between 30% and 50%.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (9)

1. A road surface three-dimensional reconstruction method for dynamic traffic scenes, characterized by comprising the following steps:
step 1, collecting images shot by an unmanned aerial vehicle;
step 2, inputting the shot images, as prediction set images, into an improved YOLO vehicle detector to obtain coordinate files of the vehicles in the prediction set images; and, taking the vehicle coordinate files as reference, rendering the area inside each vehicle bounding box in the prediction set images single-valued;
step 3, obtaining the spatial relationship between adjacent prediction set images by a feature extraction method, calculating the actual occlusion rate and the ground overlap rate between adjacent images from this spatial relationship and the vehicle bounding boxes on each image, identifying invalid images through spatial optimization of the image sequence, and removing the invalid images to obtain preprocessed images;
and step 4, reconstructing the road surface three-dimensional topography through stereoscopic vision by combining the preprocessed images with their coordinate files.
2. The road surface three-dimensional reconstruction method for dynamic traffic scenes according to claim 1, characterized in that, before step 1, the boundary coordinates of the road surface area planned for shooting are extracted from an electronic map and a flight area file is generated; the flight area file is imported into the unmanned aerial vehicle, which then shoots the planned road surface area.
3. The road surface three-dimensional reconstruction method for dynamic traffic scenes according to claim 1, characterized in that the process of obtaining the coordinate files of the vehicles in the prediction set images with the improved YOLO vehicle detector in step 2 is as follows:
(1) inputting the shot images into the improved YOLO vehicle detector at the initial size of (512, 512, 3), preliminarily predicting the vehicle bounding boxes, and counting the average pixel width of the vehicles;
(2) calculating a scaling factor k as the ratio of the average vehicle pixel width on the training set images to that on the prediction set images;
(3) adjusting the input size of the prediction set images to (512k, 512k, 3) with the scaling factor, inputting them into the deep learning framework again, and predicting the final accurate vehicle bounding boxes.
4. The road surface three-dimensional reconstruction method for dynamic traffic scenes according to claim 1, characterized in that the improved YOLO vehicle detector is obtained as follows:
the image features extracted at layers 6, 12 and 14 of the backbone feature extraction network are respectively up-sampled, concatenated and further feature-extracted; the class, position and confidence of each bounding box are then predicted by standard convolution on the basis of 9 predefined anchor boxes, yielding a dataset of vehicle bounding-box corner coordinate files for the predicted images;
the dataset of annotated vehicle bounding-box corner coordinate files is compared with the predicted dataset to obtain the error, and the YOLO vehicle detector is trained until the error is smaller than a set value, giving the improved YOLO vehicle detector;
the backbone feature extraction network comprises 1 CBL module and 13 DBR modules; each CBL module comprises a standard convolution layer, a batch normalization layer and a Leaky ReLU activation layer; each DBR module comprises a depthwise separable convolution layer, a batch normalization layer and a ReLU activation layer.
5. The road surface three-dimensional reconstruction method for dynamic traffic scenes according to claim 1, characterized in that, in step 3, the spatial relationship between adjacent images is calculated by the following formula:

p'_i = R · p_i + T

wherein (p_i, p'_i) are matched feature-point pairs on adjacent images, R is the rotation matrix and T is the translation matrix; the actual values of R and T are solved from multiple feature-point pairs to obtain the spatial relationship between adjacent images.
6. The road surface three-dimensional reconstruction method for dynamic traffic scenes according to claim 1, characterized in that, in step 3, the process of identifying invalid images is:
(1) let the overlap rate of the original image sequence be IOR_max; the lowest image overlap rate after optimization is then IOR_min = 1 − 2·(1 − IOR_max);
(2) starting from the 1st image, searching with a step of 2, calculating the occlusion rate OCR_1 obtained by combining the i-th and (i+2)-th images, i.e. the proportion of the vehicle overlap area of the two images within the image overlap area, and at the same time obtaining the actual ground overlap rate GOR_1 of the area outside the vehicle-occluded positions;
(3) calculating the occlusion rate OCR_2 obtained by combining the i-th, (i+1)-th and (i+2)-th images, i.e. the proportion of the vehicle overlap area of the three images within the image overlap area, and obtaining the actual ground overlap rate GOR_2 of the area outside the vehicle-occluded positions;
(4) if OCR_2 > OCR_1 and GOR_2 > GOR_1, retaining the (i+1)-th image; otherwise, deleting the (i+1)-th image.
7. The road surface three-dimensional reconstruction method for dynamic traffic scenes according to claim 6, characterized in that the occlusion rate OCR is calculated as:
OCR = (area still occluded by vehicles after the two images are overlapped) / (original image area);
and the ground overlap rate GOR is calculated as:
GOR = (overlap area of the two images excluding the vehicle-occluded parts) / (original image area).
8. The road surface three-dimensional reconstruction method for dynamic traffic scenes according to claim 1, characterized in that, in step 4, the coordinates of the preprocessed images are matched with the image coordinates from step 1, and the relative spatial positions of the camera across different pictures are preliminarily estimated; feature points are extracted on each image and matched between adjacent images; the camera spatial coordinates are back-calculated from the feature-point pairs and spatial epipolar geometry; and the spatial three-dimensional coordinates of all feature points on the images are solved from the camera spatial coordinates to obtain a point cloud model, from which the road surface three-dimensional topography is reconstructed through stereoscopic vision.
9. A road surface three-dimensional reconstruction system for dynamic traffic scenes, characterized by comprising:
the image acquisition module, used for collecting images shot by an unmanned aerial vehicle;
the coordinate acquisition module, used for inputting the shot images, as prediction set images, into the improved YOLO vehicle detector to obtain coordinate files of the vehicles in the prediction set images, and, taking the vehicle coordinate files as reference, rendering the area inside each vehicle bounding box in the prediction set images single-valued;
the image sequence optimization module, used for obtaining the spatial relationship between adjacent prediction set images by a feature extraction method, calculating the actual occlusion rate and ground overlap rate between adjacent images from this spatial relationship and the vehicle bounding boxes on each image, identifying invalid images through spatial optimization of the image sequence, and removing the invalid images to obtain preprocessed images;
and the reconstruction module, used for reconstructing the road surface three-dimensional topography through stereoscopic vision by combining the preprocessed images with their coordinate files.
CN202210676065.2A 2022-06-15 2022-06-15 Road surface three-dimensional reconstruction method and system for dynamic traffic scene Pending CN115019208A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210676065.2A CN115019208A (en) 2022-06-15 2022-06-15 Road surface three-dimensional reconstruction method and system for dynamic traffic scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210676065.2A CN115019208A (en) 2022-06-15 2022-06-15 Road surface three-dimensional reconstruction method and system for dynamic traffic scene

Publications (1)

Publication Number Publication Date
CN115019208A true CN115019208A (en) 2022-09-06

Family

ID=83075463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210676065.2A Pending CN115019208A (en) 2022-06-15 2022-06-15 Road surface three-dimensional reconstruction method and system for dynamic traffic scene

Country Status (1)

Country Link
CN (1) CN115019208A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117218244A (en) * 2023-11-07 2023-12-12 武汉博润通文化科技股份有限公司 Intelligent 3D animation model generation method based on image recognition
CN117218244B (en) * 2023-11-07 2024-02-13 武汉博润通文化科技股份有限公司 Intelligent 3D animation model generation method based on image recognition
CN117437368A (en) * 2023-12-22 2024-01-23 深圳大学 Unmanned plane-based pavement evenness measuring method, system, terminal and medium
CN117437368B (en) * 2023-12-22 2024-04-26 深圳大学 Unmanned plane-based pavement evenness measuring method, system, terminal and medium

Similar Documents

Publication Publication Date Title
CN110285793B (en) Intelligent vehicle track measuring method based on binocular stereo vision system
CN111144388B (en) Monocular image-based road sign line updating method
KR101105795B1 (en) Automatic processing of aerial images
US7509241B2 (en) Method and apparatus for automatically generating a site model
CN103971404B (en) 3D real-scene copying device having high cost performance
CN104574393B (en) A kind of three-dimensional pavement crack pattern picture generates system and method
Hoppe et al. Online Feedback for Structure-from-Motion Image Acquisition.
CN115019208A (en) Road surface three-dimensional reconstruction method and system for dynamic traffic scene
US11682170B2 (en) Generating three-dimensional geo-registered maps from image data
JPH0554128A (en) Formation of automatic video image database using photograph ic measurement
CN112509125A (en) Three-dimensional reconstruction method based on artificial markers and stereoscopic vision
CN112800524A (en) Pavement disease three-dimensional reconstruction method based on deep learning
CN116258817B (en) Automatic driving digital twin scene construction method and system based on multi-view three-dimensional reconstruction
CN108663026A (en) A kind of vibration measurement method
CN114648669A (en) Motor train unit fault detection method and system based on domain-adaptive binocular parallax calculation
Fei et al. Ossim: An object-based multiview stereo algorithm using ssim index matching cost
CN113345084B (en) Three-dimensional modeling system and three-dimensional modeling method
CN111325828A (en) Three-dimensional face acquisition method and device based on three-eye camera
Ebrahimikia et al. True orthophoto generation based on unmanned aerial vehicle images using reconstructed edge points
CN117456114A (en) Multi-view-based three-dimensional image reconstruction method and system
CN105352482A (en) Bionic compound eye microlens technology-based 3-3-2 dimension object detection method and system
CN117115359A (en) Multi-view power grid three-dimensional space data reconstruction method based on depth map fusion
CN116129318A (en) Unsupervised monocular three-dimensional target detection method based on video sequence and pre-training instance segmentation
CN114387532A (en) Boundary identification method and device, terminal, electronic equipment and unmanned equipment
CN113554102A (en) Aviation image DSM matching method for cost calculation dynamic programming

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination