CN117173214A - High-precision map real-time global positioning tracking method based on road side monocular camera - Google Patents

High-precision map real-time global positioning tracking method based on road side monocular camera

Info

Publication number
CN117173214A
CN117173214A (application CN202311124950.0A)
Authority
CN
China
Prior art keywords
camera
vehicle
angle
positioning
tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311124950.0A
Other languages
Chinese (zh)
Inventor
李永 (Li Yong)
赵治国 (Zhao Zhiguo)
田瑞 (Tian Rui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202311124950.0A priority Critical patent/CN117173214A/en
Publication of CN117173214A publication Critical patent/CN117173214A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Traffic Control Systems (AREA)

Abstract

The invention discloses a high-precision-map real-time global positioning and tracking method based on a roadside monocular camera, comprising: roadside camera self-calibration, in which semantic cues are learned from the traffic scene to calibrate the camera automatically; target detection and tracking, in which vehicles are detected and tracked in the image plane to obtain their pixel-position changes over a period of time; target-vehicle and camera angle estimation, in which the angle between the vehicle and the camera is computed from the pixel position combined with the calibrated camera parameters; target-vehicle and camera distance estimation, in which the distance between the vehicle and the camera is computed with the calibrated camera parameters; and final position computation, in which the obtained distance and angle are combined with the camera's longitude and latitude to realize global positioning of the target. The tracked global positioning results are displayed on a high-precision map in real time. The method automatically realizes monocular global positioning for any uncalibrated roadside camera without manual operation, and performs positioning and tracking in both the image plane and the high-precision-map plane.

Description

A high-precision-map real-time global positioning and tracking method based on a roadside monocular camera

Technical field

The present invention relates to the field of computer vision, and in particular to a camera calibration method based on deep learning.

Background

With the rapid development of autonomous driving technology, driving safety is receiving increasing attention. As the first step in autonomous driving perception, precise positioning is a prerequisite for safety.

Global positioning and tracking of vehicles using Global Navigation Satellite Systems (GNSS), or commercial products with built-in GNSS technology, is widely used. However, GNSS-based positioning requires line of sight (LOS) to at least four navigation satellites, and in many areas, especially densely built-up areas where satellite signals are severely attenuated or blocked, the urban canyon effect means that access to that many satellites cannot be guaranteed. In addition, the root mean square error (RMSE) of pure GNSS positioning in cities is usually greater than 5 m, and even when augmented with inertial sensors, differential corrections, or multi-constellation receivers, the required navigation performance cannot be guaranteed in terms of availability and accuracy. On the other hand, roadside surveillance cameras are now deployed at large scale, so a method is needed that uses these cameras to achieve global positioning of vehicles from a roadside monocular view in the absence of GNSS. Although there are already many intelligent roadside applications built on surveillance cameras, including speed estimation, traffic-flow statistics and prediction, and illegal-parking identification, which improve road safety and efficiency, there are still few systematic methods that specifically address the roadside vehicle positioning problem.

Most current positioning methods solve a homography from at least four point correspondences selected manually between the map and the image, realizing the conversion from the camera coordinate system to the world coordinate system. Although simple to implement, this easily introduces manual error. More critically, in today's large-scale smart-traffic-camera deployments, this approach requires manually establishing the initial point correspondences and solving the coordinate transformation matrix for every single camera, which severely limits its application scale and deployment efficiency.

Most existing monocular positioning is tested in experimental settings where camera calibration is assumed complete, or relies on manual calibration carried out in advance, for example the checkerboard method of Zhang Zhengyou and colleagues. Such methods require personnel on site, and traffic-flow restrictions make them unsuitable for real traffic scenes. More importantly, in roadside scenes the camera angle often changes due to physical factors, and with the popularity of PTZ cameras the focal length can also change at any time, so manual camera calibration methods are not applicable.

There is also a roadside camera self-calibration method that relies on vehicle vanishing-point detection, but it requires the road, or the vehicle trajectories, to be approximately straight, and therefore cannot be used in turning scenarios.

In summary, there is currently no complete solution to the roadside monocular positioning problem: one that achieves automatic camera calibration while requiring neither scene prior information nor any manual operation, and that applies to monocular vehicle positioning in arbitrary unknown scenes.

Summary of the invention

To solve the above practical problems, the present invention provides a high-precision-map real-time global positioning and tracking method based on a roadside monocular camera. The technical solution is as follows:

The provided technical solution can automatically calibrate existing roadside surveillance cameras, without manual participation and without any requirements on the scene. The algorithm can be applied at any intersection to directly obtain the global position of the vehicles in the scene, and the resulting global positioning results are tracked on a high-precision map.

A global positioning and tracking method based on a roadside monocular camera, the method comprising:

Step A: the camera is calibrated automatically to obtain its intrinsic and extrinsic parameters;

Step B: a target detector detects vehicles in the image to obtain each vehicle's pixel position in a given frame;

Step C: the target tracker associates detections between consecutive frames on the basis of the detection, obtaining each vehicle's pixel-position change over time;

Step D: vehicle positioning receives the pixel-position changes and computes the angle and distance between each vehicle and the camera, obtaining the vehicle's global position;

Step E: the computed global positioning information is displayed on the high-precision map for positioning and tracking.

In step A, in the roadside scene, a deep-learning-based model is adopted to learn semantic cues directly from the scene and to infer the vertical field of view, roll angle, and pitch angle, without relying on manual input or prior information about the scene.

In step A, the focal length is not estimated directly; instead it is converted into a vertical field of view for indirect inference. The continuous pitch-angle, roll-angle, and vertical-field-of-view values are each discretized into 256 bins for classification prediction; three fully connected heads then predict the three quantities respectively, and the expected value of each head's probability distribution is taken as its predicted value. For the vertical field of view the Softargmax-biased-L2 loss is used, and for roll and pitch the standard Softargmax-L2 loss.

In steps B and C, only the bottom-center point of the detection box obtained by detection and tracking is needed as the input to positioning. Accurate positioning depends on the camera's global parameters, such as longitude and latitude, camera height, and camera heading angle, and on the camera's own parameters, including the intrinsic and extrinsic parameters.

In step D, the angle estimate between the vehicle and the camera depends on the camera heading angle and the horizontal field of view, and the longitudinal distance between the vehicle and the camera depends on the camera height. On the basis of the obtained camera parameters, the distance and angle between the vehicle and the camera are computed simultaneously; the distance and angle information is then combined with the camera's own heading angle and longitude/latitude to compute the longitude and latitude of the vehicle.

A global high-precision-map positioning and tracking system based on a roadside monocular camera, comprising: camera self-calibration, vehicle detection, vehicle tracking, vehicle positioning, and high-precision-map positioning and tracking.

The camera self-calibration predicts the camera focal length, roll angle, and pitch angle through a deep-learning network.

The vehicle detection and tracking part detects and tracks vehicles in real time, obtaining the change over time of the bottom-center coordinates of each vehicle's detection box.

After the angle and distance between the vehicle and the camera are obtained by the angle estimation and distance estimation, the longitude and latitude coordinates of the target vehicle are obtained through a geographic-position conversion formula.

The high-precision vehicle positioning and tracking takes the pixel changes and the camera parameters obtained by camera calibration, combines them with the positioning algorithm to display the geolocation results on a high-precision map in real time, and performs real-time positioning and tracking on both the image plane and the high-precision-map plane.

The camera calibration algorithm infers only the vertical field of view, pitch angle, and roll angle; it requires no manual input and places no requirements on the scene.

In the target detection and tracking algorithm, the pixel position of the bottom-center point of the 2D detection box is used to compute the distance and angle between the vehicle and the camera.

Angle estimation is realized by combining the horizontal field of view obtained by camera calibration, the camera's own heading angle, and the vehicle pixel position obtained by target tracking; the angle between the vehicle and the camera is computed by the following formula:

ω_d = ω_h + ω_c

where ω_h represents the camera heading angle;

ω_c represents the angle between the vehicle and the camera heading angle;

D represents the distance between the vehicle and the camera;

ω_d represents the clockwise angle between due north and the camera-to-vehicle direction along which the distance D is measured.

Distance estimation is realized by combining the field of view obtained by camera calibration with the vehicle pixel position obtained by target tracking; the longitudinal distance between the vehicle and the camera is computed by the following formula:
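Reconstructed assuming the standard pinhole geometry derived in the detailed description below:

Y = H / tan( θ + arctan( (y − h/2) / f ) )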

where (x, y) is the pixel coordinate of the bottom-center point of the vehicle's detection box in the image;

H is the camera installation height;

Y represents the longitudinal distance between the vehicle and the camera;

w and h are the pixel width and pixel height of the image, respectively;

θ represents the pitch angle of the camera;

f represents the camera focal length;

The final distance between the vehicle and the camera is determined by the longitudinal distance and the angle between the vehicle and the camera. The global positioning result combines the distance and angle between the camera and the vehicle with a conversion from the camera's own longitude and latitude; the final global longitude and latitude are computed by the following formulas:

R = 6371.393 × 1000
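Reconstructed assuming the standard small-displacement (flat-earth) approximation, consistent with the symbols defined below:

l = l_c + ( D · cos ω_d / R ) · ( 180 / π )

g = g_c + ( D · sin ω_d / ( R · cos( l_c · π / 180 ) ) ) · ( 180 / π )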

where R is the radius of the Earth, in meters;

l and g represent the latitude and longitude of the vehicle, respectively;

l_c and g_c represent the latitude and longitude of the camera, respectively.

High-precision-map tracking and positioning is performed by displaying the longitude and latitude computed from the image on the map, without human participation.

The proposed solution greatly reduces the deployment workflow and manual operations in roadside positioning scenarios, reduces human error caused by manual operation, and improves the roadside positioning algorithm's adaptability to different scenes and different camera models. In addition, the monocular positioning results can serve as a supplementary service where GNSS signals are missing, providing real-time positioning results for all autonomous vehicles in the traffic scene.

Brief description of the drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments recorded in the present invention; those of ordinary skill in the art can obtain other drawings from these drawings.

Figure 1 is a schematic diagram of the angle estimation method for monocular positioning according to an embodiment of the present invention;

Figure 2 is a schematic diagram, for angle estimation according to an embodiment of the present invention, of the vehicle located on the right side of the camera's field of view;

Figure 3 is a schematic diagram, for angle estimation according to an embodiment of the present invention, of the vehicle located on the left side of the camera's field of view;

Figure 4 is a schematic diagram of the distance estimation method for monocular positioning according to an embodiment of the present invention.

Detailed description of the embodiments

The purpose of the present invention is to realize monocular roadside positioning without manual camera calibration, applicable to different roadside scenes; to achieve global vehicle positioning on the basis of that calibration; and to perform real-time positioning and tracking on a high-precision map.

To realize the above solution, the camera calibration problem must be solved, and on the basis of that calibration, global vehicle positioning must be achieved without manual steps.

To address the above problems, the present invention provides a high-precision-map real-time global positioning and tracking algorithm based on a roadside monocular camera that does not rely on manual input and places no restrictions on the deployment scene or on the camera type, installation position, or orientation.

In the present invention, for a geometric camera model, the relationship between a 3D scene point p_w and its image pixel position p_im can be expressed as:

p_im = [λu λv λ]^T = K [R|t] [p_w | 1]^T,

where K is the camera projection matrix (the camera intrinsics), and R and t are the rotation and translation of the camera in the world coordinate system (the camera extrinsics). The invention adopts the mainstream assumptions: a pinhole camera model, the principal point at the image center, no camera skew, equal scales in the horizontal and vertical directions (aspect ratio 1, square pixels), and equal focal lengths in both directions, so that f_x = f_y = f and the projection matrix can be written K = diag([f, f, 1]), where f is the focal length in pixels.

Since the range of the focal length is unbounded, and it changes when the image is resized, the present invention directly estimates the vertical field of view α in radians and converts it into a focal length as shown below, where h is the image pixel height.
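Under the pinhole model above, the standard conversion from vertical field of view to focal length is:

f = h / ( 2 · tan( α / 2 ) )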

A rotation matrix can be described by three angles: roll, pitch, and yaw. Since there is no natural reference frame for estimating yaw (left and right) from an arbitrary image, the yaw angle is not estimated; the rotation therefore consists of two angles, the pitch angle θ and the roll angle ψ, both represented through the horizon line: the camera's pitch angle θ corresponds to the height at which the horizon crosses the vertical center line of the image, and the roll angle ψ to the rotation of the estimated horizon relative to a level line.

The pitch angle θ, roll angle ψ, and vertical field of view α are discretized into B = 256 bins in the spatial domain, turning the regression problem into a B-way classification problem. The invention uses ResNet-50 as the backbone network and a separate fully connected layer to predict each of θ, ψ, and α. The bin centers can be written θ = [θ_1, ... θ_i, ... θ_B], ψ = [ψ_1, ... ψ_i, ... ψ_B], α = [α_1, ... α_i, ... α_B]; let p^θ, p^ψ, and p^α denote the probability mass output by the fully connected layer of each head. The expected value of each head's probability mass can then be expressed as:
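The expectation takes the standard softargmax form over the bin centers (with the notation p^θ, p^ψ, p^α introduced above):

E[θ] = Σ_{i=1..B} p_i^θ · θ_i

and analogously for ψ and α with p^ψ and p^α.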

Then, the Softargmax-biased-L2 loss is used for the vertical field of view α, and the standard Softargmax-L2 loss for the pitch angle θ and roll angle ψ; the final loss function combines the three per-head terms, as sketched below.
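A minimal PyTorch sketch of the three classification heads and the softargmax expectation, assuming hypothetical bin ranges (the patent does not state the bin limits, and the bias term of the Softargmax-biased-L2 loss is omitted here):

```python
import math

import torch
import torch.nn as nn
import torchvision


class CalibNet(nn.Module):
    """ResNet-50 backbone with three 256-bin classification heads for
    pitch, roll, and vertical field of view (vFOV)."""

    def __init__(self, num_bins=256):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        backbone.fc = nn.Identity()  # expose the 2048-d feature vector
        self.backbone = backbone
        self.head_pitch = nn.Linear(2048, num_bins)
        self.head_roll = nn.Linear(2048, num_bins)
        self.head_vfov = nn.Linear(2048, num_bins)
        # Hypothetical bin centers (radians); the patent does not state ranges.
        self.register_buffer("pitch_bins", torch.linspace(-0.6, 0.6, num_bins))
        self.register_buffer("roll_bins", torch.linspace(-0.6, 0.6, num_bins))
        self.register_buffer("vfov_bins", torch.linspace(0.2, 2.0, num_bins))

    def forward(self, x):
        feat = self.backbone(x)
        # Softargmax: expectation of each head's probability mass over bin centers.
        pitch = (self.head_pitch(feat).softmax(-1) * self.pitch_bins).sum(-1)
        roll = (self.head_roll(feat).softmax(-1) * self.roll_bins).sum(-1)
        vfov = (self.head_vfov(feat).softmax(-1) * self.vfov_bins).sum(-1)
        return pitch, roll, vfov


def softargmax_l2(pred, target):
    """Standard Softargmax-L2 loss: squared error on the expected value."""
    return ((pred - target) ** 2).mean()


def focal_from_vfov(vfov, image_height):
    """Convert the predicted vertical field of view to focal length in pixels."""
    return image_height / (2.0 * math.tan(vfov / 2.0))
```

The total training loss would then apply softargmax_l2 to each of the three heads, with the biased variant substituted for the vFOV term.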

The model can be trained on currently public camera-calibration datasets, such as the pano360 dataset; after training, the camera's field of view, pitch angle, and roll angle are obtained.

While calibration runs, the vehicles in the scene are also detected and tracked to obtain the pixel position of each vehicle; the detection and tracking algorithms can be replaced by any alternatives. After calibration, the camera's pitch angle θ, roll angle ψ, and vertical field of view α are available. In traffic scenes, deviations of the roll angle ψ have a much smaller effect on the final position than the pitch angle θ, so the invention does not consider the influence of the roll angle.

As shown in Figure 1, given the camera's position, only the distance D and the angle ω_d between the camera and the target vehicle are needed to estimate the vehicle's geographic position. Here, ω_d denotes the clockwise angle between the camera-to-vehicle direction of length D and map north N, and ω_c denotes the angle between the vehicle and the camera heading ω_h. Given the image width w, the camera's horizontal field of view hfov, and the bottom-center coordinates (x, y) of the vehicle's detection box, ω_c can be expressed as shown below.
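A reconstruction consistent with the pinhole model above, using tan(hfov/2) = w / (2f):

ω_c = arctan( ( (2x − w) / w ) · tan( hfov / 2 ) ) = arctan( ( x − w/2 ) / f )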

ω_h represents the camera heading angle;

ω_c represents the angle between the vehicle and the camera heading angle;

D represents the distance between the vehicle and the camera;

ω_d represents the clockwise angle between due north and the camera-to-vehicle direction along which the distance D is measured.

The two cases of Figure 1 are shown in Figures 2 and 3, where the vehicle's direction of travel lies on the right and left sides of the camera's field of view, respectively. In both cases, ω_d can be expressed as:

ω_d = ω_h + ω_c

The distance estimation model is shown in Figure 4. Suppose a vehicle is detected in the road scene at position (X_w, Y_w, Z_w) in the ground coordinate system; let θ_v be the angle (relative to the camera) of the projection ray through the intersection of the detected vehicle's rear or front with the road plane, and let H be the installation height of the camera. The invention assumes a level road, so from the angle relations the longitudinal distance Y follows directly from θ_v, where θ_v is the camera pitch θ plus a pixel-dependent offset β; the longitudinal distance Y between the camera and the vehicle can then be computed as shown below.
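The chain of formulas, reconstructed assuming the principal point at the image center and y measured downward from the top of the image:

Y = H / tan( θ_v ), with θ_v = θ + β,

β = arctan( ( y − h/2 ) / f ),

so that Y = H / tan( θ + arctan( ( y − h/2 ) / f ) ).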

(x, y) is the pixel coordinate of the bottom-center point of the vehicle's detection box in the image;

H is the camera installation height;

Y represents the longitudinal distance between the vehicle and the camera;

w and h are the pixel width and pixel height of the image, respectively;

θ represents the pitch angle of the camera;

f represents the camera focal length.

The final distance D is:
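Reconstructed from the stated dependence of D on the longitudinal distance Y and the vehicle-camera angle ω_c:

D = Y / cos( ω_c )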

After obtaining the distance D and the angle ω_d, and denoting the camera's own latitude and longitude by l_c and g_c, the vehicle latitude l and longitude g can be obtained by the following formulas:

R = 6371.393 × 1000,
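The conversion formulas, reconstructed assuming the standard small-displacement (flat-earth) approximation used with this value of R:

l = l_c + ( D · cos ω_d / R ) · ( 180 / π ),

g = g_c + ( D · sin ω_d / ( R · cos( l_c · π / 180 ) ) ) · ( 180 / π ),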

where R is the radius of the Earth, in meters; l and g represent the latitude and longitude of the vehicle, respectively;

l_c and g_c represent the latitude and longitude of the camera, respectively.

After the vehicle's global position is obtained, the positioning result can be displayed on a high-precision map in real time, finally realizing roadside monocular real-time positioning and tracking on the high-precision map. With the above method, positioning is achieved without relying on manual steps: no one needs to visit the site for calibration, and the system is plug-and-play. Positioning accuracy can subsequently be improved by improving the camera calibration accuracy.
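A minimal end-to-end sketch of the geometric positioning step in Python, assuming the reconstructed formulas above; the function and parameter names are illustrative, not from the patent:

```python
import math

EARTH_RADIUS_M = 6371.393 * 1000  # R, in meters

def locate_vehicle(x, y, w, h, f, pitch, heading_deg, cam_lat, cam_lon, cam_height):
    """Map the bottom-center pixel (x, y) of a vehicle detection box to
    latitude/longitude using the calibrated focal length f (pixels), the
    camera pitch (radians), heading (degrees clockwise from north), the
    camera's latitude/longitude (degrees), and mounting height (meters)."""
    # Angle between the vehicle and the camera's optical axis (omega_c).
    omega_c = math.atan((x - w / 2.0) / f)
    # Longitudinal distance from the flat-road pinhole geometry.
    beta = math.atan((y - h / 2.0) / f)
    longitudinal = cam_height / math.tan(pitch + beta)
    # Straight-line distance D and bearing omega_d from the camera.
    distance = longitudinal / math.cos(omega_c)
    omega_d = math.radians(heading_deg) + omega_c
    # Small-displacement conversion from (distance, bearing) to lat/lon.
    lat = cam_lat + math.degrees(distance * math.cos(omega_d) / EARTH_RADIUS_M)
    lon = cam_lon + math.degrees(
        distance * math.sin(omega_d)
        / (EARTH_RADIUS_M * math.cos(math.radians(cam_lat)))
    )
    return lat, lon
```

Feeding the tracked bottom-center pixel of each detection box through locate_vehicle once per frame yields the per-vehicle latitude/longitude stream that is drawn on the high-precision map.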

The above are only specific embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A global positioning and tracking method based on a roadside monocular camera, comprising the following steps:
step A: a camera performs automatic calibration to obtain the intrinsic and extrinsic parameters of the camera;
step B: a target detector detects vehicles in the image to obtain the pixel position of each vehicle in the image in a given frame;
step C: a target tracker establishes associations between consecutive frames on the basis of the detection, to obtain the pixel-position change of each vehicle over time;
step D: vehicle positioning receives the pixel-position change and computes the angle and distance between the vehicle and the camera, obtaining the global position of the vehicle;
step E: the computed global positioning information is displayed on the high-precision map for positioning and tracking.
2. The global positioning and tracking method based on a roadside monocular camera of claim 1, wherein
in step A, in a roadside scene, a deep-learning model is adopted to learn semantic cues directly from the scene, and the vertical field of view, roll angle, and pitch angle are inferred without depending on manual input or prior information about the scene.
3. The global positioning and tracking method based on a roadside monocular camera of claim 1, wherein
in step A, the focal length is not estimated directly but is converted into a vertical field of view for indirect inference; the continuous pitch-angle, roll-angle, and vertical-field-of-view values are each discretized into 256 bins for classification prediction; three fully connected heads then predict the three quantities respectively, and the expected value of each head's probability distribution is taken as its predicted value; a Softargmax-biased-L2 loss is used for the vertical field of view, and the standard Softargmax-L2 loss for roll and pitch.
4. The global positioning and tracking method based on a roadside monocular camera of claim 1, wherein
in steps B and C, only the bottom-center point of the detection box obtained by detection and tracking is needed as the input to positioning, and accurate positioning depends on global parameters of the camera, such as longitude and latitude, camera height, and camera heading angle, and on the camera's own parameters, including the intrinsic and extrinsic parameters.
5. The global positioning and tracking method based on a roadside monocular camera of claim 1, wherein
in step D, the angle estimate between the vehicle and the camera depends on the heading angle and horizontal field of view of the camera, and the longitudinal distance between the vehicle and the camera depends on the camera height; on the basis of the obtained camera parameters, the distance and angle between the vehicle and the camera are computed simultaneously, and the longitude and latitude of the vehicle are then computed from the distance and angle information combined with the heading angle and the longitude and latitude of the camera.
6. A global high-precision-map positioning and tracking system based on a roadside monocular camera, comprising: camera self-calibration, vehicle detection, vehicle tracking, vehicle positioning, and high-precision-map positioning and tracking;
the camera self-calibration predicts the focal length, roll angle, and pitch angle of the camera through a deep-learning network;
the vehicle detection and tracking part detects and tracks vehicles in real time to obtain the change over time of the bottom-center coordinates of each vehicle's detection box;
after the angle and distance between the vehicle and the camera are obtained through the angle estimation and the distance estimation, the longitude and latitude coordinates of the target vehicle are obtained by a geographic-position conversion formula;
the high-precision vehicle positioning and tracking takes the pixel changes and the camera parameters obtained by camera calibration, combines them with the positioning algorithm to display the geographic positioning result on a high-precision map in real time, and meanwhile performs real-time positioning and tracking on the image plane and the high-precision-map plane.
7. The roadside-monocular-camera-based global high-precision-map positioning and tracking system of claim 6, wherein
the camera calibration algorithm infers only the vertical field of view, pitch angle, and roll angle, requires no manual input, and places no requirements on the scene.
8. The roadside-monocular-camera-based global high-precision-map positioning and tracking system of claim 6, wherein
in the target detection and tracking algorithm, the pixel position of the bottom-center point of the 2D detection box is used to compute the distance and angle between the vehicle and the camera.
9. The roadside-monocular-camera-based global high-precision-map positioning and tracking system of claim 6, wherein
the angle estimation is realized by combining the horizontal field of view obtained by camera calibration, the camera's own heading angle, and the vehicle pixel position obtained by target tracking, and the angle between the vehicle and the camera is computed by the following formula:
ω_d = ω_h + ω_c
wherein ω_h represents the camera heading angle;
ω_c represents the angle between the vehicle and the camera heading angle;
D represents the distance between the vehicle and the camera;
ω_d represents the clockwise angle between due north and the camera-to-vehicle direction along which the distance D is measured.
10. The roadside-monocular-camera-based global high-precision-map positioning and tracking system of claim 6, wherein
the distance estimation is realized by combining the field of view obtained by camera calibration and the vehicle pixel position obtained by target tracking, and the longitudinal distance between the vehicle and the camera is computed by the following formula:
wherein (x, y) is the pixel coordinate of the bottom-center point of the detection box of the vehicle in the image;
H is the camera installation height;
Y represents the longitudinal distance between the vehicle and the camera;
w and h are the pixel width and pixel height of the image, respectively;
θ represents the pitch angle of the camera;
f represents the camera focal length;
the final distance between the vehicle and the camera is determined by the longitudinal distance and the angle between the vehicle and the camera; the global positioning result is obtained by combining the distance and angle between the camera and the vehicle with a conversion from the camera's own longitude and latitude, and the final global longitude and latitude are computed by the following formula:
R = 6371.393 × 1000
wherein R is the radius of the Earth, in meters;
l and g represent the latitude and longitude of the vehicle, respectively;
l_c and g_c represent the latitude and longitude of the camera, respectively;
the high-precision-map tracking and positioning is performed by displaying the longitude and latitude computed from the image on the map, without human participation.
CN202311124950.0A 2023-09-03 2023-09-03 High-precision map real-time global positioning tracking method based on road side monocular camera Pending CN117173214A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311124950.0A CN117173214A (en) 2023-09-03 2023-09-03 High-precision map real-time global positioning tracking method based on road side monocular camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311124950.0A CN117173214A (en) 2023-09-03 2023-09-03 High-precision map real-time global positioning tracking method based on road side monocular camera

Publications (1)

Publication Number Publication Date
CN117173214A (en) 2023-12-05

Family

ID=88946368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311124950.0A Pending CN117173214A (en) 2023-09-03 2023-09-03 High-precision map real-time global positioning tracking method based on road side monocular camera

Country Status (1)

Country Link
CN (1) CN117173214A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118379698A (en) * 2024-04-10 2024-07-23 北京大唐高鸿数据网络技术有限公司 A data set construction method, device, equipment, program product and medium
CN118379698B (en) * 2024-04-10 2025-02-07 北京大唐高鸿数据网络技术有限公司 A data set construction method, device, equipment, program product and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination