CN104751119A

CN104751119A - Rapid detecting and tracking method for pedestrians based on information fusion

Info

Publication number: CN104751119A
Application number: CN201510071310.7A
Authority: CN
Inventors: 焦建彬; 高山; 韩振军; 叶齐祥; 庞丽金
Original assignee: University of Chinese Academy of Sciences
Current assignee: University of Chinese Academy of Sciences
Priority date: 2015-02-11
Filing date: 2015-02-11
Publication date: 2015-07-01

Abstract

The invention discloses a fast detection and tracking method for pedestrians based on information fusion, which is characterized in that the method comprises the following steps: Step S1: use a laser scanner to scan the surrounding environment at a fixed frequency to obtain laser data; Step S2: filter out Invalid data in the laser data is used to obtain candidate target data; step S3: calibrate the coordinate parameters between the laser scanner and the camera to obtain coordinate conversion parameters between the two coordinate systems; step S4: based on the candidate Target data, confirming the candidate targets therein; Step S5: Establish a real-time tracking model, and track the candidate targets confirmed in the step S4 according to the real-time tracking model. The invention perfectly integrates the advantages of the laser sensor and the visual sensor, and effectively improves the accuracy and timeliness of the detection and tracking of pedestrians by a single sensor.

Description

Fast pedestrian detection and tracking method based on information fusion

技术领域 technical field

本发明属于多传感器，多源信息融合技术领域，涉及数字图像处理，模式识别，数据关联等多方面内容，尤其是一种基于信息融合的行人快速检测跟踪方法，用于智能车辆技术中的重要组成部分——车辆辅助安全驾驶系统，对驾驶环境中最不可控的因素——行人进行感知，检测，跟踪，分析，预警，从而保障车辆行驶安全。 The invention belongs to the field of multi-sensor and multi-source information fusion technology, and relates to digital image processing, pattern recognition, data association and other aspects, especially a fast detection and tracking method for pedestrians based on information fusion, which is used in the important technology of intelligent vehicles. The component—vehicle assisted safety driving system—perceives, detects, tracks, analyzes and warns the most uncontrollable factor in the driving environment—pedestrians, so as to ensure the safety of vehicles.

背景技术 Background technique

智能车辆技术按功能主要分为自主导航和安全保障两个方面。自主导航技术的应用依赖于整个智能交通系统(ITS)的建立和完善，短期内难以达到实用化，而安全保障技术却可以独立应用于辅助驾驶系统，通过对周围的行驶环境做出检测跟踪，从而判断可能对驾驶员造成的威胁，因此对于解决因驾驶员主观因素产生的交通事故可以提供技术支持。 Intelligent vehicle technology is mainly divided into two aspects: autonomous navigation and safety assurance. The application of autonomous navigation technology depends on the establishment and improvement of the entire intelligent transportation system (ITS), and it is difficult to achieve practical application in the short term. However, safety assurance technology can be independently applied to the assisted driving system. By detecting and tracking the surrounding driving environment, In order to judge the possible threat to the driver, it can provide technical support for solving traffic accidents caused by the driver's subjective factors.

智能车辆安全保障技术分为安全监测与预警和主动安全保障，安全监测与预警主要指借助传感器和报警系统来监测车辆驾驶者状况、车辆隐患、特殊环境等，从而帮助驾驶员实现安全驾驶，而其中针对车辆前方的人体目标检测与跟踪致力于通过传感器对周边环境进行非接触探测来提高行驶安全。 Intelligent vehicle safety assurance technology is divided into safety monitoring and early warning and active safety assurance. Safety monitoring and early warning mainly refers to monitoring the status of vehicle drivers, vehicle hidden dangers, special environments, etc. by means of sensors and alarm systems, so as to help drivers achieve safe driving. Among them, the detection and tracking of human targets in front of the vehicle is dedicated to non-contact detection of the surrounding environment through sensors to improve driving safety.

用激光和视觉图像融合进行前方障碍物检测跟踪，不仅可以克服单独应用视觉图像传感器进行检测时易受天气状况和光照条件变化影响且无法得到检测跟踪对象的深度信息的缺点，也可以克服激光测距无法判断障碍物类别，无法可视处理，冗余报警的缺点。 Using laser and visual image fusion to detect and track obstacles in front can not only overcome the shortcomings of being easily affected by changes in weather conditions and lighting conditions and not being able to obtain depth information for detection and tracking objects when visual image sensors are used alone for detection, but also can overcome the shortcomings of laser measurement. The distance cannot judge the obstacle type, cannot be processed visually, and has the disadvantages of redundant alarms.

发明内容 Contents of the invention

本发明通过融合从激光扫描仪得到的深度数据和从CCD摄像机得到的图片信息，来实现对行人目标的准确和快速检测跟踪，给予分析和预警，从而达到辅助驾驶员安全驾驶的目的。本发明同时利用两种不同种类的传感器，利用各自的优点完成对路面行人的快速检测跟踪，实现优势互补，并且能够最大程度地在夜间、光照条件差、雾霾天气等复杂条件下为驾驶员提供正确快速的驾驶参考，实现辅助安全驾驶。 The present invention realizes the accurate and fast detection and tracking of pedestrian targets by fusing the depth data obtained from the laser scanner and the picture information obtained from the CCD camera, and provides analysis and early warning, so as to achieve the purpose of assisting the driver to drive safely. The present invention utilizes two different types of sensors at the same time, utilizes their respective advantages to complete rapid detection and tracking of pedestrians on the road surface, realizes complementary advantages, and can provide drivers with maximum protection under complex conditions such as nighttime, poor lighting conditions, and foggy weather. Provide correct and fast driving reference to realize assisted safe driving.

为实现本发明的目的，本发明提供的基于信息融合的行人快速检测跟踪方法，充分利用激光、图像的优点实时进行交通驾驶环境下行人的快速检测和跟踪，其包括以下步骤： In order to realize the purpose of the present invention, the method for rapid detection and tracking of pedestrians based on information fusion provided by the present invention fully utilizes the advantages of laser and image to perform rapid detection and tracking of pedestrians in the traffic driving environment in real time, which includes the following steps:

步骤S1：利用激光扫描仪以固定的频率扫描周围环境，得到激光数据； Step S1: Use a laser scanner to scan the surrounding environment at a fixed frequency to obtain laser data;

步骤S2：滤除所述激光数据中的无效数据，得到候选目标数据； Step S2: filtering out invalid data in the laser data to obtain candidate target data;

步骤S3：对于激光扫描仪和摄像机之间的坐标参数进行标定，得到两个坐标系之间的坐标转换参数； Step S3: Calibrate the coordinate parameters between the laser scanner and the camera to obtain coordinate conversion parameters between the two coordinate systems;

步骤S4：基于所述候选目标数据，对于其中的候选目标进行确认； Step S4: Based on the candidate target data, confirm the candidate targets therein;

步骤S5：建立实时跟踪模型，并根据所述实时跟踪模型对于所述步骤S4确认的候选目标进行跟踪。 Step S5: Establish a real-time tracking model, and track the candidate targets confirmed in step S4 according to the real-time tracking model.

本发明的有益效果是：本发明充分利用激光传感器和视觉传感器的联合在交通场景下行人检测跟踪方面的优势，其替代了传统视觉中所涉及的具体理论和方法，同时，针对复杂交通路面、复杂遮挡问题有了可靠的解决方法。与传统单一传感器检测跟踪方法相比，本发明在实时性和准确性等方面均有质的飞跃。 The beneficial effects of the present invention are: the present invention makes full use of the advantages of the combination of laser sensors and visual sensors in the detection and tracking of pedestrians in traffic scenes, which replaces the specific theories and methods involved in traditional vision, and at the same time, for complex traffic roads, There is a reliable solution to the complex occlusion problem. Compared with the traditional single-sensor detection and tracking method, the present invention has a qualitative leap in terms of real-time performance and accuracy.

附图说明 Description of drawings

图1是本发明基于信息融合的行人快速检测跟踪方法流程图； Fig. 1 is the flow chart of the pedestrian rapid detection and tracking method based on information fusion in the present invention;

图2a为未滤除无效数据的激光数据集合，图2b为滤除无效数据后得到的候选目标数据集合； Figure 2a is the laser data set without filtering out invalid data, and Figure 2b is the candidate target data set obtained after filtering out invalid data;

图3是根据本发明一实施例的激光扫描仪与摄像机矫正参数安装示意图； Fig. 3 is a schematic diagram of installation of laser scanner and camera correction parameters according to an embodiment of the present invention;

图4是根据本发明一实施例的候选区域生成结果示意图； Fig. 4 is a schematic diagram of candidate region generation results according to an embodiment of the present invention;

图5a是根据本发明一实施例的HOG特征示意图，图5b是根据本发明一实施例的人体检测示意图； Fig. 5a is a schematic diagram of HOG features according to an embodiment of the present invention, and Fig. 5b is a schematic diagram of human body detection according to an embodiment of the present invention;

图6是根据本发明一实施例的划分观测空间示意图； Fig. 6 is a schematic diagram of dividing an observation space according to an embodiment of the present invention;

图7是利用本发明方法的检测跟踪结果示意图。 Fig. 7 is a schematic diagram of detection and tracking results using the method of the present invention.

具体实施方式 Detailed ways

为使本发明的目的、技术方案和优点更加清楚明白，以下结合具体实施例，并参照附图，对本发明进一步详细说明。 In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

图1是本发明基于信息融合的行人快速检测跟踪方法的流程图，如图1所示，所述方法包括以下步骤： Fig. 1 is the flow chart of the pedestrian quick detection and tracking method based on information fusion of the present invention, as shown in Fig. 1, described method comprises the following steps:

在本发明一实施例中，选用的激光扫描仪为二维激光扫描仪SICKLMS291，其扫描范围是100°，角度分辨率为0.25°，距离范围在80米之内。 In one embodiment of the present invention, the selected laser scanner is a two-dimensional laser scanner SICKLMS291, with a scanning range of 100°, an angular resolution of 0.25°, and a distance within 80 meters.

所述二维激光扫描仪所返回的数据是其二维扫描平面内一组长度有限的离散数据点，这些数据能够反映周围物体的几何位置和形状，其中，每个数据还能够表示对应方向上与最近目标之间的距离。具体的离散数据点数与激光扫描仪的角度分辨率有关，在本发明一实施例中，每次返回400个离散数据，这些离散数据由极坐标形式给出，即：l_z＝(d_z,φ_z)^T,z＝1...400，在笛卡尔坐标系中可表示为： The data returned by the two-dimensional laser scanner is a set of discrete data points with a limited length in its two-dimensional scanning plane, and these data can reflect the geometric position and shape of surrounding objects, wherein each data can also represent The distance to the closest target. The number of specific discrete data points is related to the angular resolution of the laser scanner. In an embodiment of the present invention, 400 discrete data are returned each time, and these discrete data are given by polar coordinates, that is: _lz =( _dz , φ _z ) ^T , z=1...400, can be expressed in the Cartesian coordinate system as:

u_z＝(x_z,y_z)^T,z＝1...400， u _z =(x _z ,y _z ) ^T ,z=1...400,

其中，x_z＝d_zcosφ_z,y_z＝d_zsinφ。 Among them, x _z =d _z cosφ _z , y _z =d _z sinφ.

由于激光扫描仪对不同材质、不同颜色的物体存在测量误差，因此在返回的激光数据中可能会出现无效数据，而在无效数据中，很大一部分表现为零值和超出最大测距范围的值，另外还有因阳光辐射、车辆颠簸及反射物体材质造成的噪声点，可通过滤波将这部分无效数据去掉。 Due to the measurement errors of laser scanners for objects of different materials and colors, invalid data may appear in the returned laser data, and a large part of the invalid data appears as zero values and values beyond the maximum range , In addition, there are noise points caused by sunlight radiation, vehicle bumps and reflective object materials, which can be removed by filtering.

所述步骤S2进一步包括以下步骤： Said step S2 further comprises the following steps:

令集合L＝{l_z}表示一组激光数据，其中，z＝1,...,Z，Z表示激光数据点数。 Let the set L={l _z } represent a set of laser data, where z=1,...,Z, Z represents the number of laser data points.

首先，将集合L与核模板[-1,1]进行卷积，保留点间距在一定范围内的数据点，得到粗去噪结果集合：C＝{c_n},n＝1,....N，其中，N表示粗去噪结果集合中激光数据点数； First, the set L is convolved with the kernel template [-1,1], and the data points within a certain range are retained to obtain a rough denoising result set: C={c _n },n=1,... .N, where N represents the number of laser data points in the rough denoising result set;

然后，对于粗去噪结果集合进行聚类操作，得到一组候选目标数据集合S。 Then, for the coarse denoising result set Perform a clustering operation to obtain a set of candidate target data sets S.

其中，所述聚类操作具体为： Wherein, the clustering operation is specifically:

聚类初始，将每一数据点作为一类，计算新类与单个样本数据点之间的距离，若相邻两类c_n-1和c_n之间的间距属于某一预定阈值范围内，则认为它们属于同一类，否则就认为属于不同类，并以当前的单个样本数据点作为新增类的起始点。 In the initial clustering, each data point is regarded as a class, and the distance between the new class and a single sample data point is calculated. If the distance between two adjacent classes c _n-1 and c _n falls within a certain predetermined threshold range, They are considered to belong to the same class, otherwise they are considered to belong to different classes, and the current single sample data point is used as the starting point of the new class.

在聚类的过程中，大量的无效激光数据点被排除，最终得到了表示类似障碍物的一组候选目标数据集合S＝{s_m}，其中，m＝1,....M,表示候选目标的数量。这些候选目标在横向长度上与所检测目标类似，其特征包括了深度、长度及位置信息。 In the process of clustering, a large number of invalid laser data points are excluded, and finally a set of candidate target data sets S={s _m } representing similar obstacles are obtained, where m=1,....M, represents The number of candidate targets. These candidate targets are similar to the detected targets in horizontal length, and their characteristics include depth, length and position information.

图2a为未滤除无效数据的激光数据集合，图2b为滤除了无效数据后得到的候选目标数据集合。 Figure 2a is the laser data set without filtering out invalid data, and Figure 2b is the candidate target data set after filtering out invalid data.

步骤S3：对于激光扫描仪和摄像机之间的坐标参数进行标定，得到两个坐标系之间的坐标转换参数(φ,Δ)，其中φ为旋转矩阵，Δ为平移向量； Step S3: Calibrate the coordinate parameters between the laser scanner and the camera to obtain the coordinate transformation parameters (φ, Δ) between the two coordinate systems, where φ is the rotation matrix, and Δ is the translation vector;

该步骤在激光扫描仪与图像标定技术中移动平面模板的标定方法的基础上，加入了激光扫描仪本身所在的激光坐标系(二维扫描面)，其核心在于利用标定板被激光扫描仪和摄像机同时捕获的方法来求得激光坐标系和摄像机坐标系之间的旋转和平移矩阵。 This step is based on the calibration method of the mobile plane template in the laser scanner and image calibration technology, and the laser coordinate system (two-dimensional scanning surface) where the laser scanner itself is located is added. Simultaneous capture method to obtain the rotation and translation matrix between the laser coordinate system and the camera coordinate system.

图3示出了参数标定时激光扫描仪和摄像机的安装位置，图中，X_fY_fZ_f表示激光扫描仪的三维坐标平面，X_cY_cZ_c表示摄像机的三维坐标。从摄像机坐标系p到激光坐标系pf的变换关系如(1)式所示，其中，φ为摄像机坐标系变换到激光坐标系的旋转矩阵，表示激光扫描仪和摄像机之间的相对方向；Δ为从摄像机坐标系变换到激光坐标系的平移向量，表示两者之间的相对位置： Figure 3 shows the installation positions of the laser scanner and the camera during parameter calibration. In the figure, X _f Y _f Z _f represents the three-dimensional coordinate plane of the laser scanner, and X _c Y _c Z _c represents the three-dimensional coordinates of the camera. The transformation relationship from the camera coordinate system p to the laser coordinate system pf is shown in formula (1), where φ is the rotation matrix from the camera coordinate system to the laser coordinate system, which represents the relative direction between the laser scanner and the camera; Δ is the translation vector transformed from the camera coordinate system to the laser coordinate system, representing the relative position between the two:

p^f＝φp+Δ， (1) p ^f =φp+Δ, (1)

在摄像机坐标系下，将标定板参数化为一个三维向量N，此向量的方向平行于标定板的法向，||N||等于摄像机到标定板的距离。在摄像机坐标系下取标定板上的任一点p，由于p位于被参数化为N的标定板上，则有： In the camera coordinate system, the calibration plate is parameterized as a three-dimensional vector N, the direction of this vector is parallel to the normal direction of the calibration plate, and ||N|| is equal to the distance from the camera to the calibration plate. Take any point p on the calibration board in the camera coordinate system, since p is located on the calibration board parameterized as N, then:

N.p＝||N||²， (2) Np＝||N|| ² , (2)

其中，N可以通过摄像机的已知外参数[R,t](R是摄像机外参数的正交旋转矩阵，t是摄像机外参数的平移矩阵)求出： Among them, N can be obtained by the known external parameters [R, t] of the camera (R is the orthogonal rotation matrix of the external parameters of the camera, and t is the translation matrix of the external parameters of the camera):

N＝-r₃(r₃ ^T.t) (3)其中，r₃表示摄像机外参数正交旋转矩阵R的第三列向量。 N=-r ₃ (r ₃ ^T .t) (3) where, r ₃ represents the third column vector of the camera extrinsic parameter orthogonal rotation matrix R.

通过变换标定板的位置，可以得到一组不同的向量N以及相应的激光点在p^f下的位置坐标，即得到一组约束条件： By changing the position of the calibration plate, a set of different vectors N and the position coordinates of the corresponding laser points under p ^f can be obtained, that is, a set of constraints can be obtained:

N.φ^-1(p^f-Δ)＝||N||² (4) N.φ ^-1 (p ^f -Δ)＝||N|| ² (4)

求解该方程即可求得摄像机坐标系与激光坐标系之间的旋转和平移关系√和由此完成了激光扫描仪与摄像机的联合标定。投影结果如图4所示，其中，图4a为摄像机坐标系下物体的成像，其中数字1到6表示六个目标物；图4b为激光坐标系下物体的成像，图4c为物体在三维坐标下的位置坐标。 Solving this equation can obtain the rotation and translation relationship between the camera coordinate system and the laser coordinate system √ and This completes the joint calibration of the laser scanner and the camera. The projection results are shown in Figure 4, where Figure 4a is the imaging of the object in the camera coordinate system, where numbers 1 to 6 represent six targets; Figure 4b is the imaging of the object in the laser coordinate system, and Figure 4c is the object in the three-dimensional coordinate system The location coordinates below.

在获得候选目标数据后，需要利用视觉检测的方法对候选目标数据所在的区域进行检测，以对候选目标进行确认性的检测。在视觉检测部分需要针对不同目标训练不同的分类器(如人体、车辆等)。 After the candidate target data is obtained, it is necessary to use the method of visual detection to detect the area where the candidate target data is located, so as to confirm the detection of the candidate target. In the visual detection part, different classifiers need to be trained for different targets (such as human body, vehicle, etc.).

所述步骤S4进一步包括以下步骤： Said step S4 further comprises the following steps:

步骤S41，将训练样本图像(比如可以为64×128像素的人体训练样本)按照预定大小，比如8×8，的像素块分割为若干个基本单元(cell)； Step S41, the training sample image (such as a human body training sample of 64×128 pixels) is divided into several basic units (cells) according to a predetermined size, such as 8×8, of pixel blocks;

步骤S42，将每相邻的m个单元(比如以田字结构相邻的4个单元)划分为一个区域块block，将(-90°,90°)的梯度方向平均划分，规定每20度为一个基本方向bin，即可在180°的方向范围内得到9个基本方向bins； Step S42, divide each adjacent m units (for example, 4 adjacent units in a cross-shaped structure) into an area block, and divide the gradient direction of (-90°, 90°) equally, and stipulate that every 20 degrees As a basic direction bin, 9 basic direction bins can be obtained within the direction range of 180°;

步骤S43，对于每个基本单元cell，将其中所有像素在所有基本方向bins上进行投影以建立各自的梯度方向直方图； Step S43, for each basic unit cell, project all pixels therein on all basic direction bins to establish respective gradient direction histograms;

步骤S44，将每个block中含有的4个基本单元cell的梯度方向直方图连接起来，得到36维的向量； Step S44, connect the gradient orientation histograms of the 4 basic unit cells contained in each block to obtain a 36-dimensional vector;

步骤S45，再将所有的区域块block的36维向量归一化后串连起来，得到每个训练样本的HOG特征向量； Step S45, normalizing and concatenating the 36-dimensional vectors of all the block blocks to obtain the HOG feature vector of each training sample;

图5a为提取得到的HOG特征示意图，其反映了两个不同姿态下的人体轮廓。 Figure 5a is a schematic diagram of the extracted HOG features, which reflect the contours of the human body in two different poses.

步骤S46，提取得到训练样本图像的HOG特征向量之后，基于提取得到的HOG特征向量训练得到一个SVM分类模型用于目标检测； Step S46, after extracting the HOG feature vector of the training sample image, training based on the extracted HOG feature vector to obtain an SVM classification model for target detection;

步骤S47，对于摄像机坐标系所述训练样本图像上的某个候选区域，通过逐像素区域多尺度图像扫描的方式来判断该区域内是否存在目标。 Step S47 , for a certain candidate area on the training sample image in the camera coordinate system, it is judged whether there is a target in the area by means of pixel-by-pixel area multi-scale image scanning.

逐像素区域多尺度图像方法：如图5b所示，在一个192*96像素大小的区域内，通过128*64像素大小的基本模板进行扫描，每次扫描通过逐行逐列的移动基本模板，并通过SVM分类器来判断该区域下是否存在目标。 Pixel-by-pixel area multi-scale image method: as shown in Figure 5b, in an area with a size of 192*96 pixels, scan through a basic template with a size of 128*64 pixels, and each scan moves the basic template row by row, And use the SVM classifier to judge whether there is a target under this area.

本发明提出的模型从最大后验概率问题入手。假设，场景中各个目标的观测向量总合为：T＝(T₁,T₂,...,T_n)，而其中每个目标的观测向量表示为：T_i＝{p_i,o_i,v_i,f_i}，其中，p_i表示目标的位置坐标；o_i表示目标的运动方向；v_i表示目标的运动速度；f_i表示目标的表象特征(如颜色直方图，方向梯度等)。对于同一个目标的跟踪轨迹估计可以使用一连串的轨迹片段集合来表示，例如S＝{s_k}，其中，s_k表示轨迹片段。 The model proposed by the present invention starts with the problem of maximum a posteriori probability. Assume that the total observation vector of each target in the scene is: T=(T ₁ ,T ₂ ,...,T _n ), and the observation vector of each target is expressed as: T _i ={p _i ,o _i ,v _i ,f _i }, where p _i represents the position coordinates of the target; o _i represents the moving direction of the target; _{v i} _represents the moving speed of the target; ). The tracking trajectory estimation for the same target can be represented by a series of trajectory segment sets, for example, S={s _k }, where s _k represents the trajectory segment.

对于给定的观测向量T，数据关联的目标就是最大化后验概率P(S|T)，即： For a given observation vector T, the goal of data association is to maximize the posterior probability P(S|T), namely:

${S S}^{* *} = = \underset{S S}{arg arg max max} P P ((S S | | T T)) = = \underset{S S}{arg arg max max} P P ((T T / / S S)) P P ((S S)) . . - - - - - - ((55))$

其中，T为给定的观测向量，S为候选目标轨迹片段集合。 Among them, T is a given observation vector, and S is a set of candidate target trajectory segments.

因为估计S的空间非常庞大，很难直接优化式(5)。考虑到一个目标只会属于唯一的轨迹片段，因此，可以利用这个特性来有效的降低搜索空间的大小，即这个特性等价的约束条件可以表示为一个目标不能同时有多个轨迹标号： Because the estimated space of S is very large, it is difficult to directly optimize Equation (5). Considering that a target will only belong to a unique track segment, this feature can be used to effectively reduce the size of the search space, that is, the equivalent constraint of this feature can be expressed as a target cannot have multiple track labels at the same time:

其中，T_k表示场景中第k个目标的观测向量,T表示场景中各个目标的观测向量总合。 Among them, T _k represents the observation vector of the kth target in the scene, and T represents the sum of the observation vectors of each target in the scene.

根据被观测目标的关系，可将候选目标轨迹片段集合S划分为两部分，比如如图6a所示，将观测目标划分为独立目标(用虚线方框标出)和联合目标(用点划线方框标出)，图6b所示为观测目标在激光扫描仪下的位置坐标。 According to the relationship of the observed target, the candidate target trajectory segment set S can be divided into two parts. For example, as shown in Fig. box marked), Figure 6b shows the position coordinates of the observation target under the laser scanner.

则式(5)可以进一步写成： Then formula (5) can be further written as:

${S S}^{* *} = = \underset{S S}{arg arg} max max P P (({S S}^{α α + + β β})) P P ((T T | | {S S}^{α α + + β β})),, - - - - - - ((77))$

${S S}^{* *} \approx \approx \underset{S S}{arg arg max max} ΠP ΠP (({S S}^{α α})) ΠP ΠP ((T T | | {S S}^{α α})) + + ΠP ΠP (({S S}^{β β})) ΠP ΠP ((T T | | {S S}^{β β})) . . - - - - - - ((88))$

其中，将轨迹片段集合S^α+β划分为具有联合目标轨迹的区域S^α和独立目标轨迹区域S^β。 Among them, the track segment set S ^α+β is divided into a region S ^α with joint target trajectories and an area S ^β with independent target trajectories.

接着做进一步假设：目标与目标之间的运动是相互独立的，则式(8)可以重写为： Then make a further assumption: the movement between the target and the target is independent of each other, then formula (8) can be rewritten as:

$\begin{matrix} S S = = \underset{{S S}^{α α}}{arg arg min min} \underset{i i,, j j}{Σ Σ} {C C}_{j j,, i i} {t t}_{j j,, i i} + + \underset{i i}{Σ Σ} {C C}_{j j} {t t}_{i i} + + \underset{{S S}^{β β}}{arg arg min min} \underset{i i,, j j}{Σ Σ} {C C}_{j j,, i i} {t t}_{j j,, i i} + + \underset{i i}{Σ Σ} {C C}_{i i} {t t}_{i i} \\ = = \underset{{S S}^{α α}}{arg arg min min} \underset{i i,, j j}{Σ Σ} {C C}_{j j,, i i} {t t}_{j j,, i i} + + \underset{i i}{Σ Σ} {C C}_{i i} {t t}_{i i} + + \underset{{S S}^{β β}}{arg arg min min} \underset{i i,, j j}{Σ Σ} {C C}_{j j,, i i} {t t}_{j j,, i i} \end{matrix} - - - - - - ((99))$

其中的参数可以表示为： The parameters can be expressed as:

${t t}_{i i} = = \underset{j j}{Σ Σ} {t t}_{j j,, i i} \leq \leq 11,, - - - - - - ((1010))$

${C C}_{j j,, i i} = = - - log log {P P}_{sim sim} (({T T}_{{j j}_{k k - - 11}} | | {T T}_{{i i}_{k k}})) - - - - - - ((1111))$

${C C}_{i i} = = - - log log P P (({T T}_{i i} | | S S)) - - log log \frac{{p p}_{i i}}{11 - - {p p}_{i i}} = = log log \frac{11 - - {p p}_{i i}}{{p p}_{i i}} . . - - - - - - ((1212))$

其中，t_j,i和t_i表示0,1的二值指示变量，对于t_j,i,如果等于1，则表示观测轨迹T_j与T_i相连接，如果等于0，则表示不连接。对于指示变量t_i，如果等于1，表示该观测轨迹处在联合轨迹区域，反之亦然。C_j,i表示观测轨迹T_j与T_i之间的转移相似度代价值，P_sim(i)表示两段轨迹的相似度，由目标的表象特征f_i向量的欧式距离求得，C_i表示观测轨迹T_j深度代价值，其中p_i表示实际深度值。 Among them, t _j,i and t _i represent binary indicator variables of 0,1. For t _j,i , if it is equal to 1, it means that the observed trajectory T _j is connected to T _i , and if it is equal to 0, it means that it is not connected. For the indicator variable t _i , if it is equal to 1, it means that the observed track is in the joint track area, and vice versa. C _j,i represents the transfer similarity cost value between the observed trajectory T _j and T _i , P _sim (i) represents the similarity of the two trajectories, which is obtained from the Euclidean distance of the target's appearance feature f _i vector, C _i Indicates the depth cost value of the observed trajectory T _j , where p _i represents the actual depth value.

式(9)是实时跟踪模型的目标方程，本发明使用改进的二部图匹配方法来求解该问题。改进的二部图匹配方法属于本领域常用的求解算法，在此不作赘述。 Formula (9) is the target equation of real-time tracking model, and the present invention uses improved bipartite graph matching method to solve this problem. The improved bipartite graph matching method is a commonly used solution algorithm in this field, and will not be described in detail here.

实验证明，在连续的视频帧内，本发明方法可以准确地跟踪到每一个目标。根据本发明得到的检测跟踪结果如图7所示，从图7中可以看出，在不同的交通场景内，即使行人出现多次的遮挡，本发明仍然可以识别出每一个行人。 Experiments have proved that in continuous video frames, the method of the present invention can accurately track each target. The detection and tracking results obtained according to the present invention are shown in FIG. 7 . It can be seen from FIG. 7 that in different traffic scenes, even if pedestrians are occluded multiple times, the present invention can still identify each pedestrian.

本发明很好的综合了激光传感器和视觉传感器的优点，有效地提高单一传感器对行人进行检测跟踪的精确度和时效性。通过激光点滤波，聚类以及激光点模式分析完成对行人的初步定位，然后经过激光扫描仪和摄像机之间的数据配准，将行人的潜在位置投影到图像层；进而利用快速行人检测技术完成行人位置确定并进行特征提取，最终在连续多帧之间进行数据关联，达到对行人实时检测跟踪的目标。算法对周围的路面环境做出分析，并判断出可能对车辆安全行驶造成的威胁，进行危险预警，实现辅助安全驾驶的功能。 The invention well integrates the advantages of the laser sensor and the vision sensor, and effectively improves the accuracy and timeliness of a single sensor for detecting and tracking pedestrians. The preliminary positioning of pedestrians is completed through laser point filtering, clustering and laser point pattern analysis, and then the potential position of pedestrians is projected to the image layer through data registration between the laser scanner and the camera; and then the fast pedestrian detection technology is used to complete Pedestrian position is determined and feature extraction is performed, and finally data association is performed between consecutive multiple frames to achieve the goal of real-time detection and tracking of pedestrians. The algorithm analyzes the surrounding road surface environment, and judges the possible threats to the safe driving of the vehicle, carries out danger warning, and realizes the function of assisting safe driving.

以上所述的具体实施例，对本发明的目的、技术方案和有益效果进行了进一步详细说明，所应理解的是，以上所述仅为本发明的具体实施例而已，并不用于限制本发明，凡在本发明的精神和原则之内，所做的任何修改、等同替换、改进等，均应包含在本发明的保护范围之内。 The specific embodiments described above have further described the purpose, technical solutions and beneficial effects of the present invention in detail. It should be understood that the above descriptions are only specific embodiments of the present invention and are not intended to limit the present invention. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims

1. A fast detection and tracking method for pedestrians based on information fusion, characterized in that the method may further comprise the steps:

Step S1: Use a laser scanner to scan the surrounding environment at a fixed frequency to obtain laser data;

Step S2: filtering out invalid data in the laser data to obtain candidate target data;

Step S3: Calibrate the coordinate parameters between the laser scanner and the camera to obtain coordinate conversion parameters between the two coordinate systems;

Step S4: Based on the candidate target data, confirm the candidate targets therein;

Step S5: Establish a real-time tracking model, and track the candidate targets confirmed in step S4 according to the real-time tracking model.

2. The method according to claim 1, wherein the laser scanner is a two-dimensional laser scanner.

3. method according to claim 1, it is characterized in that, let set L={l _z } represent a group of laser data, wherein, z=1,..., Z, Z represent laser data point number, then described step S2 further comprising the following steps:

First, the set L is convolved with the kernel template [-1,1], and the data points with a point spacing within a certain range are retained to obtain a rough denoising result set: C={c _n },n=1,... .N, where N represents the number of laser data points in the rough denoising result set;

Then, for the coarse denoising result set Perform a clustering operation to obtain a set of candidate target data sets S.

4. The method according to claim 3, wherein the clustering operation is specifically:

In the initial clustering, each data point is regarded as a class, and the distance between the new class and a single sample data point is calculated. If the distance between two adjacent classes c _n-1 and c _n falls within a certain predetermined threshold range, It is considered that they belong to the same class, otherwise they are considered to belong to different classes, and the current single sample data point is used as the starting point of the new class.

5. The method according to claim 1, characterized in that, in the step S3, the rotation and translation between the laser coordinate system and the camera coordinate system are obtained by using the method that the calibration plate is captured simultaneously by the laser scanner and the camera matrix.

6. The method according to claim 1, wherein said step S4 further comprises the following steps:

Step S41, dividing the training sample image into several basic units according to predetermined size pixel blocks;

Step S42, divide each adjacent m units into a block, divide the gradient direction on average, and obtain multiple basic direction bins within the direction range of 180°;

Step S43, for each basic unit, project all pixels in all basic directions to establish their respective gradient direction histograms;

Step S44, connect the gradient orientation histograms of the basic units contained in each block to obtain a vector;

Step S45, normalizing and concatenating the vectors of all the regional blocks to obtain the HOG feature vector of each training sample;

Step S46, training an SVM classification model based on the extracted HOG feature vector;

Step S47, for a certain candidate area on the training sample image in the camera coordinate system, it is judged whether there is a target in the area by scanning the multi-scale image area pixel by pixel.

7. The method according to claim 1, wherein the real-time tracking model is expressed as:

{S S}^{* *} = = \underset{S S}{arg arg max max} P P ((S S | | T T)) = = \underset{S S}{arg arg max max} P P ((T T | | S S)) P P ((S S)),,

Among them, T is a given observation vector, and S is a set of candidate targets.

8. The method according to claim 7, wherein the real-time tracking model is also provided with constraints:

Among them, T _k represents the observation vector of the kth target in the scene, and T represents the sum of the observation vectors of each target in the scene.

9. method according to claim 1, is characterized in that, the objective equation of described real-time tracking model is expressed as:

\begin{matrix} S S = = \underset{{S S}^{α α}}{arg arg min min} \underset{i i,, j j}{Σ Σ} {C C}_{j j,, i i} {t t}_{j j,, i i} + + \underset{i i}{Σ Σ} {C C}_{j j,, i i} {t t}_{j j,, i i} + + \underset{i i}{Σ Σ} {C C}_{j j} {t t}_{i i} + + \underset{{S S}^{β β}}{arg arg min min} \underset{i i,, j j}{Σ Σ} {C C}_{j j,, i i} {t t}_{j j,, i i} + + \underset{i i}{Σ Σ} {C C}_{i i} {t t}_{i i} \\ = = \underset{{S S}^{α α}}{arg arg min min} \underset{i i,, j j}{Σ Σ} {C C}_{j j,, i i} {t t}_{j j,, i i} + + \underset{i i}{Σ Σ} {C C}_{i i} {t t}_{i i} + + \underset{{S S}^{β β}}{arg arg min min} \underset{i i,, j j}{Σ Σ} {C C}_{j j,, i i} {t t}_{j j,, i i} \end{matrix},,

Among them, S ^α represents the region with joint target trajectory; S ^β represents the region of independent target trajectory; t _j,i and t _i represent binary indicator variables, Indicates the transition similarity cost value between the observed trajectory T _j and T _i , C _j,i =-logP _sim (T _jk-1 |T _ik ), C _i represents the depth cost value of the observed trajectory T _j

{C C}_{i i} = = - - log log P P (({T T}_{i i} | | S S)) - - log log \frac{{p p}_{i i}}{11 - - {p p}_{i i}} = = log log \frac{11 - - {p p}_{i i}}{{p p}_{i i}} . .