CN110866425A - Pedestrian recognition method based on a light field camera and deep transfer learning

Info

Publication number: CN110866425A
Application number: CN201810985726.3A
Authority: CN
Inventors: 石凡; 赵宇峰; 赵萌; 贾晨; 栾昊; 陈胜勇; 冯洋博
Original assignee: Tianjin University of Technology
Current assignee: Tianjin University of Technology
Priority date: 2018-08-28
Filing date: 2018-08-28
Publication date: 2020-03-06

Abstract

A pedestrian recognition method based on a light field camera and deep transfer learning comprises the steps of: ① acquiring a plurality of pedestrian images with a light field camera; ② obtaining color pedestrian images and depth pedestrian images using the Lytro desktop software; ③ preprocessing the color images and depth images obtained in step ②, normalizing them to a uniform size, and dividing the images into positive and negative samples to obtain a light field image data set; ④ model initialization; ⑤ taking an existing VGG16 image classification model trained on the ImageNet data set, freezing its earlier convolution blocks and retaining the parameters of the last convolution block to obtain the initial values of the neural network; ⑥ processing the color pedestrian images and depth pedestrian images of step ① through the neural network of ④ to obtain mixed convolution features; and ⑦ repeatedly training the neural network on the convolution features obtained in step ⑥ and fine-tuning the model to obtain a new classification model.

Description

Pedestrian recognition method based on light field camera and deep transfer learning

Technical field

The invention belongs to the field of computer vision, and in particular relates to a pedestrian recognition method based on a light field camera and deep transfer learning.

Background

Pedestrian recognition is an important part of computer vision research and has important applications in intelligent transportation, video surveillance, artificial intelligence, and autonomous driving. In recent years, with the rapid development of computer hardware and new imaging technologies, industry has placed more stringent requirements on the performance and accuracy of pedestrian recognition.

With the rapid development of autonomous driving in recent years, the accuracy of pedestrian recognition has become particularly important. Pedestrians exhibit the characteristics of both rigid and non-rigid objects. Because of the variability of viewing angles, factors such as illumination and occlusion, and the large number of human figures that appear on traffic signs and street advertising boards, the false detection of pedestrians has long been a key problem affecting pedestrian detection performance. In recent years, researchers have therefore done a great deal of work on acquiring pedestrian features and optimizing detection methods, combining multiple sensors to extract pedestrian features so as to reduce the false detection rate and improve the pedestrian detection rate.

Ren Ng (吴义仁), working with other researchers in Professor Hanrahan's laboratory at Stanford University, created the "light field camera". Its body is similar to that of an ordinary digital camera, but its internal structure is very different. An ordinary camera captures light with the main lens and focuses it on the film or sensor behind the lens; the sum of all the light rays forms each small point of the photograph, producing the image. The light field camera places a microlens array containing 90,000 microlenses between the main lens and the sensor. Each microlens receives the light coming from the main lens and, before passing it to the sensor, separates the focused rays and converts the light data so that it is recorded digitally. The camera's built-in software operates on this "expanded light field", tracing where each ray would land on images at different distances, so that a well-focused photograph can be produced after digital refocusing.

Moreover, the light field camera departs from the conventional practice of reducing the lens aperture to gain depth of field. It uses the microlens array to control the additional light and reveal the depth of field of each image, and then projects tiny sub-images onto the sensor, so that the hazy halo around every focused image becomes "sharp". It thus keeps the benefits that a large aperture brings to a conventional camera (more light, shorter exposure time, and less grain) without sacrificing depth of field or image sharpness.

Compared with a conventional digital camera, the light field camera has several distinguishing features.

1. Shoot first, focus later: a digital camera captures an image focused on a single focal plane, sharp where it is in focus and blurred elsewhere. A light field camera records the data of light rays in all directions; the focus point is chosen later on a computer as needed, and the final image is produced by processing on the computer.

2. Small size and high speed: because it uses a different imaging technique from a digital camera, a light field camera has no complex focusing system, so it is smaller overall and simpler to operate. Because no focusing is needed before shooting, shooting is also faster.

The commonly used target recognition approach today is to capture samples with a color camera and perform recognition with machine learning or deep learning methods. These methods show poor accuracy and robustness when faced with pedestrians in a two-dimensional plane, for example human figures printed on signs and advertising boards.

Machine learning-based methods learn the regularities of the human body from training samples to obtain a model, which is then evaluated on a test set. If the data and features are chosen sensibly and a suitable training algorithm is used, the false detection of pedestrians in two-dimensional planes can be largely overcome.

Machine learning-based methods generally comprise three parts: feature extraction, classifier training, and detection. The most commonly used feature in pedestrian recognition is the histogram of oriented gradients (HOG). Combining HOG features with a support vector machine (SVM) has achieved good results in pedestrian recognition. However, HOG is a typical handcrafted feature, and its results for image classification and recognition, and for detecting targets in arbitrary poses such as pedestrians, animals, and buildings, are not satisfactory. Designing HOG-like handcrafted features also demands excellent vision research ability and rich research experience from the designer. Looking back at a decade of object recognition research, the proposed models and algorithms were all based on manually designed features, and progress was slow.
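The core of the HOG feature mentioned above can be illustrated with a minimal sketch. It computes the gradient-orientation histogram of a single image cell only; the 9 unsigned orientation bins, the central-difference gradient, and the function name `cell_orientation_histogram` are illustrative assumptions and are not taken from this document. A complete HOG descriptor would additionally interpolate votes between bins and normalize histograms over blocks of cells.

```python
import math

def cell_orientation_histogram(cell, n_bins=9):
    """Unsigned gradient-orientation histogram for one grayscale cell.

    `cell` is a list of rows of numbers. Gradients use central differences
    on interior pixels only. Each pixel votes into one of `n_bins` bins
    covering [0, 180) degrees, weighted by its gradient magnitude.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue  # flat region, no vote
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# A 4x4 cell with a vertical edge: all gradient energy is horizontal.
edge_cell = [[0, 0, 10, 10] for _ in range(4)]
edge_hist = cell_orientation_histogram(edge_cell)
```

For the vertical edge in `edge_cell`, every vote falls into the first bin (orientation near 0 degrees), which is the kind of hand-designed regularity that a classifier such as an SVM is then trained on.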

In recent years, with the development of science and technology, deep learning has become one of the most active research directions in computer vision. Research teams have since achieved very good results on image recognition and on other tasks such as detection and segmentation. Applying deep learning to pedestrian recognition has therefore become a trend, with broad research significance and application prospects.

Summary of the invention

The invention addresses the shortcomings of existing pedestrian recognition methods and provides a pedestrian recognition method based on a light field camera and deep transfer learning, which can effectively improve the accuracy and robustness of pedestrian detection.

As conceived above, the technical solution of the present invention is a pedestrian recognition method based on a light field camera and deep transfer learning, characterized by comprising the following steps:

① acquiring a plurality of pedestrian images with a light field camera;

② processing the original pedestrian images obtained in step ① with the Lytro desktop software to obtain color pedestrian images and depth pedestrian images;

③ preprocessing the color images and depth images obtained in step ②, normalizing them to a uniform size, and dividing the images into positive and negative samples to obtain a light field image data set;

④ model initialization: implemented by fine-tuning with a "step-by-step transfer" strategy based on transfer learning;

⑤ taking an existing VGG16 image classification model trained on the ImageNet data set, freezing its earlier convolution blocks and retaining the parameters of the last convolution block to obtain the initial values of the neural network;

⑥ processing the color pedestrian images and the depth pedestrian images of step ① separately through the neural network of ④ to obtain mixed convolution features;

⑦ repeatedly training the neural network on the convolution features obtained in ⑥ and fine-tuning the model to obtain a new classification model.

The transfer learning is implemented with the Keras framework.
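The freezing in step ⑤ can be sketched as follows. The detailed description states that the first 4 convolution blocks are frozen and the 5th is left trainable; the sketch expresses that choice as a mapping from layer names to trainable flags. The helper `trainable_flags` is a hypothetical name, and the layer names are those used by the Keras VGG16 application, which is assumed here to be the pretrained model. The commented lines show how the flags might be applied when Keras is installed; they are not executed.

```python
def trainable_flags(layer_names, trainable_block="block5"):
    """Map each layer name to True (updated in training) or False (frozen)."""
    return {name: name.startswith(trainable_block) for name in layer_names}

# Convolution and pooling layer names of the Keras VGG16 application.
VGG16_LAYERS = [
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]

flags = trainable_flags(VGG16_LAYERS)

# Possible application to a real model (assumption, not run here):
#   from tensorflow.keras.applications import VGG16
#   base = VGG16(weights="imagenet", include_top=False)
#   for layer in base.layers:
#       layer.trainable = flags.get(layer.name, False)
```

Only the layers of the fifth block receive gradient updates under this scheme, so the general low-level features learned on ImageNet are preserved while the highest-level features adapt to the light field data set.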

The present invention has the following advantages and positive effects:

1. The invention is effective against the misrecognition of pedestrians in two-dimensional planes. As a supplement to existing pedestrian recognition methods, it requires little computation and little data and places low demands on hardware, so it can be applied in real industrial environments.

2. The invention can effectively improve the accuracy and robustness of pedestrian detection.

Description of the drawings

FIG. 1 is a flowchart of the pedestrian recognition method based on a light field camera and deep transfer learning according to the invention.

FIG. 2 is a schematic diagram of the structure of the deep convolutional network used in the invention.

Detailed description

The invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments are only used to illustrate the invention and not to limit its scope. After reading this disclosure, modifications of the invention in various equivalent forms made by those skilled in the art all fall within the scope defined by the claims appended to this application.

The light field camera used in the invention is a light field camera produced by Lytro. The experimental platform used is a PC.

As shown in FIG. 1, a pedestrian recognition method based on a light field camera and deep transfer learning comprises the following steps:

④ Model initialization: implemented by fine-tuning with a "step-by-step transfer" strategy based on transfer learning. "Fine-tuning" here means using an already trained model to initialize the parameters of the target network and continuing training from there; its purpose is to obtain good initial values for the neural network.

This completes the pedestrian recognition method based on a light field camera and deep learning provided by the invention.

As shown in FIGS. 1 and 2, the specific steps adopted by the invention are:

1. First, a Lytro light field camera is used to take multiple pictures of scenes containing pedestrians; the pictures are then processed with the official Lytro software to obtain color images and depth images of the pedestrians.

2. The color images and depth images obtained in step 1 are preprocessed and normalized to a uniform size, and the images are divided into 500 positive and 500 negative samples to obtain the light field image data set.

3. An existing VGG16 image classification model trained on the ImageNet data set is downloaded and adapted to the format of the Keras framework used by the invention.

4. Using the model parameters prepared in step 3, the parameters of the first 4 convolution blocks are frozen, while the parameters of the 5th convolution block are unfrozen so that they take part in neural network training and are updated.

5. Before the neural network is trained, data augmentation is applied to the data set, mainly color channel shifting, image rotation, mirroring, and random cropping. With the Keras deep learning framework, these image processing operations can be carried out conveniently through its built-in functions.

6. Using the parameters from step 4, the color image is passed through all the convolutional layers of VGG16 to obtain the feature vector X1.

7. The depth image is input to one convolutional layer with a 3×3 convolution kernel, and the feature vector X2 is obtained after the convolution operation.

8. The convolution features from step 6 and step 7 are combined by a weighted sum to form a new convolution feature X, that is, X = W1*X1 + W2*X2.

9. The convolution feature X obtained in step 8 is input to the fully connected layer to obtain the final classification result.

10. According to the classification results, the false recognition rate of pedestrian recognition can be reduced, especially when the pedestrian data set contains pedestrians in two-dimensional planes.
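Steps 8 and 9 can be sketched in a few lines. The document does not give the values of W1 and W2, the dimensions of X1 and X2, or the configuration of the fully connected layer, so the sketch assumes scalar weights, feature vectors already brought to the same length, and a single sigmoid output unit for the pedestrian / non-pedestrian decision. The names `fuse_features` and `fully_connected_score` are hypothetical.

```python
import math

def fuse_features(x1, x2, w1=0.5, w2=0.5):
    """Element-wise weighted sum X = W1*X1 + W2*X2 of equal-length vectors."""
    if len(x1) != len(x2):
        raise ValueError("feature vectors must have the same length")
    return [w1 * a + w2 * b for a, b in zip(x1, x2)]

def fully_connected_score(x, weights, bias=0.0):
    """One sigmoid unit standing in for the fully connected classifier."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

x1 = [1.0, 2.0, 3.0]  # stand-in for the color-image feature X1
x2 = [3.0, 2.0, 1.0]  # stand-in for the depth-image feature X2
x = fuse_features(x1, x2, w1=0.75, w2=0.25)

# Score in (0, 1); values above 0.5 would be read as "pedestrian".
score = fully_connected_score(x, weights=[0.2, -0.1, 0.4], bias=0.0)
```

The point of the weighted sum is that a flat picture of a person yields a color feature similar to that of a real pedestrian but a very different depth feature, so the fused feature X separates the two cases better than X1 alone.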

In the invention, the number of training iterations of the neural network is set to 50 and the optimizer is set to Momentum.
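The Momentum optimizer can be illustrated with a minimal sketch of the classical update rule, run for 50 iterations on a toy objective. The learning rate 0.01, the momentum coefficient 0.9, and the toy function f(w) = w^2 are assumptions chosen for illustration; the document specifies only the iteration count and the optimizer type. In Keras, an optimizer of this kind is available as `SGD` with a `momentum` argument.

```python
def momentum_step(weights, grads, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update; returns new (weights, velocity)."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# Toy problem: minimise f(w) = w^2, whose gradient is 2w.
w, v = [1.0], [0.0]
for _ in range(50):  # 50 iterations, as in the embodiment
    w, v = momentum_step(w, [2.0 * w[0]], v)
```

The velocity term accumulates past gradients, which smooths the updates and usually speeds up convergence compared with plain gradient descent at the same learning rate.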

As described above, the method proposed by the invention for detecting the misrecognition of pedestrians in two-dimensional planes has been described clearly and in detail.

Although embodiments of the invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the invention. The scope of the invention is defined by the claims and their equivalents.

Claims

1. A pedestrian recognition method based on a light field camera and deep transfer learning, characterized by comprising the following steps:

① acquiring a plurality of pedestrian images with a light field camera;

② processing the original pedestrian images obtained in step ① with the Lytro desktop software to obtain color pedestrian images and depth pedestrian images;

③ preprocessing the color images and depth images obtained in step ②, normalizing them to a uniform size, and dividing the images into positive and negative samples to obtain a light field image data set;

④ model initialization: implemented by fine-tuning with a "step-by-step transfer" strategy based on transfer learning;

⑤ taking an existing VGG16 image classification model trained on the ImageNet data set, freezing its earlier convolution blocks and retaining the parameters of the last convolution block to obtain the initial values of the neural network;

⑥ processing the color pedestrian images and the depth pedestrian images of step ① separately through the neural network of ④ to obtain mixed convolution features;

⑦ repeatedly training the neural network on the convolution features obtained in ⑥ and fine-tuning the model to obtain a new classification model.

2. The pedestrian recognition method based on a light field camera and deep transfer learning according to claim 1, wherein the transfer learning adopts the Keras framework.