CN109446920B - Detection method of urban rail transit passenger congestion based on convolutional neural network - Google Patents
- Publication number
- CN109446920B CN109446920B CN201811162491.4A CN201811162491A CN109446920B CN 109446920 B CN109446920 B CN 109446920B CN 201811162491 A CN201811162491 A CN 201811162491A CN 109446920 B CN109446920 B CN 109446920B
- Authority
- CN
- China
- Prior art keywords
- neural network
- convolutional neural
- layer
- video
- congestion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
Abstract
The invention discloses a method for detecting the degree of passenger congestion in urban rail transit based on a convolutional neural network. The video to be analysed is first preprocessed: it is divided into segments and motion residual images are extracted, and the original image is combined with the motion residual images as input to the convolutional neural network. Feature extraction blocks, each containing at least one convolutional layer and one max-pooling layer, process the original image and the motion residual images and compute the crowd-state features contained in each. The crowd-state features and the motion features are then combined in a feature fusion block containing at least one convolutional layer, one max-pooling layer and fully connected layers, which performs the fusion; a classifier is also constructed. The convolutional neural network is trained on a prepared training set labelled with congestion levels, so that the classifier correctly detects the degree of passenger congestion in the video under test. The method characterises the passenger flow in surveillance video more comprehensively, achieves detection of the degree of congestion, and improves detection accuracy.
Description
Field
The invention belongs to the technical field of rail transit and in particular relates to a method for detecting the degree of passenger congestion in urban rail transit based on a convolutional neural network.
Background
With the accelerating pace of urbanisation and the gradual completion of rail transit networks, urban rail transit has become the backbone of urban public transport. Rapidly growing passenger flows place higher demands on day-to-day operation and management. On the one hand, formulating reasonable train plans and passenger-flow organisation schemes, using operating resources efficiently and meeting rapidly changing travel demand across a large network require accurate knowledge of passenger-flow states and data. On the other hand, rail transit stations are usually located in enclosed underground spaces or on elevated tracks, and station halls are relatively small; when peak passenger flow arrives, a large influx of passengers can easily cause congestion in station halls and blockage of passageways. Dense crowds not only make passenger-flow guidance difficult but can also trigger large-scale crowd safety incidents with adverse social consequences. A convenient, efficient method is therefore needed to obtain the passenger-flow distribution in real time and monitor the passenger-flow state of stations, providing strong technical support for passenger-flow organisation and safeguarding passenger safety and the normal operation of rail transit.
Existing urban rail transit stations are generally equipped with comprehensive video surveillance systems. The surveillance footage clearly reflects the degree of passenger congestion within the monitored area and contains a large amount of useful passenger-flow information. In the past, technical limitations meant that the speed and accuracy with which information could be extracted from video images fell short of practical requirements. With continuing progress in image processing, machine learning and computer performance, intelligent image recognition has emerged. By combining image recognition with the video surveillance systems installed in public places, a computer can process the passenger targets in the images captured by surveillance cameras, automatically detect and assess the passenger-flow state, and raise a timely alarm when abnormal targets or scenes are found, achieving automated, intelligent detection and monitoring of passenger congestion in urban rail transit.
Because crowds have distinctive visual characteristics, most early crowd-density estimation methods reflected the degree of crowding by extracting the visual features of gathered crowds. These methods fall into two categories: pixel-based and texture-based. Pixel-based detection is simple in principle and performs well in scenes of moderate crowd density, but in densely crowded places such as stations and shopping malls, heavy mutual occlusion caused by high crowd density degrades its performance noticeably. Texture-based features perform poorly in real time. Since neither method is satisfactory in practical operation and use, a better-performing method is needed to detect passenger congestion in urban rail transit.
Summary of the Invention
Addressing the problems of the prior art, the present invention provides a convolutional-neural-network-based method for detecting the degree of passenger congestion in urban rail transit, overcoming the large measurement error and poor real-time performance of existing techniques. Based on the footage collected by the video surveillance system, the method exploits the strong image-recognition capability of convolutional neural networks: a multi-stage convolutional neural network extracts mixed features from crowd images and motion residual images, characterising the passenger flow in surveillance video more comprehensively, achieving detection of the degree of congestion and improving detection accuracy.
To achieve the above objective, the technical solution adopted by the invention is a convolutional-neural-network-based method for detecting the degree of passenger congestion in urban rail transit, comprising the following steps:
S1: acquire the traffic surveillance video to be analysed, preprocess it, divide it into segments and extract motion residual images;
S2: combine the original image and the motion residual images as input to the convolutional neural network; build feature extraction blocks, each containing at least one convolutional layer and one max-pooling layer, to process the original image and the motion residual images and compute the crowd-state features contained in each;
S3: combine the crowd-state features with the motion features by building a feature fusion block containing at least one convolutional layer, one max-pooling layer and fully connected layers, and fuse the feature maps obtained in step S2;
S4: construct a classifier covering the congestion levels;
S5: train the convolutional neural network on a prepared training set labelled with congestion levels, so that the classifier correctly detects the degree of passenger congestion in the video under test.
As an improvement of the invention, step S1 further comprises:
S11: acquiring the traffic surveillance video to be analysed;
S12: setting a detection period T and dividing the video into segments of length T, the first frame of each segment serving as the reference image;
S13: taking the other images in the segment and subtracting the reference image from each of them to obtain the motion residual images.
As an improvement of the invention, in step S2 the number of each feature extraction block is one, and the convolutional layers and pooling layers are connected alternately.
As another improvement of the invention, in step S3 the feature fusion block takes the feature maps output by the feature extraction blocks as input, and the number of feature fusion blocks is one.
As another improvement of the invention, in step S3 there are three fully connected layers, which are the last three layers; the convolutional layers and pooling layers are connected alternately and all precede the fully connected layers.
As a further improvement of the invention, in step S4 the classifier levels are divided into ten grades, grouped as spacious (levels 0-2), comfortable (levels 3-5), crowded (levels 5-8) and dangerous (levels 9-10).
As a further improvement of the invention, when training the convolutional neural network in step S5, a stochastic gradient descent algorithm is used to update the network parameters; the stochastic gradient descent formulas are:

g(θ) = Σ θx_i

θ_m = θ_{m-1} − η∇_θ h(θ)

where g(θ) is the network hypothesis function, θ denotes the parameter weights of the convolutional neural network, h(θ) is the loss function, y_i is the label of the i-th sample, m is the total number of iterations, σ is the penalty coefficient, ∇_θ denotes the gradient with respect to θ, and η is the learning rate of gradient descent.

As a further improvement of the invention, in step S5, when the detected level is lower than the actual degree of congestion, the penalty coefficient is σ = 1 + log(y_i − g_θ(x_i)); otherwise σ = 1.
Compared with the prior art, the proposed convolutional-neural-network-based method for detecting passenger congestion in urban rail transit has the following beneficial effects: combined with the images collected by the video surveillance system, the detection results reflect the actual passenger flow within the monitored area in real time and effectively; the congestion grading is tailored to the actual scene and meets the operational needs of the site; and real-time detection minimises the manual effort of reviewing surveillance video while guiding passenger-flow control, improving the safety and service quality of urban rail transit operations.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the steps of the detection method of the invention;
Fig. 2 is a schematic diagram of the connection of the convolutional layers and max-pooling layers in Embodiment 2 of the invention;
Fig. 3 shows the connection of the convolutional layer, max-pooling layer and fully connected layers in the feature fusion block of Embodiment 3 of the invention.
Detailed Description
The invention is described in detail below with reference to the accompanying drawings and embodiments.
Embodiment 1
As shown in Fig. 1, the convolutional-neural-network-based method for detecting the degree of passenger congestion in urban rail transit comprises the following steps:
S1: acquire the traffic surveillance video to be analysed, preprocess it, divide it into segments and extract motion residual images;
S11: acquire the traffic surveillance video to be analysed;
S12: set a detection period T and divide the video into segments of length T; the first frame of each segment is the reference image, denoted p1;
S13: take other images from the segment, for example one frame each at T/3, 2T/3 and T within the detection unit, denoted p2, p3 and p4; subtract the reference image p1 from each of p2, p3 and p4 to obtain the motion residual images of the crowd in the segment, denoted p12, p13 and p14.
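As a minimal sketch of the residual extraction in S13 (NumPy only; the frame data and function name are illustrative, and absolute differencing against the fixed reference frame is assumed for the subtraction):

```python
import numpy as np

def motion_residuals(frames):
    """Compute motion residual images for one detection unit.

    frames: list of grayscale frames as uint8 arrays; frames[0] is the
    reference image p1, the rest are the sampled frames p2, p3, p4.
    Returns the absolute differences p12, p13, p14 as uint8 arrays.
    """
    reference = frames[0].astype(np.int16)  # widen to avoid uint8 wrap-around
    return [np.abs(f.astype(np.int16) - reference).astype(np.uint8)
            for f in frames[1:]]

# A toy 4-frame "detection unit": a bright pixel moves one step per frame.
unit = [np.zeros((4, 4), dtype=np.uint8) for _ in range(4)]
for i, frame in enumerate(unit):
    frame[1, i] = 255                       # moving "passenger" pixel

residuals = motion_residuals(unit)
print(len(residuals))                       # 3 residual images per unit
print(residuals[0][1, 0], residuals[0][1, 1])
```

Differencing every sampled frame against the same reference, rather than against the previous frame, keeps all residuals of a detection unit comparable to a common baseline, which is what lets them be stacked with p1 as joint network input.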
S2: combine the reference image p1 and the motion residual images p12, p13 and p14 into one group as input to the passenger-congestion convolutional neural network. For the input reference image p1 and residual images p12, p13 and p14, build the first, second, third and fourth feature extraction blocks to process the four input images, computing the crowd features contained in the reference image and the motion features contained in the residual images; each feature extraction block contains at least one convolutional layer and one max-pooling layer.
The first, second, third and fourth feature extraction blocks form the input stage of the convolutional neural network; there is one of each, and their convolutional and max-pooling layers are connected alternately.
S3: combine the crowd-state features with the motion features by building a feature fusion block containing at least one convolutional layer, one max-pooling layer and fully connected layers, and fuse the feature maps obtained in step S2. The fusion method adds together the feature maps output by the first, second, third and fourth blocks and then inputs the result into the fusion block for convolution.
The feature fusion block takes the feature maps output by the four feature extraction blocks as input. There is one fusion block; it has three fully connected layers, which are the last three layers, and its convolutional and pooling layers are connected alternately and all precede the fully connected layers, in the order convolutional layer - max-pooling layer - convolutional layer - max-pooling layer ... fully connected layer - fully connected layer - fully connected layer, as detailed in the following table:
S4: construct a classifier covering the congestion levels. The levels are divided into ten grades, grouped as spacious (0-2), comfortable (3-5), crowded (5-8) and dangerous (9-10), giving the convolutional neural network the ability to grade the degree of passenger congestion contained in the video under analysis.
S5: train the convolutional neural network on a prepared training set labelled with congestion levels, updating the network parameters with the improved gradient descent method; finally, feed the video under analysis into the trained convolutional neural network, so that the classifier makes a sound judgement of the degree of passenger congestion in the images.
Embodiment 2
This embodiment differs from Embodiment 1 in that, when training the convolutional neural network in step S5, a stochastic gradient descent algorithm is used to update the network parameters; the stochastic gradient descent formulas are:

g(θ) = Σ θx_i

θ_m = θ_{m-1} − η∇_θ h(θ)

where g(θ) is the network hypothesis function, θ denotes the parameter weights of the convolutional neural network, h(θ) is the loss function, y_i is the label of the i-th sample, m is the total number of iterations, σ is the penalty coefficient, ∇_θ denotes the gradient with respect to θ, and η is the learning rate of gradient descent.
The parameter-update method of the convolutional neural network in this method is an improved stochastic gradient descent. Compared with the conventional method, it adds a penalty coefficient to the loss function in line with actual operational needs. In real operation, a detected congestion level lower than the actual level may delay subsequent management measures and thereby disrupt the normal operation of urban rail transit; its adverse effect is far greater than that of a detection higher than the actual level. Therefore, to reduce the number of under-estimates, a penalty coefficient is added to the loss function so that when an under-estimate occurs the parameters are adjusted by a larger amount. When the detected level is lower than the actual degree of congestion, σ = 1 + log(y_i − g_θ(x_i)); otherwise σ = 1.
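The penalised update can be sketched on a toy one-parameter hypothesis (the text does not reproduce the loss h(θ) explicitly, so a squared-error loss and a natural logarithm are assumed here; all names are illustrative):

```python
import math

def penalty(y_true, y_pred):
    """sigma = 1 + log(y_true - y_pred) for an under-estimate, else 1.
    Natural log assumed; the text writes only 'log'."""
    if y_pred < y_true:
        return 1.0 + math.log(y_true - y_pred)
    return 1.0

def sgd_step(theta, x, y, lr=0.01, penalised=True):
    """One stochastic step on the toy hypothesis g_theta(x) = theta * x,
    with the penalty scaling the gradient of a squared-error loss."""
    y_pred = theta * x
    sigma = penalty(y, y_pred) if penalised else 1.0
    grad = sigma * (y_pred - y) * x   # gradient of sigma * (y_pred - y)^2 / 2
    return theta - lr * grad          # theta_m = theta_{m-1} - eta * grad

# An under-estimate (prediction 0.5 vs. label 4.0) triggers sigma > 1,
# so the penalised update moves the parameter further than plain SGD.
plain = sgd_step(0.5, x=1.0, y=4.0, penalised=False)
penal = sgd_step(0.5, x=1.0, y=4.0, penalised=True)
print(plain < penal)   # True
```

The asymmetry is the point: over-estimates are updated with σ = 1 as in ordinary SGD, while under-estimates are corrected more aggressively, matching the operational preference for erring on the side of over-reporting congestion.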
Embodiment 3
Step 1: set the detection period T (T = 3 s in this embodiment) and divide the video to be analysed into segments of length T, hereinafter called detection units. Take the first frame of the detection unit as the reference image, denoted p1; take the images at t = 1 s, t = 2 s and t = 3 s, denoted p2, p3 and p4; subtract the reference image p1 from each of p2, p3 and p4 to obtain the motion residual images of the crowd in the detection unit, denoted p12, p13 and p14; combine p1 with p12, p13 and p14 into one group as input to the passenger-congestion detection algorithm. According to actual urban rail transit operating conditions, congestion is divided into 10 grades: spacious (0-2), comfortable (3-5), crowded (5-8) and dangerous (9-10). Frames of surveillance video at different congestion levels are captured and labelled with their congestion grade to form the training set of the neural network.
Step 2: for the input reference image p1 and residual images p12, p13 and p14, build the first, second, third and fourth processing blocks to process the four input images, computing the crowd features in the reference image and the motion features in the residual images. Each processing block contains four convolutional layers and max-pooling layers, connected as shown in Fig. 2. Suppose the input is a 224x224 colour image. Convolutional layer C1 applies 11x11x3 kernels with stride 4, producing a 48-channel feature map of 55x55 pixels; max-pooling layer MP1 pools at scale 3x3 with stride 2, producing a 96-channel 27x27 feature map; convolutional layer C2 and max-pooling layer MP2 produce a 256-channel 13x13 feature map; C3 and MP3 produce a 384-channel 13x13 feature map; C4 and MP4 produce a 384-channel 13x13 feature map.
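The spatial sizes in Step 2 follow standard convolution/pooling size arithmetic; a small helper (illustrative; padding is treated as a free parameter, since the text does not state it) makes the quoted layer sizes easy to check:

```python
def out_size(w, k, s, p=0):
    """Output width of a convolution or pooling layer over a w-pixel input
    with kernel size k, stride s and padding p: floor((w - k + 2p)/s) + 1."""
    return (w - k + 2 * p) // s + 1

# C1: 11x11 kernel, stride 4. With zero padding a 227-pixel input gives the
# 55x55 map quoted in the text; the stated 224 input matches with padding 2.
print(out_size(227, k=11, s=4))        # 55
print(out_size(224, k=11, s=4, p=2))   # 55
# MP1: 3x3 pooling, stride 2: 55 -> 27, matching the text.
print(out_size(55, k=3, s=2))          # 27
```

Applying the same formula to the later stages reproduces the 13x13 maps quoted for C2-C4 (27 -> 13 under a 3x3 window with stride 2).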
Step 3: build the feature fusion block to fuse the four groups of 13x13x384 feature maps, containing crowd features and motion features, output by the first, second, third and fourth processing blocks. The fusion block contains one convolutional layer, one max-pooling layer and three fully connected layers, connected as shown in Fig. 3. Convolutional layer C5 uses 3x3 kernels with stride 1, yielding four groups of 13x13x256 feature maps after convolution; max-pooling layer MP5, with pooling scale 3x3 and stride 2, yields four groups of 6x6x256 feature maps; three fully connected layers of 4096 neurons each then produce 4096 outputs.
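The first stage of the fusion can be sketched in NumPy (element-wise summation of the four feature-map groups is assumed, since the text says only that the maps are added before convolution; the random data is a stand-in for real extraction-block outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Four groups of feature maps from the extraction blocks, one per input
# image (reference p1 plus residuals p12, p13, p14), each 13x13x384.
feature_maps = [rng.random((13, 13, 384)) for _ in range(4)]

# Element-wise summation merges crowd and motion features into a single
# tensor of the same shape before the fusion block's convolution C5.
fused = np.sum(feature_maps, axis=0)
print(fused.shape)        # (13, 13, 384)
```

Summation (rather than channel concatenation) keeps the fused tensor at 384 channels, so the fusion block's convolution operates on the same spatial grid the extraction blocks produced.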
Step 4: build the classifier with 10 congestion grades, i.e. {spacious (0-2), comfortable (3-5), crowded (5-8), dangerous (9-10)}, giving the network the ability to grade the degree of passenger congestion contained in a detection unit. The 4096 outputs of the fully connected layer are fully connected to the 10 neurons of the classifier.
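The grading in Step 4 maps directly onto the four named bands; a small lookup sketch (illustrative — the text's comfortable and crowded bands both mention level 5, which this sketch resolves as comfortable):

```python
def congestion_band(level):
    """Map a congestion level 0-10 to its named band from the text.
    Level 5 appears in both 'comfortable' (3-5) and 'crowded' (5-8);
    it is resolved here as 'comfortable'."""
    if not 0 <= level <= 10:
        raise ValueError("congestion level must be in 0..10")
    if level <= 2:
        return "spacious"
    if level <= 5:
        return "comfortable"
    if level <= 8:
        return "crowded"
    return "dangerous"

print(congestion_band(1), congestion_band(7), congestion_band(10))
# spacious crowded dangerous
```

In deployment this mapping would sit after the classifier's 10-way output, turning the predicted level into the operational label shown to staff.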
Step5:使用预设的训练集对先前构建的卷积神经网络进行训练,并根据改良的梯度下降方法修正网络中的参数。改良的随机梯度下降法在传统方法的基础上,根据实际运营需求在损失函数中增加一项惩罚系数当检测的结果低于实际拥挤程度时σ=1+log(yi-gθ(xi)),否则σ=1。完成训练后,将待检测单元输入卷积神经网络,使分类器对待检测视频做出合理的判断。Step5: Use the preset training set to train the previously constructed convolutional neural network, and correct the parameters in the network according to the improved gradient descent method. The improved stochastic gradient descent method is based on the traditional method, and a penalty coefficient is added to the loss function according to the actual operation requirements. When the detection result is lower than the actual degree of congestion, σ=1+log(y i -g θ (x i )), otherwise σ=1. After the training is completed, the unit to be detected is input into the convolutional neural network, so that the classifier can make a reasonable judgment on the video to be detected.
The activation function of the convolutional layers in this embodiment is the ReLU function.
The foregoing shows and describes the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the invention is not limited by the above examples, which, together with the description, merely illustrate its principles; various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the scope of the claimed invention. The protection scope of the invention is defined by the appended claims and their equivalents.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811162491.4A CN109446920B (en) | 2018-09-30 | 2018-09-30 | Detection method of urban rail transit passenger congestion based on convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109446920A CN109446920A (en) | 2019-03-08 |
CN109446920B true CN109446920B (en) | 2019-08-06 |
Family
ID=65546153
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811162491.4A Active CN109446920B (en) | 2018-09-30 | 2018-09-30 | Detection method of urban rail transit passenger congestion based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109446920B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018097384A1 (en) * | 2016-11-24 | 2018-05-31 | 한화테크윈 주식회사 | Crowdedness notification apparatus and method |
CN110276398A (en) * | 2019-06-21 | 2019-09-24 | 北京滴普科技有限公司 | A kind of video abnormal behaviour automatic judging method |
CN112347814A (en) * | 2019-08-07 | 2021-02-09 | 中兴通讯股份有限公司 | Passenger flow estimation and display method, system and computer readable storage medium |
CN110598630A (en) * | 2019-09-12 | 2019-12-20 | 江苏航天大为科技股份有限公司 | Method for detecting passenger crowding degree of urban rail transit based on convolutional neural network |
CN111723690B (en) * | 2020-06-03 | 2023-10-20 | 北京全路通信信号研究设计院集团有限公司 | Method and system for monitoring state of circuit equipment |
CN111582251B (en) * | 2020-06-15 | 2021-04-02 | 江苏航天大为科技股份有限公司 | Method for detecting passenger crowding degree of urban rail transit based on convolutional neural network |
CN111859717B (en) * | 2020-09-22 | 2020-12-29 | 北京全路通信信号研究设计院集团有限公司 | Method and system for minimizing regional multi-standard rail transit passenger congestion coefficient |
CN112396587B (en) * | 2020-11-20 | 2024-01-30 | 重庆大学 | Method for detecting congestion degree in bus compartment based on collaborative training and density map |
CN113553921B (en) * | 2021-07-02 | 2022-06-10 | 兰州交通大学 | A Convolutional Neural Network-Based Congestion Recognition Method for Subway Cars |
CN116596731A (en) * | 2023-05-25 | 2023-08-15 | 北京贝能达信息技术股份有限公司 | A rail transit intelligent operation and maintenance big data management method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105069400A (en) * | 2015-07-16 | 2015-11-18 | 北京工业大学 | Face image gender recognition system based on stack type sparse self-coding |
CN105184271A (en) * | 2015-09-18 | 2015-12-23 | 苏州派瑞雷尔智能科技有限公司 | Automatic vehicle detection method based on deep learning |
CN106203331A (en) * | 2016-07-08 | 2016-12-07 | 苏州平江历史街区保护整治有限责任公司 | A kind of crowd density evaluation method based on convolutional neural networks |
Non-Patent Citations (1)
Title |
---|
Single-Image Crowd Counting via Multi-Column Convolutional Neural Network; Yingying Zhang et al.; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016; pp. 589-596 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109446920B (en) | Detection method of urban rail transit passenger congestion based on convolutional neural network | |
CN108898085B (en) | An intelligent detection method of road diseases based on mobile phone video | |
CN109448367B (en) | Intelligent road traffic tracking management system based on big data image acquisition | |
CN107665603B (en) | Real-time detection method for judging parking space occupation | |
CN103985250B (en) | The holographic road traffic state vision inspection apparatus of lightweight | |
CN104504377B (en) | A kind of passenger on public transport degree of crowding identifying system and method | |
CN109522876B (en) | Subway station building escalator selection prediction method and system based on BP neural network | |
CN107316007A (en) | A kind of monitoring image multiclass object detection and recognition methods based on deep learning | |
CN108765404A (en) | A kind of road damage testing method and device based on deep learning image classification | |
CN109191830A (en) | A kind of congestion in road detection method based on video image processing | |
CN107316462A (en) | A kind of flow statistical method and device | |
CN110969142B (en) | Abnormal driving scene extraction method based on network-connected vehicle natural driving data | |
CN112084928B (en) | Road traffic accident detection method based on visual attention mechanism and ConvLSTM network | |
CN110287905A (en) | A real-time detection method of traffic congestion area based on deep learning | |
CN101364347A (en) | A video-based detection method for vehicle control delays at intersections | |
CN113393679A (en) | Regional traffic guidance method and system based on traffic intersection traffic flow identification and statistics | |
CN114708532A (en) | Monitoring video quality evaluation method, system and storage medium | |
CN116168356B (en) | Vehicle damage judging method based on computer vision | |
CN105427582A (en) | Intelligent management method for bus lines | |
CN119181248B (en) | Traffic video AI intelligent analysis method based on information creation environment | |
CN114267173B (en) | Multisource data fusion method, device and equipment for space-time characteristics of expressway | |
CN113450573A (en) | Traffic monitoring method and traffic monitoring system based on unmanned aerial vehicle image recognition | |
CN110674887A (en) | End-to-end road congestion detection algorithm based on video classification | |
CN117636268A (en) | Unmanned aerial vehicle aerial natural driving data set construction method oriented to ice and snow environment | |
CN115719475B (en) | A three-stage automatic detection method for trackside equipment faults based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||