CN108830166B - Real-time bus passenger flow volume statistical method - Google Patents

Real-time bus passenger flow volume statistical method

Info

Publication number
CN108830166B
Authority
CN
China
Prior art keywords
characteristic
gradient
amplitude
pyramid
image
Prior art date
Legal status
Expired - Fee Related
Application number
CN201810505385.5A
Other languages
Chinese (zh)
Other versions
CN108830166A (en)
Inventor
章国泰
聂凯
张立斌
高珊华
王红广
王鹏
张东旭
Current Assignee
Tianjin Tongka Intelligent Network Technology Co ltd
Original Assignee
Tianjin Tongka Intelligent Network Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Tianjin Tongka Intelligent Network Technology Co ltd filed Critical Tianjin Tongka Intelligent Network Technology Co ltd
Priority to CN201810505385.5A priority Critical patent/CN108830166B/en
Publication of CN108830166A publication Critical patent/CN108830166A/en
Application granted granted Critical
Publication of CN108830166B publication Critical patent/CN108830166B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a real-time bus passenger flow volume statistical method that detects and tracks human heads in video images. The video data of the method are passenger flow videos shot by cameras mounted above the front and rear doors of the bus, and the statistical method comprises the following steps: feature extraction; model training; head detection; head tracking. Compared with the prior art, the invention has the advantage that the detection method runs in real time and detects the passenger flow efficiently and accurately.

Description

Real-time bus passenger flow volume statistical method
Technical field:
the invention relates to image processing in the field of pattern recognition, and in particular to a real-time bus passenger flow volume statistical method.
Background art:
many developers have used infrared devices and pressure sensors to make passenger flow statistics. Since the error between the detection result and the actual passenger flow is large, passenger flow counting methods based on infrared devices and pressure sensors have not been used until now and are gradually abandoned. In recent years, with the continuous development of the pattern recognition field and the GPU parallel computing field, the computer vision direction has been rapidly developed. In the field of image processing, passenger flow statistics is an important application, and is a new field and direction in intelligent video monitoring at present.
In recent years, passenger flow statistical methods have fallen into three main categories: detection based on feature points, detection based on human body segmentation and tracking, and detection based on deep learning; all three have defects. The accuracy of the first two categories needs to be improved. Although the third category has high accuracy, its real-time performance and high hardware cost cannot meet the standards for popularization and use.
Summary of the invention:
the invention aims to provide a method for detecting human heads in video images that counts passenger flow efficiently and accurately. The specific technical scheme is as follows:
the video data of the method are passenger flow videos shot by a camera from the areas above the front and rear doors of the bus, and the statistical method comprises the following steps:
step 1: model training:
step 1.1: converting real image video frames collected by a camera into a BGRA format; in the video frame, calibrating the head position of the passenger and regarding the head position as a positive sample training image set of a training detection model, and regarding the area without the head of the passenger in the video frame as a negative sample training image set of the training detection model; carrying out image scale normalization processing on the positive sample training image set and the negative sample training image set to form training input images with the same size;
step 1.2: extracting features of the image: calculating the LUV features, their gradient amplitude features, and the associated gradient direction values; quantizing the gradient amplitude features according to the gradient direction values and extracting their histogram features; then concatenating the LUV features, the gradient amplitude features, and the histogram features to form the final image feature;
step 1.3: constructing a characteristic pyramid: calculating a characteristic scaling coefficient of each layer of pyramid according to the number of layers of the set characteristic pyramid; then, multiplying the image characteristics by the characteristic scaling coefficient to form a characteristic pyramid;
step 1.4: constructing a training model: sending the generated characteristic pyramid into a decision tree algorithm for training until the training accuracy reaches the requirement, and ending iterative training;
step 2: head detection:
step 2.1: acquiring a video frame to be detected by using a camera;
step 2.2: calculating the LUV features of the video frame to be detected, their gradient amplitude features, and the associated gradient direction values; quantizing the gradient amplitude features according to the gradient direction values and extracting their histogram features; then concatenating the LUV features, the gradient amplitude features, and the histogram features to form the final image feature;
step 2.3: calculating a characteristic scaling coefficient of each layer of pyramid according to the number of layers of the set characteristic pyramid; then, multiplying the image characteristics by the characteristic scaling coefficient to form a characteristic pyramid;
step 2.4: inputting the feature pyramid of the video frame to be detected, obtained in step 2.3, into the decision tree classifier trained in step 1.4; multiplying the set detection window size and detection step length by the feature scaling coefficient of each pyramid layer to obtain the window size and step length for scanning that layer; scanning each pyramid layer with its corresponding window and step length; judging each window to be a passenger head region or a non-head region from the decision tree's result and window score on that layer; scaling the windows judged to be heads back to the original size according to the feature scaling coefficient and removing overlapping windows; finally, recording the passenger head positions in a result file;
step 3: head tracking:
tracking the detection window with a kernelized correlation filter method to form a trajectory; if the trajectory crosses a specified line, the passenger has completed a boarding or alighting action;
step 4: passenger flow counting:
if the passenger has finished getting on or off, the algorithm increments the passenger flow count by 1; otherwise the count remains unchanged.
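Steps 3 and 4 together amount to a line-crossing test on the tracked head trajectory. A minimal Python sketch of the count follows; the track representation (a list of window-centre points) and the horizontal counting line line_y are illustrative assumptions, since the method only requires that a trajectory crossing a specified line marks a completed boarding or alighting action.

    def update_passenger_count(track, line_y, count):
        """Add 1 to count when the tracked head centre crosses y = line_y."""
        if len(track) < 2:
            return count
        prev_y, curr_y = track[-2][1], track[-1][1]
        # Opposite signs mean the trajectory crossed the counting line
        # between the last two frames: a boarding/alighting completed.
        if (prev_y - line_y) * (curr_y - line_y) < 0:
            count += 1
        return count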
As the first preferred scheme, the specific process of step 1.2 is as follows: the RGB pixel values of the image are mapped into the CIE XYZ color space, from which the corresponding LUV feature values are calculated; the L, U, V channels are then each convolved in the horizontal and vertical directions; at every feature point the two directional responses are squared, summed, and square-rooted, the maximum over the three channels is selected as the gradient amplitude feature, and the arctangent of the ratio of the vertical to the horizontal convolution value is taken as the gradient direction value; finally, the gradient amplitude feature is quantized according to the gradient direction values and its histogram feature extracted; the LUV feature, the gradient amplitude feature, and the histogram feature are then concatenated to form the final image feature.
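As an illustration of this colour mapping, here is a minimal Python sketch using OpenCV as a stand-in (an assumption; the patent names no library). OpenCV's BGR-to-LUV conversion routes the pixel values through CIE XYZ internally, matching the scheme just described.

    import cv2
    import numpy as np

    def extract_luv(frame_bgra):
        """Map a BGRA video frame to float LUV channels in [0, 1]."""
        bgr = cv2.cvtColor(frame_bgra, cv2.COLOR_BGRA2BGR)
        # For 8-bit input OpenCV rescales the L, U, V channels into [0, 255].
        luv = cv2.cvtColor(bgr, cv2.COLOR_BGR2LUV)
        return luv.astype(np.float32) / 255.0   # H x W x 3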
In the above preferred scheme, the gradient amplitude feature and the associated gradient direction value of the LUV features are calculated as follows:
Gx(x,y) = I(x+1,y) - I(x-1,y)
Gy(x,y) = I(x,y+1) - I(x,y-1)
G(x,y) = √(Gx(x,y)² + Gy(x,y)²)
α(x,y) = tan⁻¹(Gy(x,y)/Gx(x,y))
where Gx(x,y) is the horizontal gradient amplitude of the input image at pixel (x,y), Gy(x,y) is the vertical gradient amplitude at pixel (x,y), G(x,y) is the gradient amplitude at (x,y), α(x,y) is the gradient direction at (x,y), and I(x,y) is the L, U, or V feature value at that point.
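The four formulas transcribe directly into numpy. The sketch below applies them to the three LUV channels and keeps the per-pixel maximum over channels as the amplitude feature, as the scheme specifies; replicated image borders are an assumption.

    import numpy as np

    def gradient_features(luv):
        """Gradient amplitude and direction of an H x W x 3 LUV image."""
        p = np.pad(luv, ((1, 1), (1, 1), (0, 0)), mode='edge')
        gx = p[1:-1, 2:, :] - p[1:-1, :-2, :]   # Gx = I(x+1,y) - I(x-1,y)
        gy = p[2:, 1:-1, :] - p[:-2, 1:-1, :]   # Gy = I(x,y+1) - I(x,y-1)
        mag = np.sqrt(gx ** 2 + gy ** 2)        # G = sqrt(Gx^2 + Gy^2)
        # Keep, per pixel, the channel (L, U or V) with the largest amplitude.
        ch = np.argmax(mag, axis=2)
        r, c = np.indices(ch.shape)
        amplitude = mag[r, c, ch]
        # tan^-1(Gy/Gx); the modulo folds negative angles into [0, 180),
        # i.e. the "add 180 degrees when negative" rule of the embodiment.
        direction = np.degrees(np.arctan2(gy[r, c, ch], gx[r, c, ch])) % 180.0
        return amplitude, direction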
As the second preferred scheme, the feature of each pyramid layer in step 1.3 is calculated as follows:
Fi = μi * F, where Fi is the feature of the i-th pyramid layer, μi is the scaling coefficient of the i-th layer, and F is the final image feature formed in step 1.2.
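A minimal sketch of Fi = μi * F. How μi follows from the layer index is an assumption here (a geometric progression per layer, as in fast feature pyramid methods); the patent states only that each layer's coefficient is computed from the configured number of layers.

    def build_feature_pyramid(feature, n_layers, per_layer=2 ** (-1 / 8)):
        """Return the layer coefficients mu_i and the scaled features mu_i * F."""
        coeffs = [per_layer ** i for i in range(n_layers)]
        return coeffs, [mu * feature for mu in coeffs]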
As the third preferred scheme, in step 2 a parallel computing method is applied: feature extraction of the current video frame is performed on the GPU using the heterogeneous programming language OpenCL while passenger head detection of the previous frame is completed on the CPU, the two proceeding simultaneously to achieve parallel processing.
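A sketch of that two-stage pipeline in plain Python threads; the thread bodies stand in for the OpenCL feature-extraction kernels ("FC thread") and the CPU head detector ("DP thread"), and the queue of size 1 enforces the one-frame offset between them. The actual OpenCL kernels are not shown.

    import queue
    import threading

    def run_pipeline(frames, extract_features, detect_heads):
        handoff = queue.Queue(maxsize=1)
        results = []

        def feature_worker():
            for frame in frames:
                handoff.put(extract_features(frame))   # FC thread (GPU role)
            handoff.put(None)                          # end-of-stream marker

        worker = threading.Thread(target=feature_worker)
        worker.start()
        while True:
            pyramid = handoff.get()                    # DP thread (CPU role)
            if pyramid is None:
                break
            results.append(detect_heads(pyramid))
        worker.join()
        return results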
As the fourth preferred scheme, in step 3 the kernelized correlation filter trains a correlation filter from the information of the preceding frames and correlates it with the newly input frame; the resulting confidence map is the predicted tracking result, and the point or block with the highest score is taken as the tracked position.
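The tracker described here belongs to the kernelized correlation filter (KCF) family, so OpenCV's KCF tracker can stand in for it; this sketch assumes the opencv-contrib-python package, which provides cv2.TrackerKCF_create. The mechanism matches the scheme: a filter trained on preceding frames is correlated with the new frame, and the peak of the resulting confidence map is the predicted position.

    import cv2

    def track_head(frames, first_box):
        """Track one head; first_box is (x, y, w, h) from the detector."""
        tracker = cv2.TrackerKCF_create()
        tracker.init(frames[0], first_box)
        track = [first_box]
        for frame in frames[1:]:
            ok, box = tracker.update(frame)   # peak of the confidence map
            if not ok:
                break                         # target lost
            track.append(box)
        return track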
As the fifth preferred scheme, the method further comprises a step 5, as follows:
step 5: storing the coordinates, width, and height of the passenger head detected in each frame image into a detection result file, and saving each frame image; when the bus stops running, the method outputs the final passenger flow.
Compared with the prior art, the invention has the advantages that:
the detection method has real-time performance and can efficiently and accurately detect the passenger flow.
In the embodiment, the head detection thread runs on the CPU and the feature extraction thread runs on the GPU, so the two operate simultaneously, realizing parallel processing and saving computation time.
Description of the drawings:
FIG. 1 is a flow chart of the bus passenger flow volume statistical method of this patent.
FIG. 2 is a schematic diagram of a process of extracting features by GPU parallel computing.
FIG. 3 is a schematic flow chart of creating a feature pyramid.
FIG. 4 is a schematic diagram of the algorithm's two-thread processing; in the figure, the DP thread denotes the head detection thread and the FC thread denotes the feature extraction thread. Because the head detection thread runs on the CPU and the feature extraction thread runs on the GPU, the two can operate simultaneously, realizing parallel processing and saving computation time.
Detailed description of the embodiments:
Example:
a real-time bus passenger flow volume statistical method is characterized in that video data of the method are passenger flow video data shot from top areas of a front door and a rear door of a bus by using cameras, and the statistical method comprises the following steps:
step 1: model training:
step 1.1: converting real image video frames collected by a camera into a BGRA format; in the video frame, calibrating the head position of the passenger and regarding the head position as a positive sample training image set of a training detection model, and regarding the area without the head of the passenger in the video frame as a negative sample training image set of the training detection model; carrying out image scale normalization processing on the positive sample training image set and the negative sample training image set to form training input images with the same size;
step 1.2: extracting features of the image:
step 1.2.1: extracting LUV features from the RGB image: the RGB pixel values are mapped into the CIE XYZ color space, from which the corresponding LUV feature values are calculated;
step 1.2.2: convolving the LUV features with a triangle filter;
step 1.2.3: calculating the gradient amplitude and gradient direction of each of the L, U, V channels and selecting the maximum of the three amplitudes as the final amplitude result; the gradient direction is computed from the horizontal and vertical gradients; if the direction is greater than or equal to zero it is output directly, and if it is less than zero, one hundred eighty degrees is added before output;
step 1.2.4: convolving the gradient amplitude features with a triangle filter;
step 1.2.5: quantizing the gradient amplitude features by gradient direction to generate the final gradient direction histogram;
step 1.2.6: scaling the LUV features, the gradient amplitude features, and the gradient direction histogram features and concatenating them;
in the above scheme, the gradient magnitude characteristic calculation method for calculating the LUV characteristic is as follows:
Gx(x,y)=I(x+1,y)-I(x-1,y)
Gy(x,y)=I(x,y+1)-I(x,y-1)
Figure BDA0001671300400000061
α(x,y)=tan-1(Gy(x,y)/Gx(x,y))
Gx(x, y) represents the horizontal gradient magnitude of the input image at the pixel point (x, y), Gy(x, y) represents the gradient amplitude of the input image in the vertical direction at the pixel point (x, y), G (x, y) is the gradient amplitude of the pixel point (x, y), alpha (x, y) is the gradient direction of the pixel point (x, y), and I (x, y) represents the characteristic value of L, U, V of the point;
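A sketch of steps 1.2.5-1.2.6: the gradient amplitude votes into orientation-histogram channels, every channel is pooled over small cells, and the LUV, amplitude, and histogram channels are stacked. Six orientation bins and 4 x 4 aggregation cells are assumptions (typical aggregated-channel-feature settings), and the triangle-filter smoothing of steps 1.2.2 and 1.2.4 is omitted for brevity.

    import numpy as np

    def channel_features(luv, amplitude, direction, n_bins=6, cell=4):
        h, w = amplitude.shape
        h, w = h - h % cell, w - w % cell     # crop to whole cells
        bins = np.minimum((direction[:h, :w] / (180.0 / n_bins)).astype(int),
                          n_bins - 1)
        hist = np.zeros((h, w, n_bins), dtype=np.float32)
        r, c = np.indices((h, w))
        hist[r, c, bins] = amplitude[:h, :w]  # each magnitude votes its bin
        # Stack 3 LUV + 1 amplitude + n_bins histogram channels ...
        channels = np.dstack([luv[:h, :w], amplitude[:h, :w], hist])
        # ... and pool each over cell x cell blocks (the scaling of 1.2.6).
        return channels.reshape(h // cell, cell, w // cell, cell, -1).sum(axis=(1, 3))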
step 1.3: constructing the feature pyramid: calculating the feature scaling coefficient of each pyramid layer according to the set number of pyramid layers, and then multiplying the image feature by each coefficient to form the feature pyramid; the feature of each layer is calculated as follows:
Fi = μi * F, where Fi is the feature of the i-th pyramid layer, μi is the scaling coefficient of the i-th layer, and F is the final image feature;
step 1.4: constructing a training model: sending the generated characteristic pyramid into a decision tree algorithm for training until the training accuracy reaches the requirement, and ending iterative training;
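A sketch of this training loop with scikit-learn (an assumption; the patent says only "a decision tree algorithm"). Boosted depth-2 decision trees are the usual pairing with channel features; X holds the flattened window features from the pyramid and y holds 1 for head windows, 0 for background.

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    def train_head_detector(X, y, target_accuracy=0.99):
        """Grow the boosted ensemble until the accuracy requirement is met."""
        for n_rounds in (128, 256, 512, 1024, 2048):
            clf = AdaBoostClassifier(        # 'estimator=' needs sklearn >= 1.2
                estimator=DecisionTreeClassifier(max_depth=2),
                n_estimators=n_rounds)
            clf.fit(X, y)
            if clf.score(X, y) >= target_accuracy:
                break                        # training accuracy reached
        return clf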
step 2: head detection: a parallel computing method is applied: feature extraction of the current video frame is performed on the GPU using the heterogeneous programming language OpenCL while passenger head detection of the previous frame is completed on the CPU, the two proceeding simultaneously to achieve parallel processing; the specific process is as follows:
step 2.1: acquiring a video frame to be detected by using a camera;
step 2.2: extracting features from the video frame:
step 2.2.1: extracting LUV features from the RGB image: the RGB pixel values are mapped into the CIE XYZ color space, from which the corresponding LUV feature values are calculated;
step 2.2.2: convolving the LUV features with a triangle filter;
step 2.2.3: calculating the gradient amplitude and gradient direction of each of the L, U, V channels and selecting the maximum of the three amplitudes as the final amplitude result; the gradient direction is computed from the horizontal and vertical gradients; if the direction is greater than or equal to zero it is output directly, and if it is less than zero, one hundred eighty degrees is added before output;
step 2.2.4: convolving the gradient amplitude features with a triangle filter;
step 2.2.5: quantizing the gradient amplitude features by gradient direction to generate the final gradient direction histogram;
step 2.2.6: scaling the LUV features, the gradient amplitude features, and the gradient direction histogram features and concatenating them;
in the above scheme, the gradient magnitude characteristic calculation method for calculating the LUV characteristic is as follows:
Gx(x,y)=I(x+1,y)-I(x-1,y)
Gy(X,y)=I(x,y+1)-I(x,y-1)
Figure BDA0001671300400000071
α(x,y)=tan-1(Gy(x,y)/Gx(x,y))
Gx(x, y) represents the horizontal gradient magnitude of the input image at the pixel point (x, y), Gy(x, y) represents the gradient amplitude of the input image in the vertical direction at the pixel point (x, y), G (x, y) is the gradient amplitude of the pixel point (x, y), alpha (x, y) is the gradient direction of the pixel point (x, y), and I (x, y) represents the characteristic value of L, U, V of the point;
step 2.3: calculating the feature scaling coefficient of each pyramid layer according to the set number of pyramid layers, and then multiplying the image feature by each coefficient to form the feature pyramid; the feature of each layer is calculated as follows:
Fi = μi * F, where Fi is the feature of the i-th pyramid layer, μi is the scaling coefficient of the i-th layer, and F is the final image feature;
step 2.4: inputting the feature pyramid of the video frame to be detected, obtained in step 2.3, into the decision tree classifier trained in step 1.4; multiplying the set detection window size and detection step length by the feature scaling coefficient of each pyramid layer to obtain the window size and step length for scanning that layer; scanning each pyramid layer with its corresponding window and step length; judging each window to be a passenger head region or a non-head region from the decision tree's result and window score on that layer; scaling the windows judged to be heads back to the original size according to the feature scaling coefficient and removing overlapping windows; finally, recording the passenger head positions in a result file;
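A sketch of this multi-scale scan. The base window of 16, stride of 4, the score threshold, resizing each patch to the model window before scoring (the classifier takes a fixed-length vector), and the greedy overlap removal are all illustrative assumptions; clf is the boosted-tree classifier from step 1.4 and coeffs are the per-layer μi.

    import cv2

    def iou(a, b):
        """Intersection-over-union of two (x, y, w, h, score) boxes."""
        ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
        inter = ix * iy
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union else 0.0

    def detect_heads(pyramid, coeffs, clf, base_win=16, base_step=4, thr=0.0):
        boxes = []
        for layer, mu in zip(pyramid, coeffs):
            win = max(2, int(round(base_win * mu)))    # window scaled by mu_i
            step = max(1, int(round(base_step * mu)))  # stride scaled by mu_i
            for y in range(0, layer.shape[0] - win + 1, step):
                for x in range(0, layer.shape[1] - win + 1, step):
                    patch = cv2.resize(layer[y:y + win, x:x + win],
                                       (base_win, base_win))
                    score = clf.decision_function(patch.reshape(1, -1))[0]
                    if score > thr:
                        # Scale the hit back to original-frame coordinates.
                        boxes.append((x / mu, y / mu, win / mu, win / mu, score))
        kept = []   # greedy non-maximum suppression removes overlaps
        for b in sorted(boxes, key=lambda b: -b[4]):
            if all(iou(b, k) < 0.5 for k in kept):
                kept.append(b)
        return kept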
step 3: head tracking:
tracking the detection window with a kernelized correlation filter method to form a trajectory; if the trajectory crosses a specified line, the passenger has completed a boarding or alighting action; the kernelized correlation filter trains a correlation filter from the information of the preceding frames and correlates it with the newly input frame; the resulting confidence map is the predicted tracking result, and the point or block with the highest score is taken as the tracked position;
step 4: passenger flow counting:
if the passenger has finished getting on or off, the algorithm increments the passenger flow count by 1; otherwise the count remains unchanged;
step 5: storing the coordinates, width, and height of the passenger head detected in each frame image into a detection result file, and saving each frame image; when the bus stops running, the method outputs the final passenger flow.
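A minimal sketch of the step 5 result file; CSV is an assumption, as the method requires only that the coordinates, width, and height of each detected head be stored per frame (the frame images themselves would be written separately, e.g. with cv2.imwrite).

    import csv

    def save_detections(csv_path, frame_idx, boxes):
        """Append one row per detected head: frame index, x, y, width, height."""
        with open(csv_path, 'a', newline='') as f:
            writer = csv.writer(f)
            for x, y, w, h in boxes:
                writer.writerow([frame_idx, int(x), int(y), int(w), int(h)])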

Claims (5)

1. A real-time bus passenger flow volume statistical method is characterized in that video data of the method are passenger flow video data shot from top areas of a front door and a rear door of a bus by using cameras, and the statistical method comprises the following steps:
step 1: model training:
step 1.1: converting real image video frames collected by a camera into a BGRA format; in the video frame, calibrating the head position of the passenger and regarding the head position as a positive sample training image set of a training detection model, and regarding the area without the head of the passenger in the video frame as a negative sample training image set of the training detection model; carrying out image scale normalization processing on the positive sample training image set and the negative sample training image set to form training input images with the same size;
step 1.2: extracting features of the image: calculating the LUV features, their gradient amplitude features, and the associated gradient direction values; quantizing the gradient amplitude features according to the gradient direction values and extracting their histogram features; then concatenating the LUV features, the gradient amplitude features, and the histogram features to form the final image feature; the specific process is as follows:
step 1.2.1: extracting LUV features from the RGB image;
step 1.2.2: convolving the LUV features with a triangle filter;
step 1.2.3: calculating the gradient amplitude and gradient direction of each of the L, U, V channels and selecting the maximum of the three amplitudes as the final amplitude result; the gradient direction is computed from the horizontal and vertical gradients; if the direction is greater than or equal to zero it is output directly, and if it is less than zero, one hundred eighty degrees is added before output;
step 1.2.4: convolving the gradient amplitude features with a triangle filter;
step 1.2.5: quantizing the gradient amplitude features by gradient direction to generate the final gradient direction histogram;
step 1.2.6: scaling the LUV features, the gradient amplitude features, and the gradient direction histogram features and concatenating them;
the step 2.2 comprises the following processes:
step 2.2.1: extracting LUV features from the RGB image;
step 2.2.2: convolving the LUV features with a triangle filter;
step 2.2.3: calculating the gradient amplitude and gradient direction of each of the L, U, V channels and selecting the maximum of the three amplitudes as the final amplitude result; the gradient direction is computed from the horizontal and vertical gradients; if the direction is greater than or equal to zero it is output directly, and if it is less than zero, one hundred eighty degrees is added before output;
step 2.2.4: convolving the gradient amplitude features with a triangle filter;
step 2.2.5: quantizing the gradient amplitude features by gradient direction to generate the final gradient direction histogram;
step 2.2.6: scaling the LUV features, the gradient amplitude features, and the gradient direction histogram features and concatenating them;
the calculation method for calculating the gradient amplitude characteristic and the correlation direction value of the LUV characteristic is as follows:
Gx(x,y)=I(x+1,y)-I(x-1,y)
Gy(x,y)=I(x,y+1)-I(x,y-1)
Figure FDA0002992417100000021
α(x,y)=tan-1(Gy(x,y)/Gx(x,y))
Gx(x, y) represents the horizontal gradient magnitude of the input image at the pixel point (x, y), Gy(x, y) represents the gradient amplitude of the input image in the vertical direction at the pixel point (x, y), G (x, y) is the gradient amplitude of the pixel point (x, y), alpha (x, y) is the gradient direction of the pixel point (x, y), and I (x, y) represents the characteristic value of L, U, V of the point;
step 1.3: constructing a characteristic pyramid: calculating a characteristic scaling coefficient of each layer of pyramid according to the number of layers of the set characteristic pyramid; then, multiplying the image characteristics by the characteristic scaling coefficient to form a characteristic pyramid;
step 1.4: constructing a training model: sending the generated characteristic pyramid into a decision tree algorithm for training until the training accuracy reaches the requirement, and ending iterative training;
step 2: head detection:
step 2.1: acquiring a video frame to be detected by using a camera;
step 2.2: calculating the LUV features of the video frame to be detected, their gradient amplitude features, and the associated gradient direction values; quantizing the gradient amplitude features according to the gradient direction values and extracting their histogram features; then concatenating the LUV features, the gradient amplitude features, and the histogram features to form the final image feature;
step 2.3: calculating a characteristic scaling coefficient of each layer of pyramid according to the number of layers of the set characteristic pyramid; then, multiplying the image characteristics by the characteristic scaling coefficient to form a characteristic pyramid;
step 2.4: inputting the feature pyramid of the video frame to be detected, obtained in step 2.3, into the decision tree classifier trained in step 1.4; multiplying the set detection window size and detection step length by the feature scaling coefficient of each pyramid layer to obtain the window size and step length for scanning that layer; scanning each pyramid layer with its corresponding window and step length; judging each window to be a passenger head region or a non-head region from the decision tree's result and window score on that layer; scaling the windows judged to be heads back to the original size according to the feature scaling coefficient and removing overlapping windows; finally, recording the passenger head positions in a result file;
step 3: head tracking:
tracking the detection window with a kernelized correlation filter method to form a trajectory; if the trajectory crosses a specified line, the passenger has completed a boarding or alighting action;
step 4: passenger flow counting:
if the passenger has finished getting on or off, the algorithm increments the passenger flow count by 1; otherwise the count remains unchanged.
2. The real-time bus passenger flow volume statistical method according to claim 1, wherein the feature of each pyramid layer in step 1.3 is calculated as follows:
Fi = μi * F, where Fi is the feature of the i-th pyramid layer, μi is the scaling coefficient of the i-th layer, and F is the final image feature.
3. The real-time bus passenger flow volume statistical method according to any one of claims 1-2, wherein in step 2 a parallel computing method is applied: feature extraction of the current video frame is performed on the GPU using the heterogeneous programming language OpenCL while passenger head detection of the previous frame is completed on the CPU, the two proceeding simultaneously to achieve parallel processing.
4. The real-time bus passenger flow volume statistical method according to any one of claims 1-2, wherein in step 3 a kernelized correlation filter trains a correlation filter from the information of the preceding frames and correlates it with the newly input frame; the resulting confidence map is the predicted tracking result, and the point or block with the highest score is taken as the tracked position.
5. The real-time bus passenger flow volume statistical method according to any one of claims 1-2, further comprising a step 5, as follows:
step 5: storing the coordinates, width, and height of the passenger head detected in each frame image into a detection result file, and saving each frame image; when the bus stops running, the method outputs the final passenger flow.
CN201810505385.5A 2018-05-24 2018-05-24 Real-time bus passenger flow volume statistical method Expired - Fee Related CN108830166B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810505385.5A CN108830166B (en) 2018-05-24 2018-05-24 Real-time bus passenger flow volume statistical method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810505385.5A CN108830166B (en) 2018-05-24 2018-05-24 Real-time bus passenger flow volume statistical method

Publications (2)

Publication Number Publication Date
CN108830166A CN108830166A (en) 2018-11-16
CN108830166B (en) 2021-06-29

Family

ID=64148514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810505385.5A Expired - Fee Related CN108830166B (en) 2018-05-24 2018-05-24 Real-time bus passenger flow volume statistical method

Country Status (1)

Country Link
CN (1) CN108830166B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7331769B2 (en) * 2020-04-30 2023-08-23 トヨタ自動車株式会社 Position estimation system and position estimation method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103985182A (en) * 2014-05-30 2014-08-13 长安大学 Automatic public transport passenger flow counting method and system
CN206400573U (en) * 2016-12-27 2017-08-11 清华大学苏州汽车研究院(吴江) A kind of counting passenger flow of buses system based on ToF cameras
CN107239762A (en) * 2017-06-06 2017-10-10 电子科技大学 Patronage statistical method in a kind of bus of view-based access control model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI448990B (en) * 2012-09-07 2014-08-11 Univ Nat Chiao Tung Real-time people counting system using layer scanning method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103985182A (en) * 2014-05-30 2014-08-13 长安大学 Automatic public transport passenger flow counting method and system
CN206400573U (en) * 2016-12-27 2017-08-11 清华大学苏州汽车研究院(吴江) A kind of counting passenger flow of buses system based on ToF cameras
CN107239762A (en) * 2017-06-06 2017-10-10 电子科技大学 Patronage statistical method in a kind of bus of view-based access control model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"基于卷积神经网络的人脸检测与特征点标定算法研究";王维;《中国优秀硕士学位论文全文数据库(信息科技辑)》;20180415(第4期);全文 *
"监控场景中人数统计算法的研究与应用";马海军;《中国优秀硕士学位论文全文数据库(信息科技辑)》;20161015(第10期);第2、3章 *

Also Published As

Publication number Publication date
CN108830166A (en) 2018-11-16

Similar Documents

Publication Publication Date Title
CN110059558B (en) Orchard obstacle real-time detection method based on improved SSD network
WO2021238019A1 (en) Real-time traffic flow detection system and method based on ghost convolutional feature fusion neural network
US10445602B2 (en) Apparatus and method for recognizing traffic signs
CN106648078B (en) Multi-mode interaction method and system applied to intelligent robot
CN110826389B (en) Gait recognition method based on attention 3D frequency convolution neural network
CN110956082B (en) Face key point detection method and detection system based on deep learning
WO2015131468A1 (en) Method and system for estimating fingerprint pose
CN112861785B (en) Instance segmentation and image restoration-based pedestrian re-identification method with shielding function
CN112784712B (en) Missing child early warning implementation method and device based on real-time monitoring
WO2023151237A1 (en) Face pose estimation method and apparatus, electronic device, and storage medium
CN111915583A (en) Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene
CN113963333B (en) Traffic sign board detection method based on improved YOLOF model
CN112784494B (en) Training method of false positive recognition model, target recognition method and device
CN108830166B (en) Real-time bus passenger flow volume statistical method
CN113112479A (en) Progressive target detection method and device based on key block extraction
CN117152625A (en) Remote sensing small target identification method, system, equipment and medium based on CoordConv and Yolov5
CN108985216B (en) Pedestrian head detection method based on multivariate logistic regression feature fusion
CN106909936B (en) Vehicle detection method based on double-vehicle deformable component model
CN114067359B (en) Pedestrian detection method integrating human body key points and visible part attention characteristics
CN113658223B (en) Multi-row person detection and tracking method and system based on deep learning
CN113111850B (en) Human body key point detection method, device and system based on region-of-interest transformation
CN112801020B (en) Pedestrian re-identification method and system based on background graying
CN115937736A (en) Small target detection method based on attention and context awareness
CN114639013A (en) Remote sensing image airplane target detection and identification method based on improved Orient RCNN model
CN114038011A (en) Method for detecting abnormal behaviors of human body in indoor scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210629