CN108830246B

CN108830246B - Multi-dimensional motion feature visual extraction method for pedestrians in traffic environment

Info

Publication number: CN108830246B
Application number: CN201810661219.4A
Authority: CN
Inventors: 刘辉; 李燕飞; 韩宇阳
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2018-06-25
Filing date: 2018-06-25
Publication date: 2022-02-15
Anticipated expiration: 2038-06-25
Also published as: CN108830246A

Abstract

The invention discloses a traffic environment pedestrian multi-dimensional motion feature visual extraction method, which comprises the following steps: step 1: constructing a pedestrian motion database; step 2: extracting pedestrian detection frame images of the same pedestrian in the continuous image frames; and step 3: extracting HOG characteristics of the same pedestrian movement energy map; and 4, step 4: constructing a pedestrian motion posture recognition model based on an Elman neural network; and 5: judging the pedestrian posture in the current video by utilizing a pedestrian motion posture identification model based on an Elman neural network; step 6: calculating to obtain instantaneous speed sequences of the pedestrian in the X-axis direction and the Y-axis direction to obtain the real-time speed of the pedestrian; and 7: according to a three-dimensional scene under an intersection environment, position information of pedestrians in the image is obtained in real time, and real-time motion characteristics of the pedestrians are obtained by combining the postures and the real-time speeds of the pedestrians. The scheme has the characteristics of high identification accuracy and good robustness, is convenient to apply, and has a good application and popularization space.

Description

Multi-dimensional motion feature visual extraction method for pedestrians in traffic environment

Technical Field

The invention belongs to the field of traffic monitoring, and particularly relates to a visual extraction method for multi-dimensional motion characteristics of pedestrians in a traffic environment.

Background

In recent years, with the rapid development of scientific technology, more and more intelligent methods are applied to the traffic aspect, especially the field of intelligent driving. Traffic safety is a constant topic, and in collision accidents, collisions between vehicles and pedestrians account for a large proportion. The timely detection and posture identification of pedestrians are the key points in the existing intelligent traffic active protection system. To achieve accurate identification, it is most important to extract the motion characteristics of the pedestrian.

The pedestrian posture identification comprises a global feature method and a local feature method. The global feature mostly adopts a motion history image method, that is, frame difference information of a video sequence is accumulated into one frame image, and the frame difference contains certain motion information but does not contain shape information of a moving human body and is easily interfered by noise. Still another method is to extract the static edge information of the pedestrian in each frame of image, and the inter-frame images need artificial combination, which causes difficulty in identification. At present, the speed detection of pedestrians mostly adopts a radar method and cannot be well combined with visual images.

Chinese patent CN105957103A proposes a method for extracting motion features based on vision, which comprises the following steps: 1. extracting a motion vector of each pixel point based on the continuous frames; 2. extracting characteristic points with the pixel values changing strongly in the direction X, Y, T; 3. constructing a cubic characteristic vector of a direction-amplitude histogram based on the motion vector by taking the characteristic point as a center; 4. and forming a code vector for the local descriptor through a clustering algorithm. This patent has the following problems: 1. when the motion vector of each pixel point is extracted, the pixel points are not effectively screened, the data volume is large, and the calculation is complex; 2. the clustering algorithm applied by the patent is prone to local convergence.

In summary, it is urgently needed to provide a method for extracting pedestrian motion characteristics more accurately in a traffic environment.

Disclosure of Invention

The invention provides a visual extraction method for multi-dimensional motion characteristics of pedestrians in a traffic environment, and aims to accurately extract the postures of the pedestrians on a road, timely pre-warn vehicles on a traffic road and reduce traffic accidents.

A traffic environment pedestrian multi-dimensional motion feature visual extraction method comprises the following steps:

step 1: constructing a pedestrian motion database;

collecting various motion postures of pedestrians in various shooting directions of a depth camera and videos of positions of roads where the pedestrians are located, wherein the shooting directions comprise seven directions facing to the right front direction, the left front direction, the right front direction, the side surface, the right back direction, the left back direction and the right back direction of a lens, and the postures comprise three kinds of walking, running and standing;

step 2: extracting images of videos in a pedestrian motion database, preprocessing the extracted images to obtain a pedestrian detection frame of each frame of image, and extracting pedestrian detection frame images of the same pedestrian in continuous image frames;

and step 3: carrying out graying processing on each pedestrian detection frame image, synthesizing a motion energy map of a grayscale image corresponding to the pedestrian detection frame image of the same pedestrian in a continuous image frame, and extracting the HOG characteristic of the motion energy map;

and 4, step 4: constructing a pedestrian motion posture recognition model based on an Elman neural network;

taking a motion energy map corresponding to each pedestrian in the continuous image frames as input data, taking the posture of the corresponding pedestrian as output data, and training the Elman neural network;

the standing posture output corresponds to [001], the walking posture output corresponds to [010], and the running posture output corresponds to [100 ];

the Elman neural network parameter setting method comprises the steps that the number of nodes of an input layer corresponds to the number x of motion energy image pixels, the number of nodes of a hidden layer is 2x +1, the number of nodes of an output layer is 3, the maximum iteration number is 1500, the learning rate is 0.001, and the threshold value is 0.00001;

and 5: judging the pedestrian posture in the current video by utilizing a pedestrian motion posture identification model based on an Elman neural network;

extracting the pedestrian detection frame images of the same pedestrian in the continuous frame images from the current video according to the step 2, inputting the images into a pedestrian motion posture recognition model based on an Elman neural network to obtain corresponding postures, and distinguishing the postures;

step 6: calculating a pixel coordinate change sequence of the vertex of the lower left corner of the pedestrian detection frame of the same pedestrian in the continuous frame images, and calculating to obtain instantaneous speed sequences of the pedestrian in the X-axis direction and the Y-axis direction to obtain the real-time speed of the pedestrian;

and 7: according to a three-dimensional scene under an intersection environment, position information of pedestrians in the image is obtained in real time, and real-time motion characteristics of the pedestrians are obtained by combining the postures and the real-time speeds of the pedestrians.

The method comprises the steps that a depth camera is adopted by a camera of the intersection, a three-dimensional scene under the intersection environment is built, position information of pedestrians in an image is obtained in real time, the three-dimensional scene is divided into a pedestrian road and a vehicle road according to the actual road condition, when a person enters the three-dimensional scene, an ID is built for each person, and the motion characteristics of the person are judged through continuous frame image information.

Further, optimizing the weight and the threshold of the Elman neural network in the pedestrian motion posture recognition model based on the Elman neural network by using a chicken swarm algorithm, and specifically comprising the following steps of:

step A1: taking the individual positions of the chicken flocks as the weight and the threshold of the Elman neural network, and initializing chicken flock parameters;

the population scale M is [20,100], the search space dimension is j, the value of j is the sum of the weight of the Elman neural network to be optimized and the parameter number of the threshold, the maximum iteration time T is [400,1000], the iteration time is T, the initial value is 0, the proportion Pg of the cock is 20%, the proportion Pm of the hen is 70%, the proportion Px of the chick is 10%, and the hens are randomly selected from the hens, and the proportion Pd is 10%;

step A2: setting a fitness function, and enabling the iteration time t to be 1;

sequentially substituting the weight value and the threshold value corresponding to the individual positions of the chicken flock into the pedestrian motion gesture recognition model based on the Elman neural network, and determining the pedestrian detection frame image of the same pedestrian in the continuous frame images by utilizing the pedestrian motion gesture recognition model based on the Elman neural network determined by the individual positions of the chicken flockThe reciprocal of the difference between the pedestrian posture detection value of the same pedestrian in the pedestrian detection frame image in the continuous frame image and the corresponding actual pedestrian posture value is used as the first fitness function f₁(x)；

The greater the fitness, the more excellent the individual;

step A3: constructing a chicken flock subgroup;

sorting according to all individual fitness values, selecting chicken individuals with fitness values M × Pg in front of the fitness values to be judged as cocks, wherein each cock is used as the head of a subgroup; selecting chicken flock individuals with the fitness value of M x Px after ranking as chickens; judging other chicken individuals as hens;

dividing the chicken group into subgroups according to the number of the cocks, wherein one subgroup comprises one cock, a plurality of chickens and a plurality of hens, and each chicken randomly selects one hen in the subgroup to construct a hen-offspring relationship;

step A4: updating the individual positions of the chicken flock and calculating the fitness of each individual at present;

cock position updating formula:

wherein the content of the first and second substances,

indicates the position of the individual cock i in the j-dimensional space in the t-th iteration,

corresponding to the new position of the individual cock in the t +1 iteration, r (0, sigma)²) Subject to a mean of 0 and a standard deviation of σ²Normal distribution of (0, σ)²)；

Hen position update formula:

wherein the content of the first and second substances,

to locate the hen g in j-dimensional space in the t-th iteration,

is the only cock i in the subgroup of hens g in the t-th iteration₁The location of the individual is determined by the location of the individual,

is a random cock i outside the subgroup of the hen i in the t-th iteration₂Individual position, rand (0,1) is a random function, values are uniformly and randomly taken between (0,1), and L₁、L₂Updating the coefficients, L, for the positions of the hen i affected by the subgroup and other subgroups₁Value range of [0.25,0.55 ]]，L₂Value range of [0.15,0.35 ]]；

Chicken position update formula:

wherein the content of the first and second substances,

to locate chicken/in j-dimensional space in the t-th iteration,

for the hen g of the mother generation with the chick l corresponding to the mother-child relationship in the t-th iteration_mThe location of the individual is determined by the location of the individual,

omega, alpha and beta are the chicken self-update coefficients [0.2,0.7 ] respectively for the unique individual positions of the cocks in the subgroup of the chicks in the t iteration]Coefficient of following hen generation [0.5,0.8 ]]Coefficient of following cock [0.8, 1.5%]；

Step A5: and updating the individual optimal position and the chicken swarm whole individual optimal position according to the fitness function, judging whether the maximum iteration times is reached, if so, quitting, otherwise, making t equal to t +1, and turning to the step A3 until the maximum iteration times is met, outputting the weight and the threshold of the Elman neural network corresponding to the optimal chicken swarm individual position, and obtaining the pedestrian motion posture recognition model based on the Elman neural network.

Further, the real-time speed of the pedestrian is

Wherein the content of the first and second substances,

and

respectively representing the instantaneous speeds of the pedestrian in the X-axis direction and the Y-axis direction,

ΔW_j＝k|w₂-w₁|＝k|x₂×P-x₁×P|，ΔL_j＝|f(l₂)-f(l₁)|，l₁＝(N-y₁)×P，

the pixel coordinates of the pedestrian target point in the previous frame image and the current frame image are respectively (x)₁,y₁) And (x)₂,y₂)；l₁And l₂Respectively representing the distance between the pedestrian target point and the Y-axis edge of the display screen in two adjacent frames of images;

k represents the ratio of the actual scene distance to the scene imaging distance in the display screen, and M and N respectively represent the number of total pixel points in the X-axis direction and the Y-axis direction in the display screen; p represents the length of each pixel point in the display screen, and MP and NP are the total length of the X axis and the Y axis of the whole screen respectively; Δ W_jAnd Δ L_jRespectively representing the edges of the pedestrian target point in the two adjacent frames of imagesDisplacement in the X-axis and Y-axis directions;

AB represents the distance from the depth camera to the pedestrian, alpha represents the included angle between the connecting line between the depth camera and the pedestrian and the ground plane, theta is the included angle between the straight line between the depth camera and the pedestrian and the imaging plane, and m is the frame number.

And the values of AB, alpha and theta are obtained by real-time measurement by using a depth camera.

Further, according to the real-time motion characteristics of the pedestrians, carrying out pedestrian behavior level early warning on the vehicles on the traffic road;

the behavior levels comprise three levels of security, threat and danger;

the safety behaviors comprise that pedestrians are in a standing posture beyond one meter away from a traffic road, pedestrians are on a sidewalk and beyond one meter away from the traffic road and in a walking posture along the parallel direction of the traffic road or back to the traffic road, and back to the traffic road is in a running posture;

the threat behaviors comprise that pedestrians are within one meter of a pedestrian road on a sidewalk and a vehicle road, are positioned in the pedestrian road and are in a standing posture, and are within one meter of the pedestrian road and the vehicle road edge and are in a running posture;

the dangerous behaviors include a pedestrian on a sidewalk toward a traffic road direction or a pedestrian in a running posture in the traffic road, and in a walking posture in the traffic road;

when the walking speed of the pedestrian in the threatening behavior is more than 1.9m/s or the running speed is more than 8m/s, the threatening behavior is upgraded to dangerous behavior.

The behavior levels refer to safety conditions of states of pedestrians in the traffic environment, and different behavior levels prompt drivers of vehicles running in the traffic environment to ensure traffic safety;

further, the pedestrian target point is a lower left corner pixel point of the pedestrian detection frame image.

Further, preprocessing a pedestrian image frame, setting a pedestrian detection frame, a pedestrian target identifier and a pedestrian position label vector for the preprocessed image, and constructing a pedestrian track;

the pedestrian detection frame is a minimum circumscribed rectangle of a pedestrian outline in a pedestrian image frame;

the pedestrian target identification is a unique identification P of different pedestrians appearing in all the pedestrian image frames;

the expression form of the pedestrian position label vector is [ t, x, y, a, b ], t represents that the current pedestrian image frame belongs to the t-th frame in the monitoring video, x and y respectively represent the abscissa and the ordinate of the lower left corner of a pedestrian detection frame in the pedestrian image frame, and a and b respectively represent the length and the width of the pedestrian detection frame;

the appearance result of the pedestrian in the previous frame of pedestrian image in the next frame of pedestrian image means that if the pedestrian in the previous frame of pedestrian image appears in the next frame of pedestrian image, the tracking result of the pedestrian is 1, otherwise, the tracking result is 0; and if the pedestrian tracking result is 1, adding the corresponding pedestrian position label vector appearing in the pedestrian image of the next frame into the pedestrian track.

Advantageous effects

The invention provides a traffic environment pedestrian multi-dimensional motion feature visual extraction method, which comprises the following steps: step 1: constructing a pedestrian motion database; step 2: extracting images of videos in a pedestrian motion database, preprocessing the extracted images to obtain a pedestrian detection frame of each frame of image, and extracting pedestrian detection frame images of the same pedestrian in continuous image frames; and step 3: carrying out graying processing on each pedestrian detection frame image, synthesizing a motion energy map of a grayscale image corresponding to the pedestrian detection frame image of the same pedestrian in a continuous image frame, and extracting the HOG characteristic of the motion energy map; and 4, step 4: constructing a pedestrian motion posture recognition model based on an Elman neural network; and 5: judging the pedestrian posture in the current video by utilizing a pedestrian motion posture identification model based on an Elman neural network; step 6: calculating a pixel coordinate change sequence of the vertex of the lower left corner of the pedestrian detection frame of the same pedestrian in the continuous frame images, and calculating to obtain instantaneous speed sequences of the pedestrian in the X-axis direction and the Y-axis direction to obtain the real-time speed of the pedestrian; and 7: according to a three-dimensional scene under an intersection environment, position information of pedestrians in the image is obtained in real time, and real-time motion characteristics of the pedestrians are obtained by combining the postures and the real-time speeds of the pedestrians.

Compared with the prior art, the method has the following advantages:

1. the identification accuracy is high: the HOG characteristics of the synthesized motion energy map extracted by the invention not only comprise the pedestrian motion information of the whole image sequence, but also comprise the motion energy information of the pedestrian, and the characteristics are representative, so that the gesture identification of the pedestrian can be greatly facilitated;

2. the application is convenient: the pedestrian speed calculation method provided by the invention is directly operated based on the visual image, thereby realizing the perfect combination of speed detection and image recognition and facilitating the use of users;

3. the invention realizes the posture identification of the pedestrian in the image and the speed calculation of the pedestrian, has complete network structure and can greatly facilitate users;

4. the robustness is good: the invention uses the neural network, has strong nonlinear fitting capability and has better robustness when dealing with the problems of illumination change, pedestrian shielding and the like.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

fig. 2 is a schematic diagram of a distance relationship between a depth camera and a pedestrian.

Detailed Description

The invention will be further described with reference to the following figures and examples.

As shown in fig. 1, a method for visually extracting multidimensional motion features of pedestrians in traffic environment includes the following steps:

step 1: constructing a pedestrian motion database;

optimizing the weight and the threshold of the Elman neural network in the pedestrian motion posture recognition model based on the Elman neural network by using a chicken swarm algorithm, and specifically comprising the following steps:

step A2: setting a fitness function, and enabling the iteration time t to be 1;

sequentially substituting weights and thresholds corresponding to individual positions of chicken flocks into pedestrian motion posture recognition based on the Elman neural networkIn the model, the pedestrian gesture of the same input pedestrian in the pedestrian detection frame images in the continuous frame images is determined by utilizing an Elman neural network-based pedestrian motion gesture recognition model for chicken flock individual position determination, and the reciprocal of the difference between the pedestrian gesture detection value of the pedestrian detection frame image of the same pedestrian in the continuous frame images and the corresponding pedestrian gesture actual value is used as a first fitness function f₁(x)；

The greater the fitness, the more excellent the individual;

step A3: constructing a chicken flock subgroup;

cock position updating formula:

wherein the content of the first and second substances,

Hen position update formula:

wherein the content of the first and second substances,

to locate the hen g in j-dimensional space in the t-th iteration,

Chicken position update formula:

wherein the content of the first and second substances,

to locate chicken/in j-dimensional space in the t-th iteration,

the pedestrian real-time speed is

Wherein the content of the first and second substances,

and

k represents the ratio of the actual scene distance to the scene imaging distance in the display screen, and M and N respectively represent the number of total pixel points in the X-axis direction and the Y-axis direction in the display screen; p represents the length of each pixel point in the display screen, and MP and NP are the total length of the X axis and the Y axis of the whole screen respectively; Δ W_jAnd Δ L_jRespectively representing the displacement of the pedestrian target point in the directions of the X axis and the Y axis in two adjacent frames of images;

as shown in fig. 2, AB represents the distance from the depth camera to the pedestrian, α represents the included angle between the connecting line between the depth camera and the pedestrian and the ground plane, θ is the included angle between the straight line between the depth camera and the pedestrian and the imaging plane, values of AB, α, and θ are obtained by real-time measurement using the depth camera, and m is the frame number.

Carrying out pedestrian behavior level early warning on vehicles on a traffic road according to real-time motion characteristics of pedestrians;

the behavior levels comprise three levels of security, threat and danger;

in this example, the lower left corner pixel point of the pedestrian detection frame image is used as the pedestrian target point.

Preprocessing a pedestrian image frame, setting a pedestrian detection frame, a pedestrian target identifier and a pedestrian position tag vector for the preprocessed image, and constructing a pedestrian track;

The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims

1. A traffic environment pedestrian multi-dimensional motion feature visual extraction method is characterized by comprising the following steps:

step 1: constructing a pedestrian motion database;

2. The method according to claim 1, wherein a chicken flock algorithm is used for optimizing the weight and threshold of the Elman neural network in the pedestrian motion posture recognition model based on the Elman neural network, and the specific steps are as follows:

step A2: setting a fitness function, and enabling the iteration time t to be 1;

in turn willSubstituting the weight value and the threshold value corresponding to the chicken group individual position into the pedestrian motion posture recognition model based on the Elman neural network, determining the pedestrian posture of the input same pedestrian in the pedestrian detection frame image in the continuous frame image by using the pedestrian motion posture recognition model based on the Elman neural network determined by the chicken group individual position, and taking the reciprocal of the difference between the pedestrian posture detection value of the pedestrian detection frame image of the same pedestrian in the continuous frame image and the corresponding actual pedestrian posture value as a first fitness function f₁(x)；

Step A3: constructing a chicken flock subgroup;

cock position updating formula:

wherein the content of the first and second substances,

Hen position update formula:

wherein the content of the first and second substances,

to locate the hen g in j-dimensional space in the t-th iteration,

Chicken position update formula:

wherein the content of the first and second substances,

to locate chicken/in j-dimensional space in the t-th iteration,

3. The method of claim 1, wherein the pedestrian real-time speed is

Wherein the content of the first and second substances,

and

ΔW_j＝k|w₂-w₁|＝k|x₂×P-x₁×P|，ΔL_j＝|f(l₂)-f(l₁)|，l₁＝(N-y₁)×P，l₂＝(N-y₂)×P，

4. The method according to any one of claims 1 to 3, characterized in that pedestrian behavior level early warning is performed on vehicles on a traffic road according to real-time motion characteristics of pedestrians;

the behavior levels comprise three levels of security, threat and danger;

5. The method of claim 4, wherein the pedestrian target point is a lower left corner pixel point of a pedestrian detection frame image.

6. The method according to claim 5, characterized in that, the pedestrian image frame is preprocessed, and a pedestrian detection frame, a pedestrian target identification and a pedestrian position label vector are arranged on the preprocessed image to construct a pedestrian track;