CN112149588A

CN112149588A - Pedestrian attitude estimation-based intelligent elevator dispatching method

Info

Publication number: CN112149588A
Application number: CN202011039991.6A
Authority: CN
Inventors: 陈阳舟; 江业帆; 师泽宇
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2020-09-28
Filing date: 2020-09-28
Publication date: 2020-12-29

Abstract

The invention provides an elevator intelligent scheduling method based on pedestrian attitude estimation, belonging to the field of intelligent transportation and elevator application and comprising the following steps: (1) acquiring a monitoring video outside each elevator car in real time, storing and preprocessing the monitoring video in real time; (2) carrying out attitude estimation on the monitoring video acquired in real time to obtain a human body bone node diagram and the number of human bodies of each passenger; (3) taking 18 key point coordinates of a human skeleton key point diagram as input, sending the input into a behavior analysis model based on a support vector machine, and outputting two results of elevator taking intention and elevator not taking intention; (4) the human body posture estimation module and the elevator taking behavior analysis module finally acquire the information of the number of pedestrians with the elevator taking intention in the monitoring video, and send the information obtained by algorithm analysis to the elevator intelligent dispatching module. The invention provides technical support for intelligent elevator dispatching.

Description

Pedestrian attitude estimation-based intelligent elevator dispatching method

Technical Field

The invention belongs to the field of intelligent transportation and elevator application, and particularly relates to an elevator dispatching method based on pedestrian attitude estimation.

Background

Elevators are an indispensable means of vertical transportation. Although elevator hardware has improved greatly in recent years, the elevator system as a whole cannot provide high-quality service without a good dispatching algorithm. As society develops, large, high-rise, and super-high-rise buildings are increasingly common, so a single elevator can no longer meet user demand and several elevators must operate and serve together; elevator group control technology emerged in response. An Elevator Group Control System (EGCS) connects several elevators to a computer, which collects passenger requests and elevator running states from those elevators, processes the data with the corresponding dispatching algorithm, and then sends control commands to the elevators. The EGCS can issue reasonable dispatching instructions in real time as passenger traffic in the building changes, and allocates the elevators of the whole group in a unified way to improve operating efficiency and service quality.

To identify passenger traffic flow in an elevator group control system, researchers generally use traffic flow pattern recognition to divide elevator traffic into several modes, such as an "up-peak mode", a "down-peak mode", and an "idle mode". The difficulty with this approach is that the system can hardly obtain detailed information about passenger calls in advance, even though the more detailed the passenger information available to the control center beforehand, the more efficiently the elevators can run. In existing elevator systems, the only operation a passenger can perform before entering an elevator is to issue an up call or a down call, so the dispatching center has little advance passenger information with which to optimize dispatching. A group control system based on image recognition, by contrast, can obtain the number of passengers. Current systems of this kind mainly process surveillance video of the elevator waiting hall to obtain the approximate number of passengers, and the image recognition is generally based on target detection. Such methods can only count the people in the video; they cannot accurately determine whether a pedestrian is a passenger who actually needs the elevator. Whether a pedestrian in the video needs the elevator can be judged from behavior and posture, so the pedestrian can be processed with a posture estimation technique and the resulting posture analyzed to determine the pedestrian's elevator-taking intention.

Disclosure of Invention

The invention aims to provide an elevator intelligent dispatching method based on pedestrian attitude estimation, aiming at solving the problem of passenger traffic flow identification of an elevator group control system.

The technical scheme of the invention is implemented according to the following steps:

S1, acquiring the monitoring video outside each elevator car in real time, and storing and preprocessing it in real time;

S2, performing posture estimation on the monitoring video acquired in real time to obtain a human skeleton node map for each passenger and the number of human bodies;

S3, taking the 18 key point coordinates of the human skeleton key point map as input to a behavior analysis model based on a support vector machine, the model being a binary classification model that outputs either elevator-riding intention or no elevator-riding intention;

S4, using the human body posture estimation module and the elevator-riding behavior analysis module based on the support vector machine algorithm to obtain the number of pedestrians in the monitoring video who intend to ride the elevator, and sending the information obtained through algorithm analysis to the elevator intelligent dispatching module;

Further, in step S2, the human joint detection algorithm proceeds as follows:

S2.1, detecting skeleton key points: the key point heat map is a two-dimensional map that gives the confidence that a key point appears at each position of the image; the position with the highest confidence is the final position of the key point. If there is only one person in the image, a given key point has a single peak in the heat map; with several people, the key point has several peaks. The confidence at each position is determined with a Gaussian function. For the j-th key point of the k-th person, with $x_{j,k}$ denoting the actual position of the key point, the confidence $S^{*}_{j,k}$ at a pixel p around the key point is:

$$S^{*}_{j,k}(p) = \exp\left(-\frac{\lVert p - x_{j,k} \rVert_2^2}{\sigma^2}\right) \tag{1}$$

where the standard deviation σ controls the spread of the confidence values. For the multi-person case, the confidence map for key point j takes the maximum over all persons:

$$S^{*}_{j}(p) = \max_{k} S^{*}_{j,k}(p) \tag{2}$$
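As an illustration of step S2.1, the following is a minimal pure-Python sketch of a Gaussian confidence map with the per-pixel maximum over persons. The function names, the grid size, the key point positions, and the value of `sigma` are all hypothetical choices for the example; the patent does not specify an implementation.

```python
import math


def keypoint_confidence(p, x_jk, sigma):
    """Gaussian confidence of pixel p for a key point located at x_jk."""
    dist_sq = (p[0] - x_jk[0]) ** 2 + (p[1] - x_jk[1]) ** 2
    return math.exp(-dist_sq / sigma ** 2)


def confidence_map(width, height, keypoints, sigma):
    """Per-pixel maximum over all persons' Gaussians for one key point type."""
    return [
        [max(keypoint_confidence((x, y), kp, sigma) for kp in keypoints)
         for x in range(width)]
        for y in range(height)
    ]


# Assumed example: two people whose same key point type lies at (2, 2) and (7, 5).
heatmap = confidence_map(10, 8, [(2, 2), (7, 5)], sigma=1.5)

# Each person contributes one peak with confidence exactly 1.0.
peaks = [(x, y) for y in range(8) for x in range(10) if heatmap[y][x] == 1.0]
print(peaks)
```

Taking the maximum rather than the average keeps the peaks of nearby people distinct, which is why the heat map shows one peak per person.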

S2.2, given a set of detected body parts, a measure of association is needed for each pair of parts, so that connected parts belong to the same person and the parts can finally be assembled into the full-body poses of several people. Each limb is assigned an affinity field: every pixel on the limb carries a unit vector, which preserves both the position and the direction of the limb, and the affinity field is formed by these unit vectors. Let $x_{j_1,k}$ and $x_{j_2,k}$ be the two key points of the k-th person's forearm. If a point p lies on the forearm, the affinity value $L^{*}_{c,k}(p)$ at p is the unit vector pointing from node j1 to node j2; if p lies outside the forearm, the affinity value is the zero vector. In mathematical form:

$$L^{*}_{c,k}(p) = \begin{cases} v & \text{if } p \text{ lies on limb } c \text{ of person } k \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

Affinity field constraints, which define the points that lie on the limb:

$$0 \le v \cdot (p - x_{j_1,k}) \le l_{c,k} \quad \text{and} \quad \lvert v_{\perp} \cdot (p - x_{j_1,k}) \rvert \le \sigma_l \tag{4}$$

where $\sigma_l$ is a distance threshold; $l_{c,k} = \lVert x_{j_2,k} - x_{j_1,k} \rVert_2$ is the length of the whole forearm; $v = (x_{j_2,k} - x_{j_1,k}) / \lVert x_{j_2,k} - x_{j_1,k} \rVert_2$ is the unit vector along the forearm; and $v_{\perp}$ is a vector perpendicular to the forearm. Where the forearm affinity fields of several people overlap, the field is their average:

$$L^{*}_{c}(p) = \frac{1}{n_c(p)} \sum_{k} L^{*}_{c,k}(p) \tag{5}$$

$n_c(p)$ is the number of non-zero vectors at position p over all people, that is, the number of people whose affinity fields for limb c overlap at p. The confidence of the association between any two key points is the line integral of the affinity field between them. For two candidate key point positions $d_{j_1}$ and $d_{j_2}$, the affinity field $L_c$ is sampled along the line segment joining them, and the association confidence is the integral of the field along that segment:

$$E = \int_{u=0}^{u=1} L_c\big(p(u)\big) \cdot \frac{d_{j_2} - d_{j_1}}{\lVert d_{j_2} - d_{j_1} \rVert_2} \, du \tag{6}$$

where p(u) interpolates between the two positions $d_{j_1}$ and $d_{j_2}$:

$$p(u) = (1 - u)\, d_{j_1} + u\, d_{j_2} \tag{7}$$
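The affinity field of step S2.2 and the sampled line integral can be sketched as follows. This is a hedged illustration only: `paf_vector`, `association_score`, the midpoint sampling rule, the number of samples, and the elbow and wrist coordinates are assumptions made for the example and are not taken from the patent.

```python
import math


def paf_vector(p, x_j1, x_j2, sigma_l):
    """Unit vector from j1 to j2 if p lies on the limb, else the zero vector."""
    dx, dy = x_j2[0] - x_j1[0], x_j2[1] - x_j1[1]
    length = math.hypot(dx, dy)
    v = (dx / length, dy / length)
    rx, ry = p[0] - x_j1[0], p[1] - x_j1[1]
    along = v[0] * rx + v[1] * ry            # projection onto the limb direction
    across = abs(-v[1] * rx + v[0] * ry)     # distance from the limb axis
    if 0 <= along <= length and across <= sigma_l:
        return v
    return (0.0, 0.0)


def association_score(field, d_j1, d_j2, samples=10):
    """Approximate the line integral of field(p(u)) . unit(d_j2 - d_j1) on [0, 1]."""
    dx, dy = d_j2[0] - d_j1[0], d_j2[1] - d_j1[1]
    norm = math.hypot(dx, dy)
    ux, uy = dx / norm, dy / norm
    total = 0.0
    for i in range(samples):
        u = (i + 0.5) / samples  # midpoint sampling (an assumed choice)
        p = ((1 - u) * d_j1[0] + u * d_j2[0], (1 - u) * d_j1[1] + u * d_j2[1])
        fx, fy = field(p)
        total += fx * ux + fy * uy
    return total / samples


# Assumed example: a horizontal forearm from elbow to wrist.
elbow, wrist = (0.0, 0.0), (10.0, 0.0)


def forearm_field(p):
    return paf_vector(p, elbow, wrist, sigma_l=1.0)


true_pair = association_score(forearm_field, elbow, wrist)
wrong_pair = association_score(forearm_field, elbow, (0.0, 10.0))
print(true_pair, wrong_pair)
```

A candidate pair aligned with the field scores close to 1, while a pair perpendicular to it scores close to 0, which is what lets the matching step tell correct limbs from spurious ones.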

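S2.3, multi-person parsing: non-maximum suppression is applied to the confidence map to obtain a set of candidate key points. Because there may be several people or false detections, each body part can have several candidates, and these candidates define a set of possible limbs.

For the pair of key points j1 and j2 that defines the c-th limb, finding the best match reduces to a bipartite graph matching problem, and the optimal matching is obtained with the Hungarian algorithm:

$$\max_{Z_c} E_c = \max_{Z_c} \sum_{m \in D_{j_1}} \sum_{n \in D_{j_2}} E_{mn} \cdot z^{mn}_{j_1 j_2} \tag{8}$$

The constraint conditions are as follows:

$$l_{c,k} = \lVert x_{j_2,k} - x_{j_1,k} \rVert_2 \tag{9}$$

$E_c$ is the total weight of the matching for limb c; $Z_c$ is the subset of Z for limb c, whose elements $z^{mn}_{j_1 j_2}$ indicate whether the candidates $d^{m}_{j_1}$ and $d^{n}_{j_2}$ are connected; $E_{mn}$ is the affinity field confidence between $d^{m}_{j_1}$ and $d^{n}_{j_2}$; and $D_J$ is the set of candidate key points, with $D_{j_1}$ and $D_{j_2}$ its candidates for key points j1 and j2.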

Extending this optimization to the key points of several people, two relaxations are introduced based on the structure of the human body: only connections between adjacent key points are considered, so the body is represented as a tree; and the optimization is not performed globally but separately for each limb type, each being a bipartite graph matching problem. The optimization then simplifies to:

$$\max_{Z} E = \sum_{c=1}^{C} \max_{Z_c} E_c \tag{10}$$

where C is the number of limb types. The advantage is that an NP-hard problem is converted into several bipartite matching problems that are easy to solve. The global optimum is approximated effectively while the complexity of the algorithm is greatly reduced, which makes real-time multi-person pose estimation possible.
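The per-limb matching can be illustrated with a small sketch. Note that it uses exhaustive search over assignments as a stand-in for the Hungarian algorithm named in the text; both return a maximum-weight assignment, but exhaustive search is only practical for a handful of candidates. The function name and the score matrix are hypothetical, and the sketch assumes every j1 candidate is matched to a distinct j2 candidate.

```python
from itertools import permutations


def best_limb_matching(scores):
    """Return (total, pairs) that maximize the summed association confidence.

    scores[m][n] is the confidence E_mn between candidate m of key point j1
    and candidate n of key point j2. Each candidate is used at most once.
    Assumes len(scores) <= len(scores[0]).
    """
    rows, cols = len(scores), len(scores[0])
    best_total, best_pairs = float("-inf"), []
    for assignment in permutations(range(cols), rows):
        total = sum(scores[m][n] for m, n in enumerate(assignment))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(assignment))
    return best_total, best_pairs


# Assumed example: two elbow candidates and two wrist candidates.
E = [[0.9, 0.1],
     [0.2, 0.8]]
total, pairs = best_limb_matching(E)
print(pairs)
```

Running this matching once per limb type, instead of once over the whole skeleton, is the decomposition that the relaxation above relies on.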

Further, in step S3, a Support Vector Machine (SVM) is used. The main idea of an SVM is: given training samples, construct a hyperplane as the decision surface such that the margin between positive and negative examples is maximized. If the samples are not linearly separable, the SVM uses a kernel function to map them nonlinearly into a high-dimensional feature space, where nonlinear classification can be performed efficiently. The SVM is one of the most commonly used classifiers, is often applied to binary classification, has particular advantages for small-sample, nonlinear, and high-dimensional pattern recognition, and generalizes well. The skeleton features processed here are high-dimensional, and the real-time requirement of the application calls for fast convergence, so the classifier is trained with the SVM algorithm to solve the binary problem of distinguishing waiting behavior from walking behavior. The SVM model training process comprises the following steps:

S3.1, the experimental dataset is a short-video dataset of collective activities containing 44 short video sequences and 5 collective activities: Crossing, Walking, Waiting, Talking, and Queueing. Because the method only needs to distinguish waiting from walking, the Waiting, Queueing, and Talking classes are used as samples of the waiting state, and the Walking and Crossing classes as samples of the walking state. Each of the two states has 100 samples, of which 80% are used as the training set and 20% as the validation set.
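The relabeling and split described in S3.1 can be sketched as below. The helper names and the placeholder sample identifiers are hypothetical, and the sketch splits each class in order; the text does not say how samples were selected or shuffled.

```python
WAITING_ACTIVITIES = {"Waiting", "Queueing", "Talking"}
WALKING_ACTIVITIES = {"Walking", "Crossing"}


def state_label(activity):
    """Map one of the five collective activities to the binary state."""
    if activity in WAITING_ACTIVITIES:
        return "waiting"
    if activity in WALKING_ACTIVITIES:
        return "walking"
    raise ValueError(f"unknown activity: {activity}")


def split_samples(samples, train_fraction=0.8):
    """Split one class into training and validation parts, preserving order."""
    cut = round(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Placeholder identifiers; real samples would be skeleton feature vectors.
waiting = [("waiting", i) for i in range(100)]
walking = [("walking", i) for i in range(100)]

train, validation = [], []
for group in (waiting, walking):
    train_part, validation_part = split_samples(group)
    train += train_part
    validation += validation_part

print(len(train), len(validation))
```

Splitting each class separately keeps the two states balanced in both the training and validation sets.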

S3.2, joint point coordinate normalization: the distance between a person and the camera changes as the person walks past, so the joint point map is larger when the person is closer to the camera and smaller when the person is farther away. To remove this effect, the extracted skeleton features are normalized so that every human pose is scaled to the same height. Because the neck node of the joint point map is stable, the neck node is chosen as the origin and the distance between the neck and the hip center is used as the scale, which eliminates the change in skeleton features caused by the changing distance between person and camera. The normalization formula is as follows:

$$P_i' = \frac{P_i - P_{neck}}{\lVert P_{neck} - P_{hip} \rVert_2}$$

where $P_i \in R^2$ is the coordinate of skeleton joint point i, $P_i'$ is the normalized coordinate of $P_i$, $P_{neck}$ is the neck node coordinate, and $P_{hip}$ is the hip node coordinate. When $P_{neck}$ is taken as the origin, the formula simplifies to:

$$P_i' = \frac{P_i}{\lVert P_{hip} \rVert_2}$$
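A minimal sketch of the normalization in S3.2 follows, assuming the joint indices listed in the detailed description (1: neck, 8: right hip, 11: left hip) and assuming that the hip center is the midpoint of the two hip joints. The function names and the flattening of 18 points into a 36-value feature vector are illustrative choices, not details confirmed by the text.

```python
import math

NECK = 1                     # neck index in the 18-point layout
RIGHT_HIP, LEFT_HIP = 8, 11  # hip indices in the 18-point layout


def normalize_skeleton(points):
    """Move the neck to the origin and scale by the neck-to-hip-center distance."""
    neck = points[NECK]
    hip = ((points[RIGHT_HIP][0] + points[LEFT_HIP][0]) / 2,
           (points[RIGHT_HIP][1] + points[LEFT_HIP][1]) / 2)
    scale = math.hypot(hip[0] - neck[0], hip[1] - neck[1])
    return [((x - neck[0]) / scale, (y - neck[1]) / scale) for x, y in points]


def feature_vector(points):
    """Flatten the normalized 18 (x, y) pairs into one 36-value list."""
    return [c for p in normalize_skeleton(points) for c in p]


# Assumed example: the same pose seen near the camera and far from it.
near = [(100 + 3 * i, 50 + 7 * i) for i in range(18)]
near[NECK], near[RIGHT_HIP], near[LEFT_HIP] = (100, 50), (90, 150), (110, 150)
far = [(0.5 * x + 20, 0.5 * y + 10) for x, y in near]

features_near = feature_vector(near)
features_far = feature_vector(far)
print(len(features_near))
```

Both skeletons produce the same feature vector up to rounding, so the classifier sees the pose rather than the distance to the camera.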

S3.3, the SVM classifier distinguishes waiting people from walking people; the final step uses the body orientation of each pedestrian to exclude those who do not intend to take the elevator. A pedestrian facing the elevator is judged to have the intention of taking the elevator, and a pedestrian facing away from the elevator is judged to have no such intention.

Further, after the information obtained by the algorithm analysis in the step S4 is sent to the elevator intelligent dispatching module, the elevator intelligent dispatching module dispatches the elevator in real time according to a certain dispatching rule.

Advantageous effects:

The invention provides an intelligent elevator dispatching method based on pedestrian posture estimation. First, a human pose estimation algorithm estimates the pose of each person in the video in real time to obtain a skeleton key point map for each pedestrian. The skeleton node map is then used as input to an SVM classifier that recognizes and classifies the pedestrian's waiting and walking behaviors, and people who do not need the elevator are excluded by judging their body orientation. Finally, the passengers' elevator-riding demand information is sent to the elevator intelligent dispatching module, providing technical support for intelligent elevator dispatching.

Drawings

FIG. 1 is a system framework diagram of a method for elevator ride intent prediction and elevator dispatch based on human pose estimation and behavior analysis in accordance with an embodiment of the present invention;

FIG. 2 is a diagram illustrating the effect of the posture estimation according to the embodiment of the present invention, wherein a is one of the pedestrians;

FIG. 3 is a diagram of a support vector machine based behavior analysis framework in accordance with an embodiment of the present invention;

FIG. 4 is a schematic diagram of a rectangular coordinate system of a top view according to an embodiment of the present invention;

Detailed Description

In order to make the present invention better understood by those skilled in the art, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and the described embodiments are only a part of the embodiments of the present invention, but not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, a flowchart illustrating steps of an intelligent elevator dispatching method based on pedestrian attitude estimation according to an embodiment of the present invention is shown, which includes the following steps:

The video is acquired by a surveillance video acquisition unit for elevator passengers, including but not limited to surveillance cameras and industrial cameras placed in the monitoring area of the designated elevator. The unit is installed in the elevator waiting area of each floor and captures the pedestrians in front of the elevator on that floor from the most suitable angle.

S2, performing posture estimation on the monitoring video acquired and preprocessed in real time to obtain a human skeleton node map for each passenger and the number of human bodies, as shown in figure 2;

The human body posture estimation module and the elevator-riding behavior analysis module based on the support vector machine algorithm run mainly on a GPU server or on other embedded devices with high computing power, including but not limited to GPU embedded devices such as the TX1 and TX2 from NVIDIA.

These steps are handled by an elevator intelligent dispatching unit, which includes but is not limited to the elevator monitoring platform of a residential community or shopping mall property and sensors installed in the elevator. The control center for elevator dispatching includes but is not limited to processing devices such as CPUs and FPGAs. The elevator operation monitoring module mainly comprises an elevator monitoring unit, an elevator operation and maintenance platform, and other equipment based on processors such as CPUs, DSPs, and FPGAs.

In a specific application example, the human body posture estimation module in S2 and S3 and the elevator riding behavior analysis module based on the support vector machine algorithm are core modules, and the working processes of the two modules are further detailed below.

In step S2, the human body posture estimation module can detect either 18 or 25 joint points; the main difference is that the 25-point model additionally detects joint points on the left and right feet. Because foot joint information contributes little to passenger identification, and detecting the foot joints considerably slows the program, the information of the 18 human joint points is selected as input to the passenger identification algorithm. The 18 human body joint points are respectively: 0: nose, 1: neck, 2: right shoulder, 3: right elbow, 4: right wrist, 5: left shoulder, 6: left elbow, 7: left wrist, 8: right hip, 9: right knee, 10: right ankle, 11: left hip, 12: left knee, 13: left ankle. Fig. 2 shows the effect of posture estimation on the pedestrians in a video.

S3.2, joint point coordinate normalization: the distance between a person and the camera changes as the person walks past, so the joint point map is larger when the person is closer to the camera and smaller when the person is farther away. To remove this effect, the extracted skeleton features are normalized so that every human pose is scaled to the same height. Because node 1 (the neck) of the human joint point map is stable, the neck node is chosen as the origin and the distance between the neck and the hip center is used as the scale, which eliminates the change in skeleton features caused by the changing distance between person and camera. The normalization formula is as follows:

$$P_i' = \frac{P_i - P_{neck}}{\lVert P_{neck} - P_{hip} \rVert_2}$$

S3.3, judging the orientation of the human body: the SVM classifier distinguishes waiting people from walking people, and the final step uses the body orientation of each pedestrian to exclude those who do not intend to take the elevator. Under the kinematic constraints of the human body, the angle between head orientation and body orientation cannot exceed 180°, and statistically the two are consistent in most cases, so the body orientation of a pedestrian can be determined from the head orientation. The head orientation is determined by the vector formed by the two key points 1: neck and 0: nose. Because the position of the elevator in the video is known, a coordinate system in the top view can be established with the pedestrian's neck node as the origin and the direction of entering the elevator (which can be calibrated from the actual shooting angle) as the y-axis, as shown in fig. 4.

The neck-to-nose vector is projected onto this coordinate system, and the angle between the projection and the x-axis is denoted θ. Conditions on θ are then stipulated for the two cases: the pedestrian is walking toward the elevator, or the pedestrian is walking away from the elevator. Referring to fig. 2, pedestrian a is selected; the detected head node coordinate is (292, 243) and the neck node coordinate is (301, 202). The direction of the elevator entrance is horizontally to the left, and the angle between that direction and the line connecting the neck node to the head node is less than 90 degrees, so this person is judged to have the intention of riding the elevator.
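The worked example for pedestrian a can be checked numerically with the sketch below. It measures the angle directly in image coordinates between the neck-to-head vector and an assumed elevator direction of (-1, 0) for "horizontally to the left"; it does not reproduce the top-view projection of fig. 4, and the function names and the 90-degree threshold parameter are illustrative assumptions.

```python
import math


def angle_between(v, w):
    """Angle in degrees between two 2-D vectors."""
    dot = v[0] * w[0] + v[1] * w[1]
    cos_angle = dot / (math.hypot(*v) * math.hypot(*w))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def faces_elevator(head, neck, elevator_direction, threshold_deg=90.0):
    """True if the neck-to-head vector is within threshold_deg of the elevator direction."""
    heading = (head[0] - neck[0], head[1] - neck[1])
    return angle_between(heading, elevator_direction) < threshold_deg


# Coordinates of pedestrian a from the text; the direction vector is an assumption.
head, neck = (292, 243), (301, 202)
to_elevator = (-1, 0)

angle = angle_between((head[0] - neck[0], head[1] - neck[1]), to_elevator)
print(round(angle, 1), faces_elevator(head, neck, to_elevator))
```

With these coordinates the angle comes out below 90 degrees, which agrees with the judgment in the text that pedestrian a intends to ride the elevator.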

Claims

1. An elevator intelligent scheduling method based on pedestrian attitude estimation is characterized in that:

S1, acquiring the monitoring video outside each elevator car in real time, and storing and preprocessing the monitoring video in real time;

S2, performing posture estimation on the pedestrians in the monitoring video acquired in real time to obtain a human skeleton node map for each pedestrian and the number of human bodies;

S2.1, determining the confidence of each position with a Gaussian function: for the j-th key point of the k-th person, with $x_{j,k}$ denoting the actual position of the key point, the confidence at a pixel p around the key point is:

$$S^{*}_{j,k}(p) = \exp\left(-\frac{\lVert p - x_{j,k} \rVert_2^2}{\sigma^2}\right) \tag{1}$$

where the standard deviation σ controls the spread of the confidence values; for the multi-person case, the confidence map for key point j takes the maximum over all persons:

$$S^{*}_{j}(p) = \max_{k} S^{*}_{j,k}(p) \tag{2}$$

S2.2, given a set of detected body parts, a measure of association is needed for each pair of parts, so that connected parts belong to the same person and the parts can finally be assembled into the full-body poses of several people; each limb is assigned an affinity field, every pixel on the limb carries a unit vector that preserves the position and direction of the limb, and the affinity field is formed by these unit vectors; let $x_{j_1,k}$ and $x_{j_2,k}$ be the two key points of the k-th person's forearm; if a point p lies on the forearm, the affinity value $L^{*}_{c,k}(p)$ at p is the unit vector pointing from node j1 to node j2; if p lies outside the forearm, the affinity value is the zero vector; in mathematical form:

$$L^{*}_{c,k}(p) = \begin{cases} v & \text{if } p \text{ lies on limb } c \text{ of person } k \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

affinity field constraints:

$$0 \le v \cdot (p - x_{j_1,k}) \le l_{c,k} \quad \text{and} \quad \lvert v_{\perp} \cdot (p - x_{j_1,k}) \rvert \le \sigma_l \tag{4}$$

where $\sigma_l$ is a distance threshold; $l_{c,k} = \lVert x_{j_2,k} - x_{j_1,k} \rVert_2$ is the length of the whole forearm; v is the unit vector along the direction of the forearm, and $v_{\perp}$ is a vector perpendicular to the forearm; where the forearm affinity fields of several people overlap, the field is expressed as:

$$L^{*}_{c}(p) = \frac{1}{n_c(p)} \sum_{k} L^{*}_{c,k}(p) \tag{5}$$

$n_c(p)$ is the number of non-zero vectors at position p over all people, that is, the number of people whose affinity fields for limb c overlap at p;

the confidence of the association between any two key points is the line integral of the affinity field between them; for two candidate key point positions $d_{j_1}$ and $d_{j_2}$, the affinity field $L_c$ is sampled along the line segment joining them, and the association confidence is the integral of the field along that segment:

$$E = \int_{u=0}^{u=1} L_c\big(p(u)\big) \cdot \frac{d_{j_2} - d_{j_1}}{\lVert d_{j_2} - d_{j_1} \rVert_2} \, du \tag{6}$$

where p(u) interpolates between the two positions $d_{j_1}$ and $d_{j_2}$:

$$p(u) = (1 - u)\, d_{j_1} + u\, d_{j_2} \tag{7}$$

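S2.3, multi-person parsing: non-maximum suppression is applied to the confidence map to obtain a set of candidate key points; because there may be several people or false detections, each body part can have several candidates, and these candidates define a set of possible limbs;

for the pair of key points j1 and j2 that defines the c-th limb, the optimal matching is obtained with the Hungarian algorithm:

$$\max_{Z_c} E_c = \max_{Z_c} \sum_{m \in D_{j_1}} \sum_{n \in D_{j_2}} E_{mn} \cdot z^{mn}_{j_1 j_2} \tag{8}$$

the constraint conditions are as follows:

$$l_{c,k} = \lVert x_{j_2,k} - x_{j_1,k} \rVert_2 \tag{9}$$

$E_c$ is the total weight of the matching for limb c; $Z_c$ is the subset of Z for limb c, whose elements $z^{mn}_{j_1 j_2}$ indicate whether the candidates $d^{m}_{j_1}$ and $d^{n}_{j_2}$ are connected; $E_{mn}$ is the affinity field confidence between $d^{m}_{j_1}$ and $d^{n}_{j_2}$; $D_J$ is the set of candidate key points, with $D_{j_1}$ and $D_{j_2}$ its candidates for key points j1 and j2;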

the optimization is extended to the key points of several people, and two relaxations are added based on the structure of the human body: only connections between adjacent key points are considered, so the body is represented as a tree; and the optimization is not performed globally but separately for each limb type, each being a bipartite graph matching problem; the optimization then simplifies to:

$$\max_{Z} E = \sum_{c=1}^{C} \max_{Z_c} E_c \tag{10}$$

where C is the number of limb types; the global optimum is thereby approximated effectively while the complexity of the algorithm is greatly reduced, which achieves real-time multi-person pose estimation;

S3, taking the 18 key point coordinates of the human skeleton key point map as input to a behavior analysis model based on a support vector machine, and outputting either elevator-riding intention or no elevator-riding intention;

S3.1, the experimental dataset is a short-video dataset of collective activities containing 44 short video sequences and 5 collective activities, namely Crossing, Walking, Waiting, Talking, and Queueing; the Waiting, Queueing, and Talking classes are used as samples of the waiting state, and the Walking and Crossing classes as samples of the walking state; each of the two states has 100 samples, of which 80% are used as the training set and 20% as the validation set;

S3.2, joint point coordinate normalization: the extracted skeleton features are normalized so that every human pose is scaled to the same height; because the neck node of the human body is stable, the neck node is chosen as the origin and the distance between the neck and the hip center is used as the scale, and the normalization formula is as follows:

$$P_i' = \frac{P_i - P_{neck}}{\lVert P_{neck} - P_{hip} \rVert_2}$$

where $P_i \in R^2$ is the coordinate of skeleton joint point i, $P_i'$ is the normalized coordinate of $P_i$, $P_{neck}$ is the neck node coordinate, and $P_{hip}$ is the hip node coordinate; when $P_{neck}$ is taken as the origin, the formula simplifies to:

$$P_i' = \frac{P_i}{\lVert P_{hip} \rVert_2}$$

S3.3, the SVM classifier distinguishes waiting people from walking people, and the final step uses the body orientation of each pedestrian to exclude those who do not intend to take the elevator; a pedestrian facing the elevator is judged to have the intention of taking the elevator, and a pedestrian facing away from the elevator is judged to have no such intention;

S4, the human body posture estimation module and the behavior analysis module based on the support vector machine finally obtain the number of pedestrians in the monitoring video who intend to ride the elevator, and send the information obtained through algorithm analysis to the elevator intelligent dispatching module.

2. An elevator intelligent scheduling method based on pedestrian attitude estimation is characterized in that:

and after the information obtained by the algorithm analysis in the step S4 is sent to the elevator intelligent dispatching module, the elevator intelligent dispatching module dispatches the elevator in real time according to a certain dispatching rule.

3. An elevator intelligent scheduling method based on pedestrian attitude estimation is characterized in that: the steps are processed by an elevator intelligent scheduling unit, wherein the elevator intelligent scheduling unit comprises but is not limited to an elevator monitoring platform of a residential area or a shopping mall property and a sensor arranged in an elevator; the control center for elevator dispatching comprises but is not limited to processing equipment such as a CPU (central processing unit), an FPGA (field programmable gate array) and the like; the elevator operation monitoring module mainly comprises an elevator monitoring operation unit, an elevator operation and maintenance platform and other equipment based on a CPU, a DSP, an FPGA and other processors.