CN109522793B

CN109522793B - Method for detecting and identifying abnormal behaviors of multiple persons based on machine vision

Info

Publication number: CN109522793B
Application number: CN201811178434.5A
Authority: CN
Inventors: 田联房; 吴啟超; 杜启亮
Original assignee: South China University of Technology SCUT; Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Current assignee: South China University of Technology SCUT; Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Priority date: 2018-10-10
Filing date: 2018-10-10
Publication date: 2021-07-23
Anticipated expiration: 2038-10-10
Also published as: CN109522793A

Abstract

The invention discloses a method for detecting and identifying abnormal behaviors of multiple persons based on machine vision, which comprises the following steps: 1) acquiring a video image of an escalator region; 2) extracting HOG (histogram of gradient directions) features and detecting the faces of escalator passengers by using an Adaboost classifier; 3) tracking the faces of the escalator passengers by using a Kalman filter; 4) extracting passenger skeleton characteristics from the image by using an OpenPose deep learning network; 5) the passenger is shielded and judged based on the relative position of the face of the passenger, when the passenger is shielded, abnormal behavior detection is carried out on the passenger based on the motion characteristics, when the passenger is not shielded, the abnormal behavior detection is carried out on the passenger based on the skeleton characteristics, and the type of the abnormal behavior is further identified by utilizing a space-time graph convolution model. The invention can track the multi-passenger targets on the escalator, and can detect and identify abnormal behaviors of multiple passengers on the escalator end to end, accurately and in real time.

Description

Method for detecting and identifying abnormal behaviors of multiple persons based on machine vision

Technical Field

The invention relates to the technical field of image processing and behavior recognition, in particular to a method for detecting and recognizing abnormal behaviors of multiple persons based on machine vision.

Background

Compared with manual monitoring, the intelligent video monitoring system has the advantages of being stable, reliable, low in price and practical, manpower cost is needed for manual monitoring, the monitoring effect is unstable, the system is easily influenced by the state of people, the intelligent video monitoring system can automatically monitor by using a machine, the manpower cost is saved, the monitoring effect is stable, therefore, the intelligent video monitoring system is applied to detect abnormal behaviors of multiple passengers in the escalator, if the abnormal behaviors are found, the types of the abnormal behaviors are identified, the running state of the escalator is controlled according to the danger level of the abnormal behaviors, safety accidents are timely prevented, the system has important significance, and numerous students are attracted to conduct relevant research on the abnormal behaviors.

So far, the detection and identification method for abnormal behaviors of multiple people aiming at the application scene of the escalator in China is less researched, therefore, the invention discloses a method for detecting and identifying abnormal behaviors of multiple persons based on machine vision, which comprises the steps of accurately detecting the faces of passengers in a hand elevator area in real time from an input image by using an Adaboost face classifier obtained by training through a camera arranged right above a floor plate of the hand elevator, tracking the faces of the passengers through a Kalman filter to obtain the motion characteristics of the passengers, extracting the skeleton characteristics of the passengers from the image by using an OpenPose deep learning network, and finally, judging the blocking of the passengers, if the passengers are blocked, detecting abnormal behaviors of the passengers based on the motion characteristics, if the passengers are not blocked, and detecting abnormal behaviors of the passengers based on the skeleton characteristics, and further identifying the types of the abnormal behaviors by utilizing a space-time graph convolution model.

By combining the above, the detection and identification of the abnormal behaviors of multiple people of the escalator are realized by utilizing machine learning and deep learning knowledge, the occurrence of safety accidents is avoided in time, and the method has higher social value and practical significance.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a method for detecting and identifying abnormal behaviors of multiple persons based on machine vision, which is used for tracking a target of multiple passengers on an escalator and detecting and identifying the abnormal behaviors of the multiple passengers on the escalator end to end, accurately and in real time.

In order to achieve the purpose, the technical scheme provided by the invention is as follows: the method for detecting and identifying the abnormal behavior of multiple persons based on machine vision comprises the following steps:

1) acquiring a video image of an escalator region;

2) extracting features of a Histogram of Oriented Gradient (HOG) and detecting faces of escalator passengers by using an Adaboost classifier;

3) tracking the faces of the escalator passengers by using a Kalman filter;

4) extracting passenger skeleton characteristics from the image by using an OpenPose deep learning network;

5) carrying out shielding judgment on the passenger based on the relative position of the face of the passenger; when the passengers are shielded, detecting abnormal behaviors of the passengers based on the motion characteristics; when the passengers are not shielded, abnormal behavior detection is carried out on the passengers on the basis of the skeleton characteristics, and the types of the abnormal behavior are further identified by utilizing a space-time graph convolution model.

In the step 1), a wide-angle camera with 1280 × 720 image resolution is adopted to shoot an escalator area at an angle from the upper side to the lower side, a monitoring video image is collected, in order to shoot the face of a passenger, the optical axis of the camera is parallel to the handrail belt of the escalator, the shooting angle covers the whole escalator running area, and a clear image of the face of the passenger with a slight overlooking angle is obtained.

In step 2), the HOG features are extracted and the faces of the escalator passengers are detected by using an Adaboost classifier, and the method comprises the following steps:

2.1) standardized color space

The square root Gamma compression for each color channel, with similar results for RGB and LAB color spaces, but with performance degradation if gray space is used, the Gamma compression formula is:

H(x,y)＝H(x,y)^Gamma

wherein Gamma is a compression factor, the Gamma value is 0.5, and H (x, y) is the pixel value of the pixel point (x, y);

2.2) calculating image gradients

Using simple [ -1,0,1]]The template and its transposition are used to make convolution operation and calculation on the imageImage gradient, wherein Gaussian smoothing is not performed before gradient calculation, performance is reduced by increasing Gaussian smoothing, the gradient of each color channel is calculated for a multi-channel color image, the gradient vector of the point is the maximum norm, and the gradient of a pixel point H (x, y) in the escalator image

Is a vector:

wherein G is_x(x, y) is the gradient of the pixel point (x, y) in the horizontal direction, G_y(x, y) is the gradient of the pixel point (x, y) in the vertical direction, which is:

gradient of gradient

The amplitude and direction of (a) are respectively:

wherein G (x, y) is a gradient

The amplitude of (a) of (b) is,

is a gradient

The direction of (a);

2.3) construction of gradient orientation histograms for each cell Unit

Combining 8-8 pixel points to obtain a cell unit, calculating the gradient of each pixel in the cell unit, voting for a certain bin based on a direction so as to form a gradient direction histogram, wherein the direction bin of the histogram is 0-180 degrees, in order to reduce aliasing phenomenon, the gradient voting needs to carry out trilinear interpolation on the direction and the position, the voting weight is calculated according to the gradient amplitude, fine direction coding is vital to the obtained result, but spatial sampling can be quite rough;

2.4) Intra-Block normalized gradient Direction histogram

Combining cell units into a large block, normalizing the histogram of gradient direction in the block, enabling the variation range of gradient intensity to be very large due to the variation of local illumination and the variation of foreground and background contrast, so that local normalization needs to be carried out on the gradient, combining the cell units into a larger block, then normalizing each block, wherein a final descriptor is a vector formed by the histograms of the cell units in the block, the blocks are overlapped, a block descriptor after normalization is called as an HOG descriptor, and a block normalization strategy adopts L2 to cut off;

2.5) Collection of HOG features

Collecting HOG features of all overlapped blocks in a detection window, and combining the HOG features into a final feature vector for classification, wherein the detection window needs to contain context information of an image, and a detection window with the size of 64 x 128 generates a blank edge of about 16 pixels around a human body, and the blank edge is added with the context information which is helpful for detection;

2.6) detecting the faces of the escalator passengers by utilizing an Adaboost classifier

The basic idea of the AdaBoost algorithm is to train a plurality of weak classifiers by multiple rounds on the same data set and combine the weak classifiers into a strong classifier, the classification accuracy of the weak classifiers is slightly larger than the randomly guessed classification accuracy, the weak classifiers are classifiers capable of processing weighted data, the accuracy and the recall ratio of the weak classifiers are poorer than those of the strong classifiers, a simple classifier is used as the weak classifier to obtain a better effect, a single-layer decision tree is used as the weak classifier to process various types of data, and therefore the single-layer decision tree is used as the weak classifier to process various types of dataClass-device, single-level decision tree h_j(x) Comprises the following steps:

wherein x is a feature vector of a certain sample, x_jIs the value of the jth feature in the feature vector, θ_jIs the decision threshold of the jth feature, p_jThe value of (2) is 1 or-1, and is used for determining whether the judgment standard is greater than or less than the threshold, so that one weak classifier depends on three parameters of symbol direction, feature selection of dimension and judgment threshold, the weak classifier with the minimum error rate is taken as the optimal weak classifier obtained in the round, and the basic idea of AdaBoost strong classifier training is to give a weight to each training sample in the data set and initialize the training samples to be equal values. Firstly training a weak classifier on the data set, then adjusting the weight of each sample, reducing the weight of the sample which is classified correctly at the last time, increasing the weight of the sample which is classified wrongly, training the weak classifier again on the data set, and finishing the training when the number of the weak classifiers reaches the specified number or the error rate is lower than a certain threshold value.

In the step 3), a Kalman filter is utilized to track the faces of the escalator passengers, and the method specifically comprises the following steps:

the Kalman filter iterates by estimating the state variables of the motion system, and finally converges to an optimal autoregressive solution, namely the optimal estimation of the state variables, can predict the target position at the next moment, is a highly efficient linear recursive filter, solves the state space in a time domain, can predict the optimal state of the system from a section of incomplete and noisy interference signals, and can predict the optimal value of the signal by using the current measured value of the signal and the estimated value of the prior state, and for a discrete-time control system, the state equation of the discrete-time control system is as follows:

x_k＝A_k,k-1x_k-1+Bu_k+w_k-1

wherein x is_kIndicating the state of the control system at time k, A_k,k-1Is a state transition matrix representing the change in system state from time k-1 to time k, u_kThen represents the input variable from the outside to the system at time k, B is the transformation matrix for controlling the gain of the outside input, w_k-1Representing process noise in practical applications;

the system's observation equation is:

z_k＝H_kx_k+v_k

wherein z is_kAn observed value, H, characterizing the state of the control system at time k_kIs a measurement matrix representing the state value x of the system at time k_kAnd the observed value z_kRelation between v_kRepresenting measurement noise in practical applications;

assuming that the process noise and the measurement noise do not change with the change of the system and are white noise with a mean value of 0, let the covariance matrix of the process noise be Q_kThe covariance matrix of the measured noise is R_kAnd calculating the system state at the next moment by using a state equation, wherein the state equation is as follows:

wherein the content of the first and second substances,

is the result of the optimization at the last moment,

the system state value is obtained by predicting according to the result of the last moment, and the current state of the system needs to be updated after being updated

The corresponding covariance, the update equation of the covariance is:

P_k,k-1＝A_k,k-1P_k-1A^T _k,k-1+Q_k-1

wherein, P_k-1Is composed of

Covariance of (P)_k,k-1Is that

The covariance of the system state is predicted, then the Kalman filtering gain is calculated, the predicted value is corrected by combining the observed value of the system K moment, and the optimized predicted value can be obtained, wherein the Kalman filtering gain K is_kComprises the following steps:

optimized estimated value of k time state can be obtained through state correction

Updating

Corresponding covariance P_kContinuously iterating the Kalman filter, wherein the covariance update equation is as follows:

P_k＝P_k,k-1-K_kH_kP_k,k-1

in summary, the nature of the kalman filter is a process that continuously predicts and updates corrections.

In step 4), extracting passenger skeleton features from the image by using an openpos deep learning network, specifically as follows:

compared with the information such as optical flow, appearance, depth and the like, the skeleton can better describe behavior information of passengers, the OpenPose deep learning network can be used for accurately, real-timely and stably extracting the two-dimensional human skeleton of the passengers under the conditions of uneven illumination and shadow, the existing posture estimation method can be divided into a top-down method and a bottom-up method, the top-down method needs to detect each person from an image, then respectively estimate the posture of each person and extract the two-dimensional skeleton of the person, the method is influenced by the performance of a human body detector, the time consumption of an algorithm is increased along with the increase of the number of people in the image, the method is a bottom-up method relative to the top-down method, the posture estimation is carried out from bottom to top without detecting the human body, the time consumption of the algorithm is not influenced by the number of the people, but the associated information between the whole pedestrian and the joint points of the skeleton to which the pedestrian is located is ignored, the method needs to adopt other methods to associate skeleton joint points belonging to pedestrians with the whole pedestrian, aiming at the problem of skeleton extraction from bottom to top, the OpenPose deep learning network provides Partial Affinity Fields (PAFs) to carry out nonparametric explicit representation on the connection of the human body Part joint points, the PAFs is a set of two-dimensional vectors, each section of the human body skeleton corresponds to a PAFs graph, the size of the PAFs graph is the same as that of the original graph, each point in the graph is a two-dimensional vector which respectively represents components in the horizontal direction and the vertical direction, the position and the direction of one section of the skeleton are coded, the PAFs can be used for connecting the body Part joint points belonging to the whole pedestrian to extract the two-dimensional skeleton of passengers, the OpenPose network is divided into a plurality of stages, the output of each stage is compared with a true value to obtain a corresponding loss function, the loss functions of all the stages are accumulated to obtain a total loss function, the model convergence is facilitated, the total loss function is optimized, and the final model is obtained through iterative training;

during actual test, the input image can output a series of human body joint point confidence graphs and skeleton PAFs graphs through a model, wherein the number of the human body joint point confidence graphs is consistent with that of skeleton joint points, the number of the skeleton PAFs graphs is consistent with that of skeleton segments, a human body two-dimensional skeleton comprises 14 human body joint points including a nose, a neck, a left shoulder, a left elbow, a left wrist, a left hip, a left knee, a left ankle, a right shoulder, a right elbow, a right wrist, a right hip, a right knee and a right ankle and 13 sections of human bones formed by connecting the human body joint points, then the optimal connection problem of every two joint points of the human body is converted into a maximum weight value bipartite graph matching problem, the skeleton joint points are used as nodes in bipartite graphs, the PAFs are used as weight values of sides in the bipartite graphs, and a Hungary matching and greedy analysis algorithm is utilized to connect the skeleton joint points and the bones to obtain a complete human body skeleton.

In step 5), the passengers are judged to be shielded based on the relative positions of the faces of the passengers, which is as follows:

when passengers take the escalator, because the positions of the passengers are too close to each other, the passengers can be crowded, the shooting angle of the camera shoots the escalator region from near to far, the passengers close to the camera can shelter the passengers far away from the camera, the sheltered passengers lack partial skeletons, the behaviors of the passengers cannot be well described by the lost skeletons, and abnormal behavior misdetection is easily caused, therefore, the passengers need to be sheltered and judged, if the passengers are sheltered, abnormal behavior detection is carried out based on the motion characteristics of the passengers, if the passengers are not sheltered, the abnormal behavior detection is carried out based on the skeleton characteristics of the passengers, when the passengers are sheltered and judged, the distances dist between the passengers and the surrounding passengers are respectively calculated, and if the distances are smaller than a self-adaptive sheltering threshold T_dist，T_distComprises the following steps:

T_dist＝(W₁+W₂)*0.6

wherein, W₁And W₂And the widths of the face tracking frames of two passengers are respectively, and the vertical coordinate of the face central point of the passenger is smaller than that of the face central point of the other passenger, so that the passenger is judged to be shielded by the other passenger.

When the passenger shelters from, carry out unusual action detection to the passenger based on the motion characteristic, specifically as follows:

the motion characteristics of the passengers are composed of the speed and the speed direction of the passengers, and the situation that the human face position of the passengers is not easy to shield is considered, so that the motion speed of the passengers is calculated based on the human face position of the passengers, when the framework is completely extracted, the complete framework can better describe the behaviors of the passengers relative to the motion characteristics of the passengers, when the framework is incompletely extracted, the incomplete framework is easy to misdetect abnormal behaviors, and the motion characteristics can be better relative to the framework characteristicsDescribing the behavior of passengers, when the passengers are shielded, detecting the abnormal behavior of the passengers by using the motion characteristics of the passengers, calculating the motion speed of the passengers at intervals of t frames, and setting the center of the face frame of the first frame as P_l＝(x_l,y_l) The area of the face frame is S_l，x_lIs a central abscissa, y_lAs the central ordinate, the frame rate is fps, and the motion velocity v_lMagnitude of velocity | v_lI, speed direction theta_lRespectively as follows:

v_l＝P_l-P_l-t＝(x_l-x_l-t,y_l-y_l-t)

the normal state when the passenger boards the staircase is to stand on the staircase, and speed is unanimous with staircase operating speed, and passenger's speed size and speed direction all are the same with the operating speed size and the direction of staircase promptly, if the passenger in continuous several frame time speed size and speed direction surpass normal range, promptly passenger's speed size and speed direction satisfy following condition:

wherein the content of the first and second substances,

for corrected passenger movement speed, T_vMaximum speed of movement threshold in the normal state, T_θ1Minimum direction of movement threshold in the normal state, T_θ2The maximum motion direction threshold value in the normal state, because the image of the near object shot by the camera is larger than that of the far object, the closer to the camera, the passenger can be in the imageThe larger the calculated movement speed is, the larger the movement speed is, the speed needs to be corrected by dividing the movement speed by the area of the face of the passenger, and if the movement speed and the direction of the passenger are continuous T_lAnd if the frame meets the conditions, detecting that the passenger has abnormal behavior.

When the passenger is not shielded, abnormal behavior detection is carried out on the passenger based on the skeleton characteristics, and the type of the abnormal behavior is further identified by utilizing a space-time graph convolution model, wherein the detection method specifically comprises the following steps:

the normal state when a passenger takes the escalator is that two hands hang down and are placed on two sides of the body, two legs stand on the escalator, the head faces the front, the front of the body faces a camera, and abnormal behaviors generally have large action characteristics and are obviously different from normal behaviors, therefore, according to the behavior characteristics of the normal state of the passenger, the human body frameworks of 20 passengers in the normal behavior state are selected as normal behavior templates, the selected normal behavior templates can reflect various small-amplitude changes of the passenger behaviors in the normal state, so that the manufactured templates are more generalized, can tolerate the normal amplitude changes of the passenger behaviors, are respectively matched with the passenger human body frameworks extracted from each frame of image based on Euclidean distance to judge whether the passenger is the abnormal framework, and in order to adapt to the imaging size change caused by the distance between the passenger and the camera and the body type difference of the passenger, when template matching is carried out, human posture characteristic vectors of a template skeleton and a passenger skeleton are respectively extracted, then Euclidean distance of the two vectors is calculated to obtain matching similarity of the two vectors, when the human posture characteristic vector of the skeleton is calculated, 13 sections of human skeletons of the human skeleton are regarded as characteristic J containing 13 elements, and each element is a two-dimensional vector:

J＝{J¹,J²,…,J^m,…,J¹³}

wherein J^mTo be composed of a head end joint point B^mArticulation point E with the tail end^mThe m-th section of skeleton formed by interconnection, the coordinate of the head end of the skeleton vector is

Is the horizontal coordinate of the head end,

is a head end ordinate and a tail end ordinate

Is the horizontal coordinate of the tail end,

is a vertical coordinate of the tail end and an angle alpha in the horizontal direction^mAt an angle of beta in the vertical direction^mThe skeletal vector is represented as

The cosine values in the horizontal direction and the vertical direction are respectively:

respectively calculating the cosine values of the 13 segments of bones in the horizontal direction and the vertical direction, and then sequentially connecting the 13 segments of bones according to the arrangement sequence of the bones to obtain a 26-dimensional feature vector { cos alpha¹,cosβ¹,…,cosα¹³,cosβ¹³Uses it as human body posture characteristic vector, and further calculates skeleton SK to be matched_DAnd template framework SK_TMatching similarity of O (SK)_D,SK_T) Comprises the following steps:

wherein the content of the first and second substances,

is the cosine values of the i-th section of the skeleton to be matched in the horizontal and vertical directions,

the cosine values of the horizontal direction and the vertical direction of the ith section of the framework of the template framework are obtained, if the matching similarity of the framework of the passenger and all the template frameworks is smaller than a normal threshold value, the framework of the passenger is judged to be an abnormal framework, and if the framework of the passenger is T-shaped continuously_lIf the frame is judged to be an abnormal framework, detecting that the passenger has abnormal behavior;

if the abnormal behavior of the passenger is detected based on the skeleton characteristics, combining the abnormal skeletons of the passenger according to the time sequence to obtain an abnormal skeleton sequence, inputting the abnormal skeleton sequence into a space-time graph convolution model, identifying the type of the abnormal behavior, wherein the space-time graph convolution model is used for human behavior identification, modeling a dynamic skeleton based on the time sequence of the positions of the joints of the human body, naturally representing the dynamic skeleton mode by the time sequence of the positions of the joints of the human body in a two-dimensional or three-dimensional coordinate mode, then identifying the human behavior by analyzing the action mode, using the position information of the joints without using the space information of the bones and needing to communicate the bones, expanding the space-time graph convolution network to the space-time graph model, designing a general skeleton sequence representation for behavior identification to obtain a space-time graph convolution network, wherein two types of sides exist in the graph, namely a space side which is in accordance with the natural connection of the joints and a time side which is connected with the same joint in continuous time steps, on the basis, a multilayer space-time graph convolution is constructed, information is allowed to be integrated along two dimensions of space and time, a graph convolution model provides a novel idea for processing graph structure data, a convolution neural network which is commonly used for images in deep learning is applied to the graph data, the convolution mode of the convolution network on the graph adopts space domain convolution, the convolution neural network is generalized to any structural graph to obtain, given a body joint sequence under a two-dimensional or three-dimensional coordinate system, a space-time graph can be constructed, wherein the human body joint corresponds to nodes of the graph, the connectivity of the human body structure and the time connectivity correspond to two types of edges of the graph, therefore, the input of the space-time graph convolution model is a joint coordinate vector of the graph nodes, multilayer space-time graph convolution operation is applied to input data, a higher-level characteristic graph can be generated, and then the space-time graph convolution model is classified into corresponding action classes by a standard SoftMax classifier, through observation, when the passenger took the staircase, the unusual action that takes place mainly has forward to fall down, falls down backward, scrambles the handrail area, probes outside the staircase and probes outside the staircase five kinds of unusual actions, and other kinds of unusual action can all be categorized into among above-mentioned five kinds of actions, consequently, the action classification of discernment includes forward fall down, falls down backward, scrambles the handrail area, probes outside the staircase and probes outside the staircase five kinds of unusual actions.

Compared with the prior art, the invention has the following advantages and beneficial effects:

the escalator is mainly applied to the escalator in public places such as subway stations, office buildings and the like, can accurately detect abnormal behaviors of multiple passenger targets on the escalator in real time, further identifies the types of the abnormal behaviors, further timely feeds back the abnormal behavior results to the escalator control console through the wireless communication device, starts a corresponding safety accident emergency scheme according to the types of the abnormal behaviors, controls the running state of the escalator, and timely prevents safety accidents from occurring.

Drawings

FIG. 1 is a block flow diagram of the method of the present invention.

Fig. 2 is a schematic view of the installation position of the camera in the present invention.

Fig. 3 is an original image of the escalator monitoring area acquired by the camera.

Fig. 4 is a diagram of passenger face detection results based on the Adaboost classifier.

Fig. 5 is a kalman filter face tracking result diagram.

FIG. 6 is a drawing of the OpenPose deep learning network skeleton extraction effect.

Fig. 7 is a diagram showing a passenger occlusion detection result.

Fig. 8 is a diagram showing the recognition result of the abnormal behavior of the passenger.

Detailed Description

The present invention will be further described with reference to the following specific examples.

The method for detecting and identifying abnormal behaviors of multiple persons based on machine vision provided by the embodiment comprises the steps of firstly detecting a passenger face by using an HOG descriptor and an Adaboost classifier, tracking the passenger face by using Kalman filtering, then extracting passenger skeleton characteristics from an image by using an OpenPose deep learning network, then judging the shielding of the passenger based on the relative position of the passenger face, detecting the abnormal behaviors of the passenger based on motion characteristics when the passenger is shielded, detecting the abnormal behaviors of the passenger based on the skeleton characteristics when the passenger is not shielded, and further identifying the types of the abnormal behaviors by using a space-time graph convolution model, wherein a flow diagram is shown in figure 1, and the specific conditions are as follows:

1) escalator zone video image acquisition

The escalator monitoring method includes the steps that a wide-angle camera with 1280-720 image resolution is adopted to shoot an escalator region at an angle from top to bottom, monitoring video images are collected, in order to shoot passenger faces, the optical axis of the camera is parallel to an escalator handrail belt, the shooting angle covers the whole escalator running region, clear passenger face images with slight overlooking angles are obtained, the schematic diagram of the installation position of the camera is shown in figure 2, and the collected original images of the escalator monitoring region are shown in figure 3.

2) HOG characteristics are extracted and the Adaboost classifier is utilized to detect the faces of the passengers holding the elevator

The method for detecting the face of the escalator passenger by extracting the HOG characteristics of the face of the passenger and utilizing an Adaboost classifier comprises the following steps:

2.1) standardized color space

H(x,y)＝H(x,y)^Gamma

2.2) calculating image gradients

Using simple [ -1,0 [ ],1]Performing convolution operation on the image by using the template and the transpose thereof, calculating the gradient of the image, not performing Gaussian smoothing before calculating the gradient, reducing the performance by increasing the Gaussian smoothing, respectively calculating the gradient of each color channel for a multi-channel color image, taking the maximum norm as the gradient vector of the point, and taking the gradient of a pixel point H (x, y) in the escalator image

Is a vector:

gradient of gradient

The amplitude and direction of (a) are respectively:

wherein G (x, y) is a gradient

The amplitude of (a) of (b) is,

is a gradient

The direction of (a);

2.3) construction of gradient orientation histograms for each cell Unit

2.4) Intra-Block normalized gradient Direction histogram

2.5) Collection of HOG features

The basic idea of the AdaBoost algorithm is that a plurality of weak classifiers are trained on the same data set in multiple rounds and combined into a strong classifier, the classification accuracy of the weak classifiers is slightly larger than the randomly guessed classification accuracy, the weak classifiers are classifiers capable of processing weighted data, the accuracy and the recall ratio of the weak classifiers are poorer than those of the strong classifiers, a simple classifier is used as the weak classifier to obtain a better effect, and a single-layer decision tree is used as the weak classifier to process various types of dataData of type, therefore, a single-level decision tree is used as a weak classifier, a single-level decision tree h_j(x) Comprises the following steps:

wherein x is_jIs the value of the jth feature in the sample's feature vector, θ_jIs the decision threshold of the jth feature, p_jThe value of (1) or-1 is used for determining whether the judgment standard is greater than or less than the threshold, so that one weak classifier depends on three parameters of symbol direction, feature selection of dimension and judgment threshold, the weak classifier with the minimum error rate is used as the optimal weak classifier obtained by the round, the basic idea of the training of the AdaBoost strong classifier is to give a weight to each training sample in a data set and initialize the training samples to be equal values, firstly, one weak classifier is trained on the data set, then, the weight of each sample is adjusted, the weight of the sample which is classified correctly at the last time is reduced, the weight of the sample which is classified incorrectly is increased, the weak classifier is trained on the data set again, when the number of the weak classifiers reaches the specified number or the error rate is lower than the certain threshold, the training is completed, the passenger face detection result of the Adaboost classifier is as shown in FIG. 4, the face detection positions of the passengers are indicated by circular boxes.

3) Method for tracking faces of escalator passengers by using Kalman filter

The method comprises the following steps of tracking the faces of the escalator passengers by using a Kalman filter, wherein the Kalman filter comprises the following specific steps:

x_k＝A_k,k-1x_k-1+Bu_k+w_k-1

the system's observation equation is:

z_k＝H_kx_k+v_k

wherein the content of the first and second substances,

is the result of the optimization at the last moment,

The corresponding covariance, the update equation of the covariance is:

P_k,k-1＝A_k,k-1P_k-1A^T _k,k-1+Q_k-1

wherein, P_k-1Is composed of

Covariance of (P)_k,k-1Is that

Updating

P_k＝P_k,k-1-K_kH_kP_k,k-1

in summary, the essence of the kalman filter is a process of continuously predicting and updating the correction, and the kalman filter face tracking result is shown in fig. 5, where the face tracking position at the current time is marked by a rectangular frame, and the center points of the face tracking positions at all previous times are shown by solid dots.

4) The method comprises the following steps of extracting passenger skeleton features from an image by utilizing an OpenPose deep learning network, and specifically comprises the following steps:

in actual test, the input image can output a series of human body joint point confidence maps and skeleton PAFs maps through a model, wherein the number of the human body joint point confidence maps is consistent with that of skeleton joint points, the number of the skeleton PAFs maps is consistent with that of skeleton segments, a human body two-dimensional skeleton comprises 14 human body joint points including a nose, a neck, a left shoulder, a left elbow, a left wrist, a left hip, a left knee, a left ankle, a right shoulder, a right elbow, a right wrist, a right hip, a right knee and a right ankle and 13 segments of human bones formed by connecting the human body joint points, then the optimal connection problem of every two joint points of the human body is converted into a maximum weight value bipartite graph matching problem, the skeleton joint points are used as nodes in bipartite graphs, the PAFs are used as weight values of sides in the bipartite graphs, a complete human body skeleton is obtained by connecting the skeleton joint points and the bones by utilizing Hungary matching and greedy analysis algorithms, an OpenPose deep learning network skeleton extraction effect graph is shown in FIG. 6, the joint points are marked with solid circles and the bones are represented by straight line segments.

5) The passenger is shielded and judged based on the relative position of the face of the passenger, and the method specifically comprises the following steps:

T_dist＝(W₁+W₂)*0.6

wherein, W₁And W₂The widths of the face tracking frames of two passengers are respectively, and the vertical coordinate of the face central point of the passenger is smaller than that of the face central point of the other passenger, then the passenger is judged to be shielded by the other passenger, the passenger shielding detection result image is shown in figure 7, and the passenger at the right position of the image is shielded by the passenger at the middle position of the image。

the passenger's motion characteristic comprises passenger's speed size and speed direction, consider that passenger's face position is difficult to take place to shelter from, so calculate passenger's motion speed based on passenger's face position, when the skeleton draws completely, the motion characteristic of complete skeleton for passenger can describe passenger's action better, when the skeleton draws incompletely, unusual action false detection takes place easily for incomplete skeleton, the motion characteristic can describe passenger's action better for the skeleton characteristic, therefore, when passenger takes place to shelter from, utilize passenger's motion characteristic to carry out passenger's unusual action detection, it calculates passenger's motion speed to establish every t frames, the first frame face frame center is P_l＝(x_l,y_l) The area of the face frame is S_l，x_lIs a central abscissa, y_lAs the central ordinate, the frame rate is fps, and the motion velocity v_lMagnitude of velocity | v_lI, speed direction theta_lRespectively as follows:

v_l＝P_l-P_l-t＝(x_l-x_l-t,y_l-y_l-t)

wherein the content of the first and second substances,

for corrected passenger movement speed, T_vMaximum motion speed threshold in the normal state, T_θ1Minimum direction of motion threshold, T, for a normal state of 200_θ2The maximum movement direction threshold value is 250, which is the maximum movement direction threshold value in the normal state, because the image of the near object shot by the camera is larger than the image of the far object, the closer the camera is, the larger the movement speed calculated by the passenger in the image is, the speed needs to be corrected by dividing the movement speed by the area of the face of the passenger, and if the movement speed and the direction of the passenger are continuous T_lIf the above conditions are satisfied for 3 frames, the passenger is detected to have abnormal behavior.

J＝{J¹,J²,…,J^m,…,J¹³}

Is the horizontal coordinate of the head end,

is a head end ordinate and a tail end ordinate

Is the horizontal coordinate of the tail end,

wherein the content of the first and second substances,

the cosine values of the horizontal direction and the vertical direction of the ith section of the framework of the template framework are obtained, if the matching similarity of the framework of the passenger and all the template frameworks is smaller than a normal threshold value, the framework of the passenger is judged to be an abnormal framework, and if the framework of the passenger is T-shaped continuously_lIf the 3 frames are judged as abnormal skeletons, detecting that the passenger has abnormal behaviors;

if the abnormal behavior of the passenger is detected based on the skeleton characteristics, combining the abnormal skeletons of the passenger according to the time sequence to obtain an abnormal skeleton sequence, inputting the abnormal skeleton sequence into a space-time graph convolution model, identifying the type of the abnormal behavior, wherein the space-time graph convolution model is used for human behavior identification, modeling a dynamic skeleton based on the time sequence of the positions of the joints of the human body, naturally representing the dynamic skeleton mode by the time sequence of the positions of the joints of the human body in a two-dimensional or three-dimensional coordinate mode, then identifying the human behavior by analyzing the action mode, using the position information of the joints without using the space information of the bones and needing to communicate the bones, expanding the space-time graph convolution network to the space-time graph model, designing a general skeleton sequence representation for behavior identification to obtain a space-time graph convolution network, wherein two types of sides exist in the graph, namely a space side which is in accordance with the natural connection of the joints and a time side which is connected with the same joint in continuous time steps, on the basis, a multilayer space-time graph convolution is constructed, information is allowed to be integrated along two dimensions of space and time, a graph convolution model provides a novel idea for processing graph structure data, a convolution neural network which is commonly used for images in deep learning is applied to the graph data, the convolution mode of the convolution network on the graph adopts space domain convolution, the convolution neural network is generalized to any structural graph to obtain, given a body joint sequence under a two-dimensional or three-dimensional coordinate system, a space-time graph can be constructed, wherein the human body joint corresponds to nodes of the graph, the connectivity of the human body structure and the time connectivity correspond to two types of edges of the graph, therefore, the input of the space-time graph convolution model is a joint coordinate vector of the graph nodes, multilayer space-time graph convolution operation is applied to input data, a higher-level characteristic graph can be generated, and then the space-time graph convolution model is classified into corresponding action classes by a standard SoftMax classifier, through observation, when a passenger takes the escalator, the abnormal behaviors mainly include five abnormal behaviors of forward falling, backward falling, handrail climbing, probe outside the escalator and probe outside the escalator, and other types of abnormal behaviors can be classified into the five behaviors, so the recognized action types include five abnormal behaviors of forward falling, backward falling, handrail climbing, probe outside the escalator and probe outside the escalator, and fig. 8 is an abnormal behavior recognition effect graph, wherein the positions of the faces of the passenger are marked by rectangular frames, the bones of the passenger are marked by linear sections, the movement tracks of the passenger are marked by circular dots, each circular dot position represents the position of the passenger in a corresponding frame at the historical moment, and the abnormal behavior of the passenger in the graph toward the probe outside the escalator occurs.

The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, so that the changes in the shape and principle of the present invention should be covered within the protection scope of the present invention.

Claims

1. The method for detecting and identifying the abnormal behavior of multiple persons based on machine vision is characterized by comprising the following steps: the method is used for detecting abnormal behaviors of a multi-passenger target of the escalator and identifying the types of the abnormal behaviors, and comprises the following steps:

1) acquiring a video image of an escalator region;

2) extracting HOG (histogram of gradient directions) features and detecting the faces of escalator passengers by using an Adaboost classifier;

3) tracking the faces of the escalator passengers by using a Kalman filter;

5) carry out the judgement of sheltering from to the passenger based on passenger face relative position, when the passenger takes place to shelter from, carry out unusual action detection to the passenger based on the motion characteristic, when the passenger does not shelter from, carry out unusual action detection to the passenger based on the skeleton characteristic to further utilize the type of space-time graph convolution model discernment unusual action, specifically as follows:

in the process of taking a staircase by passengers, crowding can occur due to too close positions, the shooting angle of the camera shoots the staircase area from near to far, the passenger close to the camera can shield the passenger far away from the camera, the shielded passenger lacks part of the skeleton, the absent skeleton can not well describe the behavior of the passenger, and abnormal behavior misdetection is easily caused, therefore, the passenger is required to be shielded and judged, if the passenger is shielded, abnormal behavior detection is carried out based on the motion characteristic of the passenger, if the passenger is not shielded, abnormal behavior detection is carried out based on the skeleton characteristic of the passenger, when the passenger is shielded and judged, the distance dist between the passenger and the surrounding passenger is respectively calculated for the passenger, and if the distance is less than the self-adaptive shielding threshold T_dist，T_distComprises the following steps:

T_dist＝(W₁+W₂)*0.6

wherein, W₁And W₂The widths of the face tracking frames of two passengers are respectively set, and the vertical coordinate of the face center point of the passenger is smaller than that of the face center point of the other passenger, so that the passenger is judged to be shielded by the other passenger;

v_l＝P_l-P_l-t＝(x_l-x_l-t,y_l-y_l-t)

wherein the content of the first and second substances,

for corrected passenger movement speed, T_vMaximum speed of movement threshold in the normal state, T_θ1Minimum direction of movement threshold in the normal state, T_θ2The maximum motion direction threshold value is the maximum motion direction threshold value in the normal state, because the image of a near object shot by the camera is larger than the image of a far object, the closer to the camera, the greater the motion speed calculated by the passenger in the image, the more the passenger needs to divide the area of the face of the passenger to correct the speed, and if the motion speed and the direction of the passenger are continuous T_lIf the frame meets the condition, detecting that the passenger has abnormal behavior;

J＝{J¹,J²,…,J^m,…,J¹³}

Is the horizontal coordinate of the head end,

is a head end ordinate and a tail end ordinate

Is the horizontal coordinate of the tail end,

wherein the content of the first and second substances,

if abnormal behaviors of passengers are detected based on skeleton characteristics, the abnormal skeletons of the passengers are combined according to time sequence to obtain an abnormal skeleton sequence, the abnormal skeleton sequence is input into a space-time graph convolution model to identify the types of the abnormal behaviors, the space-time graph convolution model is used for human behavior identification, dynamic skeletons are modeled based on the time sequence of human joint positions, dynamic skeleton modes can be naturally represented by the time sequence of the human joint positions in a two-dimensional or three-dimensional coordinate mode, then the human behavior identification can be achieved by analyzing the action modes, the position information of the joint points is used without utilizing the skeleton space information and the skeletons need to be communicated, the space-time graph convolution network is expanded to the space-time graph model to design a skeleton sequence general representation for behavior identification, and a space-time graph convolution network is obtained, wherein two types of sides exist in the graph, namely, the space side which accords with the natural connection of the joints and the time side which connects the same joint in continuous time steps, on the basis, a multilayer space-time graph convolution is constructed, information is allowed to be integrated along two dimensions of space and time, a graph convolution model provides a novel idea for processing graph structure data, a convolution neural network which is commonly used for images in deep learning is applied to the graph data, the convolution mode of the convolution network on the graph adopts space domain convolution, the convolution neural network is generalized to any structural graph to obtain, given a body joint sequence under a two-dimensional or three-dimensional coordinate system, a space-time graph can be constructed, wherein the human body joint corresponds to nodes of the graph, the connectivity of the human body structure and the time connectivity correspond to two types of edges of the graph, therefore, the input of the space-time graph convolution model is a joint coordinate vector of the graph nodes, multilayer space-time graph convolution operation is applied to input data, a higher-level characteristic graph can be generated, and then the space-time graph convolution model is classified into corresponding action classes by a standard SoftMax classifier, through observing, when the passenger took the staircase, the unusual action that takes place mainly has forward to fall down, falls down backward, scrambles the handrail area, probe and visit five kinds of unusual actions toward the staircase outer probe, and other kinds of unusual action can both classify among the above-mentioned five kinds of actions, consequently, the action classification of discernment includes forward fall down, falls down backward, scrambles the handrail area, visits five kinds of unusual actions of probe and visit the staircase outer probe.

2. The machine vision-based multi-person abnormal behavior detection and identification method according to claim 1, wherein: in the step 1), a wide-angle camera with 1280 × 720 image resolution is adopted to shoot an escalator area at an angle from the upper side to the lower side, a monitoring video image is collected, in order to shoot the face of a passenger, the optical axis of the camera is parallel to the handrail belt of the escalator, the shooting angle covers the whole escalator running area, and a clear image of the face of the passenger with a overlooking angle is obtained.

3. The machine vision-based multi-person abnormal behavior detection and identification method according to claim 1, wherein: in step 2), the HOG features are extracted and the faces of the escalator passengers are detected by using an Adaboost classifier, and the method comprises the following steps:

2.1) standardized color space

H(x,y)＝H(x,y)^Gamma

2.2) calculating image gradients

Performing convolution operation on an image by using a simple [ -1,0,1] template and the transposition thereof, calculating the gradient of the image, not performing Gaussian smoothing before calculating the gradient, reducing the performance by increasing the Gaussian smoothing, respectively calculating the gradient of each color channel for a multi-channel color image, taking the gradient with the maximum norm as a gradient vector of the point, and taking the gradient H (x, y) of a pixel point H (x, y) in the escalator image as a vector:

the magnitude and direction of gradient ∑ H (x, y) are:

wherein G (x, y) is the magnitude of gradient H (x, y),

is the direction of gradient ≧ H (x, y);

2.3) construction of gradient orientation histograms for each cell Unit

2.4) Intra-Block normalized gradient Direction histogram

2.5) Collection of HOG features

The basic idea of the AdaBoost algorithm is to train a plurality of weak classifiers on the same data set in multiple rounds and combine the weak classifiers into a strong classifier, wherein the classification accuracy of the weak classifier is greater than that of random guess, and the weak classifier is a classification capable of processing weighted dataThe precision rate and the recall rate of the weak classifiers are poorer than those of the strong classifiers, a simple classifier is used as the weak classifier to obtain a better effect, and a single-layer decision tree is used as the weak classifier to process various types of data, so that the single-layer decision tree is used as the weak classifier, and the single-layer decision tree h is used as the weak classifier_j(x) Comprises the following steps:

wherein x is a feature vector of a certain sample, x_jIs the value of the jth feature in the feature vector, θ_jIs the decision threshold of the jth feature, p_jThe value of (1) is 1 or-1, and is used for determining whether the judgment standard is greater than or less than a threshold value, so that one weak classifier depends on three parameters of symbol direction, feature selection of dimensionality and judgment threshold value, the weak classifier with the minimum error rate is taken as the optimal weak classifier obtained in the round, and the basic idea of AdaBoost strong classifier training is to give a weight to each training sample in a data set and initialize the training samples to be equal values; firstly training a weak classifier on the data set, then adjusting the weight of each sample, reducing the weight of the sample which is classified correctly at the last time, increasing the weight of the sample which is classified wrongly, training the weak classifier again on the data set, and finishing the training when the number of the weak classifiers reaches the specified number or the error rate is lower than a certain threshold value.

4. The machine vision-based multi-person abnormal behavior detection and identification method according to claim 1, wherein: in the step 3), a Kalman filter is utilized to track the faces of the escalator passengers, and the method specifically comprises the following steps:

x_k＝A_k,k-1x_k-1+Bu_k+w_k-1

the system's observation equation is:

z_k＝H_kx_k+v_k

wherein the content of the first and second substances,

is the result of the optimization at the last moment,

predicted from the result of the previous momentSystem state value, after updating the current state of the system, it needs to be updated

The corresponding covariance, the update equation of the covariance is:

P_k,k-1＝A_k,k-1P_k-1A^T _k,k-1+Q_k-1

wherein, P_k-1Is composed of

Covariance of (P)_k,k-1Is that

Updating

P_k＝P_k,k-1-K_kH_kP_k,k-1

5. The machine vision-based multi-person abnormal behavior detection and identification method according to claim 1, wherein: in step 4), extracting passenger skeleton features from the image by using an openpos deep learning network, specifically as follows:

compared with the optical flow, appearance and depth information, the skeleton can better describe the behavior information of passengers, the OpenPose deep learning network can be used for accurately, real-timely and stably extracting the two-dimensional human skeleton of the passengers under the conditions of uneven illumination and shadow, the existing posture estimation method is divided into a top-down method and a bottom-up method, the top-down method needs to detect each person from an image, then carries out posture estimation on each person respectively, extracts the two-dimensional skeleton of the person, the method is influenced by the performance of a human body detector, the time consumption of an algorithm is increased along with the increase of the number of people in the image, the method is a bottom-up method relative to the top-down method, the posture estimation is carried out from bottom to top without detecting the person, the time consumption of the algorithm is not influenced by the number of the people, but ignores the relevant information between the whole pedestrian and the joint points of the skeleton to which the pedestrian belongs to, the method is characterized in that skeleton joint points to which pedestrians belong are associated to the whole pedestrian by adopting other methods, aiming at the problem of skeleton extraction from bottom to top, an OpenPose deep learning network provides a part of affinity fields PAFs for non-parametric explicit representation of the connection of the human body part joint points, the PAFs are a set of two-dimensional vectors, each section of the body skeleton corresponds to a PAFs graph, the size of the PAFs graph is the same as that of an original graph, each point in the graph is a two-dimensional vector and represents components in the horizontal direction and the vertical direction respectively, the position and the direction of one section of the skeleton are coded, the body part joint points belonging to the whole pedestrian can be connected by utilizing the PAFs, the passenger two-dimensional skeleton is extracted, the OpenPose network is divided into a plurality of stages, the output of each stage is compared with the real value to obtain the corresponding loss function, the loss functions of all the stages are accumulated to obtain the total loss function, and the model is facilitated, optimizing a total loss function, and performing iterative training to obtain a final model;

during actual test, an input image can output a series of human body joint point confidence graphs and skeleton PAFs graphs through a model, wherein the number of the human body joint point confidence graphs is consistent with that of skeleton joint points, the number of the skeleton PAFs graphs is consistent with that of skeleton segments, a human body two-dimensional skeleton comprises 14 human body joint points including a nose, a neck, a left shoulder, a left elbow, a left wrist, a left hip, a left knee, a left ankle, a right shoulder, a right elbow, a right wrist, a right hip, a right knee and a right ankle and 13 segments of human bones formed by connecting the human body joint points, then the optimal connection problem of every two joint points of a human body is converted into a maximum weight bipartite graph matching problem, the skeleton joint points are used as nodes in bipartite graphs, the PAFs are used as weights on the sides of the bipartite graphs, and a Hungary matching and greedy analysis algorithm is used for connecting the skeleton joint points and the bones to obtain a complete human body skeleton.