CN112149618A - Pedestrian abnormal behavior detection method and device suitable for inspection vehicle - Google Patents

Pedestrian abnormal behavior detection method and device suitable for inspection vehicle

Info

Publication number
CN112149618A
CN112149618A (application number CN202011094116.8A)
Authority
CN
China
Prior art keywords
joint
behavior
node
space
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011094116.8A
Other languages
Chinese (zh)
Other versions
CN112149618B (en)
Inventor
贾立东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ziqing Zhixing Technology Beijing Co ltd
Original Assignee
Ziqing Zhixing Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ziqing Zhixing Technology Beijing Co ltd
Priority to CN202011094116.8A
Publication of CN112149618A
Application granted
Publication of CN112149618B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian abnormal behavior detection method and device suitable for an inspection vehicle. The method comprises: presetting an abnormal behavior detection network and performing online detection of abnormal behaviors in video data acquired from an RGB-D camera and a monitoring camera. The abnormal behavior detection network is obtained as follows. Step 1: calibrate the behavior categories and attributes present in the raw video data to obtain the category label A = {a_1, a_2, ..., a_n} and the attribute label B = {b_nor, b_ab} corresponding to each behavior. Step 2: select a continuous T-frame image sequence as training sample data. Step 3: establish an undirected graph G = (V, E); the joint-node position set V = {v_ti | t = 1, ..., T; i = 1, ..., N} contains the positions of all joint nodes in the continuous T-frame image sequence, and the edge set E has two subsets, a spatial edge subset and a temporal edge subset. Step 4: obtain the abnormal behavior detection network from the undirected graph G = (V, E) using a spatio-temporal graph convolutional network and a convolutional neural network. The invention can accurately detect abnormal behaviors around the clock and provides a basis for intelligent monitoring and early warning.

Description

Pedestrian abnormal behavior detection method and device suitable for inspection vehicle
Technical Field
The invention relates to the technical field of robots and video monitoring, in particular to a pedestrian abnormal behavior detection method and device based on an RGB-D camera and suitable for a patrol vehicle.
Background
Abnormal behavior detection means, in a video surveillance scene, replacing or supplementing human monitoring personnel with computer equipment that distinguishes abnormal human behaviors in real time, so as to achieve early warning. Abnormal behaviors generally refer to behaviors that differ markedly from other human behaviors in the scene, or that humans exhibit with very low probability in the scene. In fact, the probability of abnormal behavior is very small compared with normal behavior, but abnormal behavior almost always has serious consequences. It is therefore necessary to monitor abnormal behaviors and give early warning of them.
Robot technology can gradually take over monotonous and dangerous work scenarios that currently require humans, greatly reducing labor costs. At present, monitoring and early warning generally rely on visual cameras combined with manual monitoring, but both approaches are weakened to different degrees at night. The main defect is that the perception capability of both humans and ordinary visible-light cameras degrades in a nighttime environment, and there is currently no particularly good solution for this situation.
Behavior recognition is an important class of computer vision tasks with wide application in video understanding, motion analysis and intelligent monitoring. Kinect is a consumer RGB-D camera from Microsoft that can simultaneously acquire color images and depth images; it is equipped with an infrared device, can operate in a nighttime environment, and is a typical 3D object recognition device.
Disclosure of Invention
It is an object of the present invention to provide an RGB-D-camera-based method and apparatus for detecting abnormal pedestrian behavior, suitable for patrol vehicles, that overcomes or at least alleviates at least one of the above-mentioned drawbacks of the prior art.
To achieve this purpose, the invention provides a pedestrian abnormal behavior detection method suitable for a patrol vehicle, characterized in that a preset abnormal behavior detection network performs online detection of abnormal behaviors in video data collected by an RGB-D camera and a monitoring camera; the abnormal behavior detection network is obtained by the following steps:
Step 1: calibrate the behavior categories and attributes present in the raw video data to obtain the category label A = {a_1, a_2, ..., a_n} and the attribute label B = {b_nor, b_ab} corresponding to each behavior; the raw video data comprise RGB-D camera data, a_n denotes the category label of the n-th behavior, b_nor denotes normal behavior, and b_ab denotes abnormal behavior;
Step 2: select the continuous T-frame image sequence calibrated in step 1 as training sample data;
Step 3: obtain the human body position frame in each frame image of the training sample data and establish an undirected graph G = (V, E);
wherein the joint-node position set V = {v_ti | t = 1, ..., T; i = 1, ..., N} contains the positions of all joint nodes in the continuous T-frame image sequence, v_ti denotes the position of the i-th joint node in the t-th frame image, and there are N joint nodes in total;
the edge set E has two subsets, a spatial edge subset and a temporal edge subset, wherein the spatial edge subset E_S = {v_ti v_tj | (i, j) ∈ H}, H denoting the set of connected joint-node pairs within a single-frame image; the temporal edge subset E_F = {v_ti v_(t+1)i} consists of edges connecting the same joint node in two adjacent frame images, where v_tj denotes the position of the j-th joint node in the t-th frame image and v_(t+1)i denotes the position of the i-th joint node in the (t+1)-th frame image;
Step 4: obtain the abnormal behavior detection network from the undirected graph G = (V, E) using a spatio-temporal graph convolutional network and a convolutional neural network.
Further, step 4 comprises:
Step 41: perform spatio-temporal position modeling with the spatio-temporal graph convolutional network, specifically:
Step 411: perform a spatio-temporal graph convolution operation on the undirected graph G = (V, E) with a K × K convolution kernel, outputting the local feature map f_out(v_ti) directly connected to joint node v_ti:
f_out(v_ti) = Σ_{v_tj ∈ B(v_ti)} (1/Z_ti(v_tj)) f_in(p(v_ti, v_tj)) w(v_ti, v_tj)    (1)
In equation (1), B(v_ti) denotes the neighbor-node set of v_ti; w(v_ti, v_tj) is the weight function, used to assign different weights to neighboring joint nodes; Z_ti(v_tj) is a normalization (regularization) term; and p(v_ti, v_tj) is the sampling function, defined on the minimum distance between two related joint nodes;
Step 412: partition the joint-node spatial configuration of the local feature map f_out(v_ti) output in step 411 using equation (2), obtaining the local human limbs formed by the joint nodes:
l_ti(v_tj) = 0, if r_j = r_i (root-node subset); 1, if r_j < r_i (centripetal subset); 2, if r_j > r_i (centrifugal subset)    (2)
In equation (2), r_i denotes the average distance from the i-th joint node to the center of gravity of the human body it belongs to, r_j denotes the average distance from the j-th joint node to the center of gravity of the human body it belongs to, and l_ti(v_tj) denotes the computed joint-node spatial configuration. Each joint node and its surrounding neighbors are divided into a root-node subset, a centripetal subset and a centrifugal subset, where the root-node subset consists of the root node itself, the centripetal subset consists of the neighbor nodes closer to the body's center-of-gravity node than the root node, and the centrifugal subset consists of the neighbor nodes farther from the body's center-of-gravity node than the root node.
Further, step 4 further comprises:
Step 42: obtain the classified behaviors using the spatio-temporal graph convolutional network, specifically:
Step 421: using the local feature map f_out(v_ti) directly connected to joint node v_ti output in step 411 and the spatial configuration obtained by the partition in step 412, run the single-layer spatio-temporal graph convolution operation given by equation (3) on each frame image to obtain the skeleton action-feature classification:
f_out = Λ^(-1/2) (A + I) Λ^(-1/2) f_in W    (3)
In equation (3), f_out denotes the skeleton action-feature classification; f_in denotes the node feature map output in step 411 together with the spatial configuration obtained by the partition in step 412; Λ is determined by the adjacency matrix A and the identity matrix I of the single-frame image, with Λ^(ii) = Σ_j (A^(ij) + I^(ij)); A denotes the adjacency matrix of the whole undirected graph G = (V, E); I denotes the identity matrix of the whole undirected graph G = (V, E); and W is the weight matrix formed by stacking the aforementioned multi-channel weight vectors w(v_ti, v_tj);
Step 422: after multiple layers of the spatio-temporal graph convolution operations of steps 411, 412 and 421, output the final behavior classification result through a Softmax function.
Further, step 4 further comprises:
Step 43: for the classified output behavior sequence f_out corresponding to the joint-node position sets in the first T/2 frame images of the undirected graph G = (V, E) established in step 3, encode each behavior in the sequence into a unified vector x_uni, input x_uni into an LSTM network for training, and output the behavior prediction result LSTM_action;
Step 44: fuse the original behavior result STC_action recognized by the spatio-temporal graph convolutional network with the behavior result LSTM_action predicted by the LSTM network according to equation (4):
Result = STC_action, if |STC_action - LSTM_action| < H_thre; (STC_action + LSTM_action)/2, otherwise    (4)
In equation (4), if the difference between the two results is smaller than the set threshold, the final result is the original spatio-temporal graph convolution result; if the difference is larger than the set threshold, the intermediate value of the two prediction results is taken.
Further, the abnormal behavior detection network is preset in a cloud control platform, a vehicle-mounted controller and/or an intelligent terminal.
The invention also provides a pedestrian abnormal behavior detection device suitable for a patrol vehicle, which comprises an abnormal behavior detection network, wherein the abnormal behavior detection network is used for carrying out online detection on abnormal behaviors in video data acquired from an RGB-D camera and a monitoring camera, and an acquisition device of the abnormal behavior detection network comprises:
a calibration module, configured to calibrate the behavior categories and attributes present in the raw video data to obtain the category label A = {a_1, a_2, ..., a_n} and the attribute label B = {b_nor, b_ab} corresponding to each behavior; wherein the raw video data comprise RGB-D camera data, a_n denotes the category label of the n-th behavior, b_nor denotes normal behavior, and b_ab denotes abnormal behavior;
a training sample data selection module, configured to select the calibrated continuous T-frame image sequence as training sample data;
an undirected graph construction device, configured to obtain the human body position frame in each frame image of the training sample data and establish an undirected graph G = (V, E);
wherein the joint-node position set V = {v_ti | t = 1, ..., T; i = 1, ..., N} contains the positions of all joint nodes in the continuous T-frame image sequence, v_ti denotes the position of the i-th joint node in the t-th frame image, and there are N joint nodes in total;
the edge set E has two subsets, a spatial edge subset and a temporal edge subset, wherein the spatial edge subset E_S = {v_ti v_tj | (i, j) ∈ H}, H denoting the set of connected joint-node pairs within a single-frame image; the temporal edge subset E_F = {v_ti v_(t+1)i} consists of edges connecting the same joint node in two adjacent frame images, where v_tj denotes the position of the j-th joint node in the t-th frame image and v_(t+1)i denotes the position of the i-th joint node in the (t+1)-th frame image;
and an abnormal behavior detection network offline training module, configured to obtain the abnormal behavior detection network from the undirected graph G = (V, E) using the spatio-temporal graph convolutional network and the convolutional neural network.
Further, the abnormal behavior detection network offline training module includes:
a spatio-temporal position modeling unit, configured to perform spatio-temporal position modeling with the spatio-temporal graph convolutional network, specifically:
first, perform a spatio-temporal graph convolution operation on the undirected graph G = (V, E) with a K × K convolution kernel, outputting the local feature map f_out(v_ti) directly connected to joint node v_ti:
f_out(v_ti) = Σ_{v_tj ∈ B(v_ti)} (1/Z_ti(v_tj)) f_in(p(v_ti, v_tj)) w(v_ti, v_tj)    (1)
In equation (1), B(v_ti) denotes the neighbor-node set of v_ti; w(v_ti, v_tj) is the weight function, used to assign different weights to neighboring joint nodes; Z_ti(v_tj) is a normalization (regularization) term; and p(v_ti, v_tj) is the sampling function, defined on the minimum distance between two related joint nodes;
then, partition the joint-node spatial configuration of the local feature map f_out(v_ti) using equation (2), obtaining the local human limbs formed by the joint nodes:
l_ti(v_tj) = 0, if r_j = r_i (root-node subset); 1, if r_j < r_i (centripetal subset); 2, if r_j > r_i (centrifugal subset)    (2)
In equation (2), r_i denotes the average distance from the i-th joint node to the center of gravity of the human body it belongs to, r_j denotes the average distance from the j-th joint node to the center of gravity of the human body it belongs to, and l_ti(v_tj) denotes the computed joint-node spatial configuration. Each joint node and its surrounding neighbors are divided into a root-node subset, a centripetal subset and a centrifugal subset, where the root-node subset consists of the root node itself, the centripetal subset consists of the neighbor nodes closer to the body's center-of-gravity node than the root node, and the centrifugal subset consists of the neighbor nodes farther from the body's center-of-gravity node than the root node.
Further, the offline training module for the abnormal behavior detection network further includes:
a behavior classification unit, configured to obtain the classified behaviors using the spatio-temporal graph convolutional network, specifically:
first, using the local feature map directly connected to joint node v_ti output by the spatio-temporal position modeling unit and the spatial configuration obtained by the partition, run the single-layer spatio-temporal graph convolution operation on each frame image to obtain the skeleton action-feature classification;
then, after multiple layers of the convolution operations of the spatio-temporal position modeling unit and the behavior classification unit, output the final behavior classification result through a Softmax function;
a behavior prediction unit, configured to encode, for the classified output behavior sequence f_out corresponding to the joint-node position sets in the first T/2 frame images of the undirected graph G = (V, E), each behavior in the sequence into a unified vector x_uni, input x_uni into an LSTM network for training, and output the behavior prediction result LSTM_action;
a fusion unit, configured to fuse the original behavior result STC_action recognized by the spatio-temporal graph convolutional network with the behavior result LSTM_action predicted by the LSTM network according to equation (4):
Result = STC_action, if |STC_action - LSTM_action| < H_thre; (STC_action + LSTM_action)/2, otherwise    (4)
In equation (4), if the difference between the two results is smaller than the set threshold, the final result is the original spatio-temporal graph convolution result; if the difference is larger than the set threshold, the intermediate value of the two prediction results is taken.
Further, the abnormal behavior detection network is preset in the cloud control platform, the vehicle-mounted controller and/or the intelligent terminal.
Furthermore, the intelligent terminal is a cruise robot cart comprising an RGB-D camera in signal connection with the abnormal behavior detection network.
Due to the adoption of the above technical scheme, the invention has the following advantage: it can accurately detect abnormal behaviors around the clock and provides a basis for intelligent monitoring and early warning.
Drawings
FIG. 1 is a block diagram of the abnormal behavior detection computation framework of the present invention.
Fig. 2 is a schematic view of the 18 joint points of a human body.
Fig. 3 is a schematic diagram of spatial configuration partitioning.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
As shown in fig. 1, the method for detecting abnormal behavior of a pedestrian suitable for a patrol vehicle according to the embodiment of the present invention includes: presetting an abnormal behavior detection network, and carrying out online detection on abnormal behaviors in video data acquired from an RGB-D camera and a monitoring camera; the abnormal behavior detection network acquisition method comprises the following steps:
Step 1: calibrate the behavior categories and attributes present in the raw video data to obtain the category label A = {a_1, a_2, ..., a_n} and the attribute label B = {b_nor, b_ab} corresponding to each behavior, where a_n denotes the category label of the n-th behavior, b_nor denotes normal behavior, and b_ab denotes abnormal behavior.
The raw video data comprise all-day RGB-D camera data and surveillance camera data for the determined surveillance area.
In one embodiment, the calibration method adopted in step 1 is a method combining automatic algorithm calibration and human-aided calibration, and specifically includes:
Step 11, automatic calibration: the raw video data are automatically calibrated with the AlphaPose algorithm, whose pose confidence takes values in [0, 1]. A pose confidence of 0 means the automatic calibration result is entirely uncertain or that no usable pose calibration result was produced; a pose confidence of 1 means the result is fully confident.
Step 12, manual secondary calibration: according to a given calibration strategy, secondary manual calibration is performed on automatic calibration results that meet either of the following conditions.
Condition 1: when the pose confidence obtained by the automatic calibration algorithm is less than a preset threshold P_α (e.g., P_α = 0.5), secondary manual calibration is performed;
the second condition, for night video (12 pm. to 6 am.), for night video is manually calibrated by randomly sampling 1 minute of video 1 time every 5 minutes.
The attributes of the calibrated behaviors are then manually classified into normal behavior b_nor and abnormal behavior b_ab, and a behavior-and-attribute data set is constructed.
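As a concrete illustration of this selection logic, a minimal Python sketch follows; the per-frame record fields and the reading of "12 p.m." as midnight are assumptions, not from the patent:

```python
from datetime import time

P_ALPHA = 0.5  # preset pose-confidence threshold P_alpha (example value above)

def needs_manual_calibration(confidence: float, timestamp: time) -> bool:
    """Flag a frame whose automatic AlphaPose calibration should be redone by hand.

    confidence: AlphaPose pose confidence in [0, 1].
    timestamp:  wall-clock time of the frame.
    """
    # Condition 1: pose confidence below the preset threshold P_alpha.
    if confidence < P_ALPHA:
        return True
    # Condition 2: night video (taken here as midnight to 6 a.m.) is flagged;
    # the 1-minute-in-5 random sampling is applied at the video level before
    # the flagged minutes are calibrated manually.
    if time(0, 0) <= timestamp < time(6, 0):
        return True
    return False

# Example: a low-confidence daytime frame and a confident night frame.
print(needs_manual_calibration(0.3, time(14, 0)))  # True (condition 1)
print(needs_manual_calibration(0.9, time(2, 30)))  # True (condition 2)
```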
Step 2: select the continuous T-frame image sequence calibrated in step 1 as training sample data.
Step 3: obtain the human body position frame in each frame image of the training sample data and establish an undirected graph G = (V, E). The human body position frame can be obtained with a pre-trained YOLO v5 object detection model.
In the undirected graph G = (V, E), the joint-node position set V = {v_ti | t = 1, ..., T; i = 1, ..., N} contains the positions of all joint nodes in the continuous T-frame image sequence, v_ti denotes the position of the i-th joint node in the t-th frame image, and there are N joint nodes in total.
In the undirected graph G = (V, E), the edge set E has two subsets: a spatial edge subset and a temporal edge subset.
The spatial edge subset E_S = {v_ti v_tj | (i, j) ∈ H}, where H denotes the set of connected joint-node pairs within a single-frame image. The temporal edge subset E_F = {v_ti v_(t+1)i} consists of edges connecting the same joint node in two adjacent frame images, where v_tj denotes the position of the j-th joint node in the t-th frame image and v_(t+1)i denotes the position of the i-th joint node in the (t+1)-th frame image.
As shown in fig. 2, which illustrates an embodiment of the undirected graph G = (V, E) for a single-frame image, the total number of joint nodes of a single human body is set to 18.
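To make the graph construction concrete, the following is a minimal Python sketch that builds the node set V, the spatial edge subset E_S and the temporal edge subset E_F for a T-frame, N-joint sequence; the skeleton connection list H used here is a hypothetical 18-joint layout in the spirit of fig. 2, not the patent's exact numbering:

```python
import numpy as np

# Hypothetical single-frame skeleton connections H (pairs of joint indices,
# loosely following an 18-joint OpenPose-style layout; illustrative only).
H = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7),
     (1, 8), (8, 9), (9, 10), (1, 11), (11, 12), (12, 13),
     (0, 14), (14, 16), (0, 15), (15, 17)]

def build_spatiotemporal_graph(joints: np.ndarray):
    """joints: (T, N, D) array of joint positions, D = 2 or 3.

    Returns the node set V as (t, i) index pairs, the spatial edge subset
    E_S (within-frame skeleton edges) and the temporal edge subset E_F
    (the same joint connected across two adjacent frames).
    """
    T, N = joints.shape[:2]
    V = [(t, i) for t in range(T) for i in range(N)]
    E_S = [((t, i), (t, j)) for t in range(T) for (i, j) in H]
    E_F = [((t, i), (t + 1, i)) for t in range(T - 1) for i in range(N)]
    return V, E_S, E_F

# Example: 30 frames of an 18-joint skeleton with random 2-D positions.
V, E_S, E_F = build_spatiotemporal_graph(np.random.rand(30, 18, 2))
```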
Step 4: based on the undirected graph G, perform spatio-temporal position modeling and obtain the classified behaviors using the spatio-temporal graph convolutional network, and estimate weights for the different behaviors with a convolutional neural network module to judge whether a behavior is abnormal, thereby obtaining the abnormal behavior detection network.
In one embodiment, step 4 comprises:
Step 41: perform spatio-temporal position modeling with the spatio-temporal graph convolutional network, specifically:
Step 411: perform a spatio-temporal graph convolution operation on the undirected graph G = (V, E) and output the local feature map directly connected to joint node v_ti.
Specifically, given an input feature map f_in, the undirected graph G = (V, E) containing the nodes but no connections between them (as shown in fig. 2), and a K × K convolution kernel, the local feature map f_out(v_ti) directly connected to joint node v_ti is obtained (as shown in fig. 3):
f_out(v_ti) = Σ_{v_tj ∈ B(v_ti)} (1/Z_ti(v_tj)) f_in(p(v_ti, v_tj)) w(v_ti, v_tj)    (1)
In equation (1), B(v_ti) denotes the neighbor-node set of v_ti; Z_ti(v_tj) is a normalization term, which can be any machine-learning L1 or L2 regularization term; p(v_ti, v_tj) denotes the sampling function, defined on the minimum distance between two related joint points, and any general sampling method can be used, such as random particle filtering; w(v_ti, v_tj) is the weight function, used to assign different weights to neighboring joint points, where a neighboring joint point is a joint node directly connected or related to a given joint point around it.
Step 412: partition the joint-node spatial configuration of the local feature map f_out(v_ti) directly connected to joint node v_ti output in step 411, obtaining the local human limbs formed by the joint nodes.
The joint-node spatial configuration can be partitioned in a variety of ways, such as uniform labeling, distance-based partitioning and spatial-configuration-based partitioning. Considering the approach applicable to the present invention, the spatial configuration of the established local feature map f_out(v_ti) directly connected to joint node v_ti is partitioned as shown in fig. 3. Each local joint point and its surrounding neighbors are divided into three subsets, namely a root-node subset, a centripetal subset and a centrifugal subset, where the root-node subset consists of the root node itself, the centripetal subset consists of the neighbor nodes closer to the body's center-of-gravity node than the root node, and the centrifugal subset consists of the neighbor nodes farther from the body's center-of-gravity node than the root node. The spatial configuration is partitioned using equation (2):
l_ti(v_tj) = 0, if r_j = r_i (root-node subset); 1, if r_j < r_i (centripetal subset); 2, if r_j > r_i (centrifugal subset)    (2)
In equation (2), r_i denotes the average distance from the i-th joint node to the center of gravity of the human body it belongs to (the position of the triangle in fig. 3), r_j denotes the average distance from the j-th joint node to the center of gravity of the human body it belongs to, and l_ti(v_tj) denotes the computed joint-node spatial configuration; after the computation, joint nodes 11, 12 and 13 in fig. 2, for example, form an arm of the human body.
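A small sketch of the labeling rule of equation (2) follows; approximating the body's center of gravity by the mean joint position is an assumption made for illustration:

```python
import numpy as np

def partition_labels(joints, root, neighbors):
    """Assign the root joint and its neighbors to the three subsets of Eq. (2).

    joints:    (N, D) joint positions for one frame
    root:      index i of the root joint node
    neighbors: indices j of the root's neighbor nodes
    Returns {index: 0 (root subset), 1 (centripetal), 2 (centrifugal)}.
    """
    center = joints.mean(axis=0)                  # assumed center of gravity
    r = np.linalg.norm(joints - center, axis=1)   # distances r_i to the center
    labels = {root: 0}                            # the root forms its own subset
    for j in neighbors:
        if np.isclose(r[j], r[root]):
            labels[j] = 0   # equal distance: grouped with the root node
        elif r[j] < r[root]:
            labels[j] = 1   # closer to the center of gravity: centripetal
        else:
            labels[j] = 2   # farther from the center of gravity: centrifugal
    return labels
```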
Step 42: obtain the classified behaviors using the spatio-temporal graph convolutional network, specifically:
Step 421: using the local feature map f_out(v_ti) directly connected to joint node v_ti output in step 411 and the spatial configuration obtained by the partition in step 412, run the single-layer spatio-temporal graph convolution operation given by equation (3) on each frame image to obtain the skeleton action-feature classification:
f_out = Λ^(-1/2) (A + I) Λ^(-1/2) f_in W    (3)
In equation (3), f_out denotes the skeleton action-feature classification; f_in denotes the node feature map output in step 411 together with the spatial configuration obtained by the partition in step 412; Λ is determined by the adjacency matrix A and the identity matrix I of the single-frame image, with Λ^(ii) = Σ_j (A^(ij) + I^(ij)); A denotes the adjacency matrix of the whole undirected graph G = (V, E); I denotes the identity matrix of the whole undirected graph G = (V, E); and W is the weight matrix formed by stacking the aforementioned multi-channel weight vectors w(v_ti, v_tj).
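The single-layer operation of equation (3) in matrix form, as a minimal NumPy sketch; treating Λ as the diagonal degree matrix of A + I follows the standard ST-GCN formulation and is assumed here:

```python
import numpy as np

def st_gcn_layer(f_in, A, W):
    """One single-layer graph convolution per equation (3), for one frame.

    f_in: (N, C_in) node features
    A:    (N, N) adjacency matrix of the skeleton graph
    W:    (C_in, C_out) stacked weight matrix
    """
    N = A.shape[0]
    A_hat = A + np.eye(N)                      # A + I: add self-connections
    deg = A_hat.sum(axis=1)                    # Λ^(ii) = Σ_j (A^(ij) + I^(ij))
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))   # Λ^(-1/2)
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ f_in @ W
```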
Step 422: after multiple layers (e.g., 5 layers) of the spatio-temporal graph convolution operations of steps 411, 412 and 421, the final behavior classification result STC_action = {act_1, act_2, ..., act_T} is output through a Softmax function.
Step 43: based on the undirected graph G = (V, E) established in step 3, for the classified output behavior sequence f_out corresponding to the joint-node position sets in the first T/2 frame images, encode each behavior in the sequence into a unified vector x_uni, input x_uni into an LSTM (Long Short-Term Memory) network for training, and output the behavior prediction result LSTM_action.
Through step 43, the LSTM can model the long-term dependencies in the obtained action-sequence information; it can be used to resolve the behavior-jump problem in the foregoing behavior recognition and improves the robustness of the abnormal behavior detection network.
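A hedged PyTorch sketch of step 43 follows, assuming the classified behaviors of the first T/2 frames are integer class labels that are embedded into the unified vectors x_uni before the LSTM; the layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class BehaviorLSTM(nn.Module):
    """Predicts the next behavior from the first T/2 classified behaviors."""

    def __init__(self, n_classes, uni_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_classes, uni_dim)  # behavior -> x_uni
        self.lstm = nn.LSTM(uni_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, behavior_ids):
        # behavior_ids: (batch, T//2) integer labels act_1 .. act_{T/2}
        x_uni = self.embed(behavior_ids)
        _, (h, _) = self.lstm(x_uni)
        return self.head(h[-1])  # logits of the predicted behavior LSTM_action
```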
Step 44: fuse the original behavior result STC_action recognized by the spatio-temporal graph convolutional network with the behavior result LSTM_action predicted by the LSTM network according to equation (4):
Result = STC_action, if |STC_action - LSTM_action| < H_thre; (STC_action + LSTM_action)/2, otherwise    (4)
In equation (4), if the difference between the two results is smaller than the set threshold H_thre, the final result is the original spatio-temporal graph convolution result; if the difference is larger than the set threshold, the intermediate value of the two prediction results is taken. The threshold H_thre is the normalized average behavior value over all normal behaviors in the calibrated behavior library.
Step 44 compares the original behavior recognition result with the behavior prediction result for optimization and enhancement, resolving the behavior-jump problem that easily occurs in behavior recognition.
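Equation (4) as a small function; representing the two results as comparable scalar scores is an assumption, since the patent does not fix their representation:

```python
def fuse(stc_action, lstm_action, h_thre):
    """Fuse the ST-GCN result with the LSTM prediction per equation (4).

    If the two results differ by less than the threshold H_thre, keep the
    original spatio-temporal graph convolution result; otherwise take the
    intermediate value of the two predictions.
    """
    if abs(stc_action - lstm_action) < h_thre:
        return stc_action
    return (stc_action + lstm_action) / 2.0
```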
Step 45: classify the final behavior recognition result into normal behavior and abnormal behavior.
The recognized action sequence is classified into normal behavior and abnormal behavior by a shallow convolutional neural network module according to a set rule, and an alarm is given if abnormal behavior is output.
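A minimal sketch of the shallow convolutional module of step 45, under the assumption that the recognized action sequence is presented as per-frame class scores; the one-convolution-plus-linear-head architecture is illustrative, as the patent only specifies a shallow network:

```python
import torch
import torch.nn as nn

class ShallowAttributeNet(nn.Module):
    """Maps a recognized action sequence to {b_nor, b_ab} (normal/abnormal)."""

    def __init__(self, n_classes, seq_len):
        super().__init__()
        self.conv = nn.Conv1d(n_classes, 16, kernel_size=3, padding=1)
        self.head = nn.Linear(16 * seq_len, 2)  # index 0 = b_nor, 1 = b_ab

    def forward(self, action_scores):
        # action_scores: (batch, n_classes, seq_len) per-frame class scores
        h = torch.relu(self.conv(action_scores))
        return self.head(h.flatten(1))  # logits; class 1 triggers the alarm
```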
In one embodiment, the present invention further comprises:
Step 5, model pruning, specifically:
Step 51: evaluate the importance of the neuron parameters in the proposed abnormal behavior detection network, remove the neuron with the lowest importance, and then fine-tune the abnormal behavior detection network.
Step 52: perform model pruning on the pre-trained YOLO v5 network using a group-convolution pruning method, and then deploy the model for inference on the cloud control platform, the on-board controller and/or the intelligent terminal (cruise cart).
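A hedged sketch of the neuron-importance step in step 51, using the L1 norm of each output neuron's weights as the importance score; the patent does not specify the importance measure, so this choice is an assumption:

```python
import torch
import torch.nn as nn

def prune_least_important_neuron(linear: nn.Linear) -> int:
    """Zero out the output neuron with the lowest L1 weight norm.

    Returns the pruned neuron's index; the network should then be
    fine-tuned, as described in step 51.
    """
    with torch.no_grad():
        importance = linear.weight.abs().sum(dim=1)  # L1 norm per neuron
        idx = int(importance.argmin())
        linear.weight[idx].zero_()
        if linear.bias is not None:
            linear.bias[idx] = 0.0
    return idx
```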
The method provided by the embodiment of the invention first detects the human body region with a convolutional network, then models the human skeleton key points as a graph convolutional network, and uses the recognized skeleton key points to recognize human behavior and detect abnormal behavior. The accompanying LSTM enhancement algorithm exploits the continuity of human behavior to improve behavior recognition accuracy.
In one embodiment, the pedestrian abnormal behavior detection method suitable for the patrol vehicle comprises: presetting the abnormal behavior detection network in the cloud control platform, the on-board controller and/or the intelligent terminal (cruise cart).
Taking a cruise cart configured with the abnormal behavior detection network as an example, online detection by the abnormal behavior detection network works as follows:
The cruise cart patrols along a specified route and is equipped with an enhanced RGB-D camera. Behavior detection can be performed on the RGB images and depth images produced in the daytime environment, and at night with the infrared camera and the depth camera, so abnormal behaviors in video data can be detected around the clock. When abnormal behavior is detected, an audible and visual alarm can be raised for early warning.
The invention also provides a pedestrian abnormal behavior detection device suitable for an inspection vehicle, comprising an abnormal behavior detection network for online detection of abnormal behaviors in video data collected by an RGB-D camera and a monitoring camera. The acquisition device of the abnormal behavior detection network comprises a calibration module, a training sample data selection module, an undirected graph construction device and an abnormal behavior detection network offline training module, wherein:
the calibration module is used for calibrating the behavior categories and attributes existing in the original video data to obtain the category label A ═ a corresponding to each behavior1,a2,...,anAnd attribute label B ═ Bnor,bab}; wherein the raw video data comprises RGB-D camera data, anClass label representing nth behavior, bnorIndicating normal behavior, babIndicating abnormal behavior.
And the training sample data selection module is used for selecting the continuous T frame image sequence calibrated in the step 1 as training sample data.
The undirected graph construction device is used for acquiring a human body position frame in each frame of image of training sample data and establishing an undirected graph G (V, E).
Wherein the set of joint node positions V ═ { V ═ VtiI T1, T, i 1, N includes the positions of all joint nodes in a sequence of consecutive T frame images, vtiThe position of the ith joint in the t-th frame image is shown, and N joint nodes are shared.
The edge set E has two subsets, which are respectively a spatial edge subset and a temporal edge subset, wherein: the spatial edge subset ES={vtivtjL (i, j) belongs to H, wherein H represents the positions of all joint nodes in the single-frame image; the same joint node on two adjacent frames of images is connectedTime-sequential edge subset E of formed edgesF={vtiv(t+1)i},vtjIndicating the position of the jth joint node in the tth frame image, v(t+1)iIndicates the position of the jth joint node in the t +1 th frame image.
The abnormal behavior detection network offline training module is used to obtain the abnormal behavior detection network from the undirected graph G = (V, E) using the spatio-temporal graph convolutional network and the convolutional neural network.
In one embodiment, the abnormal behavior detection network offline training module comprises a space-time space position modeling unit, a behavior classification unit, a behavior prediction unit and a fusion unit:
the space-time space position modeling unit is used for performing space-time space position modeling by utilizing a space-time graph convolution network, and specifically comprises the following steps:
first, a space-time graph convolution operation is performed on the undirected graph G ═ V, E, and a node feature graph is output.
And then, carrying out joint node space configuration division on the node characteristic diagram to obtain a human body local part formed by joint nodes.
The method for dividing the spatial configuration comprises the following steps:
dividing each joint node and its surrounding neighbors into a root node subset, a centripetal group subset and a centrifugal group subset, wherein the root node subset consists of the root node itself, the centripetal group subset consists of nodes closer to a body gravity center node than the root node in the neighbor nodes, and the centrifugal group subset consists of nodes farther from the body gravity center node than the root node in the neighbor nodes;
and a behavior classification unit. The method is used for performing and acquiring the classified behaviors by utilizing a space-time graph convolutional network, and specifically comprises the following steps:
firstly, a single-layer spatiotemporal graph convolution operation is operated on each frame image by utilizing the node output characteristic graph output by the spatiotemporal position modeling unit and the space configuration obtained by division to obtain skeleton action characteristic classification.
And finally, outputting a final behavior classification result through a Softmax function after convolution operation of a plurality of layers of space-time space position modeling units and behavior classification units.
The behavior prediction unit is used to encode, for the classified output behavior sequence f_out corresponding to the joint-node position sets in the first T/2 frame images of the undirected graph G = (V, E), each behavior in the sequence into a unified vector x_uni, input x_uni into the LSTM network for training, and output the behavior prediction result LSTM_action.
The fusion unit is used to fuse the original behavior result STC_action recognized by the spatio-temporal graph convolutional network with the behavior result LSTM_action predicted by the LSTM network according to equation (4):
Result = STC_action, if |STC_action - LSTM_action| < H_thre; (STC_action + LSTM_action)/2, otherwise    (4)
In equation (4), if the difference between the two results is smaller than the set threshold, the final result is the original spatio-temporal graph convolution result; if the difference is larger than the set threshold, the intermediate value of the two prediction results is taken.
In one embodiment, the abnormal behavior detection network is preset in the cloud control platform, the vehicle-mounted controller and/or the intelligent terminal, and is used for performing online detection on abnormal behaviors existing in video data acquired by the RGB-D camera and the monitoring camera.
In one embodiment, the intelligent terminal is a cruise robot cart comprising an RGB-D camera and an abnormal behavior detection network in signal connection with the RGB-D camera. Abnormal behaviors in the patrol environment can be detected around the clock through the RGB-D camera and the abnormal behavior detection network on the cruise cart.
Finally, it should be pointed out that the above examples are only intended to illustrate the technical solutions of the present invention, not to limit them. Those of ordinary skill in the art will understand that modifications may be made to the technical solutions described in the foregoing embodiments, or some technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A pedestrian abnormal behavior detection method suitable for a patrol vehicle is characterized in that an abnormal behavior detection network is preset to carry out online detection on abnormal behaviors existing in video data collected by an RGB-D camera and a monitoring camera; the abnormal behavior detection network acquisition method comprises the following steps:
step 1, calibrating the behavior categories and attributes present in the raw video data to obtain the category label A = {a_1, a_2, ..., a_n} and the attribute label B = {b_nor, b_ab} corresponding to each behavior; wherein the raw video data comprise RGB-D camera data, a_n denotes the category label of the n-th behavior, b_nor denotes normal behavior, and b_ab denotes abnormal behavior;
step 2, selecting the continuous T frame image sequence calibrated in the step 1 as training sample data;
step 3, obtaining the human body position frame in each frame image of the training sample data and establishing an undirected graph G = (V, E);
wherein the joint-node position set V = {v_ti | t = 1, ..., T; i = 1, ..., N} contains the positions of all joint nodes in the continuous T-frame image sequence, v_ti denotes the position of the i-th joint node in the t-th frame image, and there are N joint nodes in total;
the edge set E has two subsets, a spatial edge subset and a temporal edge subset, wherein the spatial edge subset E_S = {v_ti v_tj | (i, j) ∈ H}, H denoting the set of connected joint-node pairs within a single-frame image; the temporal edge subset E_F = {v_ti v_(t+1)i} consists of edges connecting the same joint node in two adjacent frame images, where v_tj denotes the position of the j-th joint node in the t-th frame image and v_(t+1)i denotes the position of the i-th joint node in the (t+1)-th frame image;
and step 4, obtaining the abnormal behavior detection network from the undirected graph G = (V, E) using a spatio-temporal graph convolutional network and a convolutional neural network.
2. The pedestrian abnormal behavior detection method suitable for a patrol vehicle according to claim 1, wherein step 4 includes:
step 41, performing spatio-temporal position modeling with the spatio-temporal graph convolutional network, specifically:
step 411, performing a spatio-temporal graph convolution operation on the undirected graph G = (V, E) with a K × K convolution kernel and outputting the local feature map f_out(v_ti) directly connected to joint node v_ti:
f_out(v_ti) = Σ_{v_tj ∈ B(v_ti)} (1/Z_ti(v_tj)) f_in(p(v_ti, v_tj)) w(v_ti, v_tj)    (1)
in equation (1), B(v_ti) denotes the neighbor-node set of v_ti; w(v_ti, v_tj) is the weight function, used to assign different weights to neighboring joint nodes; Z_ti(v_tj) is a normalization (regularization) term; and p(v_ti, v_tj) is the sampling function, defined on the minimum distance between two related joint nodes;
step 412, partitioning the joint-node spatial configuration of the local feature map f_out(v_ti) output in step 411 using equation (2) to obtain the local human limbs formed by the joint nodes:
l_ti(v_tj) = 0, if r_j = r_i (root-node subset); 1, if r_j < r_i (centripetal subset); 2, if r_j > r_i (centrifugal subset)    (2)
in equation (2), r_i denotes the average distance from the i-th joint node to the center of gravity of the human body it belongs to, r_j denotes the average distance from the j-th joint node to the center of gravity of the human body it belongs to, and l_ti(v_tj) denotes the computed joint-node spatial configuration; each joint node and its surrounding neighbors are divided into a root-node subset, a centripetal subset and a centrifugal subset, where the root-node subset consists of the root node itself, the centripetal subset consists of the neighbor nodes closer to the body's center-of-gravity node than the root node, and the centrifugal subset consists of the neighbor nodes farther from the body's center-of-gravity node than the root node.
3. The method for detecting the abnormal behavior of the pedestrian adapted to the patrol vehicle according to claim 2, wherein the step 4 further comprises:
step 42, obtaining the classified behaviors using the spatio-temporal graph convolutional network, specifically:
step 421, using the local feature map f_out(v_ti) directly connected to joint node v_ti output in step 411 and the spatial configuration obtained by the partition in step 412, running the single-layer spatio-temporal graph convolution operation given by equation (3) on each frame image to obtain the skeleton action-feature classification:
f_out = Λ^(-1/2) (A + I) Λ^(-1/2) f_in W    (3)
in equation (3), f_out denotes the skeleton action-feature classification; f_in denotes the node feature map output in step 411 together with the spatial configuration obtained by the partition in step 412; Λ is determined by the adjacency matrix A and the identity matrix I of the single-frame image, with Λ^(ii) = Σ_j (A^(ij) + I^(ij)); A denotes the adjacency matrix of the whole undirected graph G = (V, E); I denotes the identity matrix of the whole undirected graph G = (V, E); and W is the weight matrix formed by stacking the aforementioned multi-channel weight vectors w(v_ti, v_tj);
step 422, after multiple layers of the spatio-temporal graph convolution operations of steps 411, 412 and 421, outputting the final behavior classification result through a Softmax function.
4. The pedestrian abnormal behavior detection method suitable for the patrol vehicle according to claim 2 or 3, wherein the step 4 further includes:
step 43, for the classified output behavior sequence f_out corresponding to the joint-node position sets in the first T/2 frame images of the undirected graph G = (V, E) established in step 3, encoding each behavior in the sequence into a unified vector x_uni, inputting x_uni into an LSTM network for training, and outputting the behavior prediction result LSTM_action;
step 44, fusing the original behavior result STC_action recognized by the spatio-temporal graph convolutional network with the behavior result LSTM_action predicted by the LSTM network according to equation (4):
Result = STC_action, if |STC_action - LSTM_action| < H_thre; (STC_action + LSTM_action)/2, otherwise    (4)
in equation (4), if the difference between the two results is smaller than the set threshold, the final result is the original spatio-temporal graph convolution result; if the difference is larger than the set threshold, the intermediate value of the two prediction results is taken.
5. The pedestrian abnormal behavior detection method suitable for the patrol vehicle according to claim 4, wherein the abnormal behavior detection network is preset in a cloud control platform, a vehicle-mounted controller and/or an intelligent terminal.
6. A pedestrian abnormal behavior detection device suitable for an inspection vehicle, characterized by comprising an abnormal behavior detection network for online detection of abnormal behaviors in video data collected by an RGB-D camera and a monitoring camera, wherein the acquisition device of the abnormal behavior detection network comprises:
a calibration module, configured to calibrate the behavior categories and attributes present in the raw video data to obtain the category label A = {a_1, a_2, ..., a_n} and the attribute label B = {b_nor, b_ab} corresponding to each behavior; wherein the raw video data comprise RGB-D camera data, a_n denotes the category label of the n-th behavior, b_nor denotes normal behavior, and b_ab denotes abnormal behavior;
a training sample data selection module, configured to select the calibrated continuous T-frame image sequence as training sample data;
an undirected graph construction device, configured to obtain the human body position frame in each frame image of the training sample data and establish an undirected graph G = (V, E);
wherein the joint-node position set V = {v_ti | t = 1, ..., T; i = 1, ..., N} contains the positions of all joint nodes in the continuous T-frame image sequence, v_ti denotes the position of the i-th joint node in the t-th frame image, and there are N joint nodes in total;
the edge set E has two subsets, a spatial edge subset and a temporal edge subset, wherein the spatial edge subset E_S = {v_ti v_tj | (i, j) ∈ H}, H denoting the set of connected joint-node pairs within a single-frame image; the temporal edge subset E_F = {v_ti v_(t+1)i} consists of edges connecting the same joint node in two adjacent frame images, where v_tj denotes the position of the j-th joint node in the t-th frame image and v_(t+1)i denotes the position of the i-th joint node in the (t+1)-th frame image;
and an abnormal behavior detection network offline training module, configured to obtain the abnormal behavior detection network from the undirected graph G = (V, E) using the spatio-temporal graph convolutional network and the convolutional neural network.
7. The abnormal behavior detection device for pedestrians suitable for patrolling vehicles according to claim 6, wherein the abnormal behavior detection network offline training module comprises:
a spatio-temporal position modeling unit, configured to perform spatio-temporal position modeling with the spatio-temporal graph convolutional network, specifically:
first, perform a spatio-temporal graph convolution operation on the undirected graph G = (V, E) with a K × K convolution kernel, outputting the local feature map f_out(v_ti) directly connected to joint node v_ti:
f_out(v_ti) = Σ_{v_tj ∈ B(v_ti)} (1/Z_ti(v_tj)) f_in(p(v_ti, v_tj)) w(v_ti, v_tj)    (1)
in equation (1), B(v_ti) denotes the neighbor-node set of v_ti; w(v_ti, v_tj) is the weight function, used to assign different weights to neighboring joint nodes; Z_ti(v_tj) is a normalization (regularization) term; and p(v_ti, v_tj) is the sampling function, defined on the minimum distance between two related joint nodes;
then, partition the joint-node spatial configuration of the local feature map f_out(v_ti) using equation (2), obtaining the local human limbs formed by the joint nodes:
l_ti(v_tj) = 0, if r_j = r_i (root-node subset); 1, if r_j < r_i (centripetal subset); 2, if r_j > r_i (centrifugal subset)    (2)
in equation (2), r_i denotes the average distance from the i-th joint node to the center of gravity of the human body it belongs to, r_j denotes the average distance from the j-th joint node to the center of gravity of the human body it belongs to, and l_ti(v_tj) denotes the computed joint-node spatial configuration; each joint node and its surrounding neighbors are divided into a root-node subset, a centripetal subset and a centrifugal subset, where the root-node subset consists of the root node itself, the centripetal subset consists of the neighbor nodes closer to the body's center-of-gravity node than the root node, and the centrifugal subset consists of the neighbor nodes farther from the body's center-of-gravity node than the root node.
8. The abnormal behavior detection device for pedestrians suitable for use in a patrol vehicle as claimed in claim 7, wherein the abnormal behavior detection network offline training module further comprises:
the behavior classification unit, configured to obtain the classified behaviors using the spatio-temporal graph convolutional network, specifically:
first, using the local feature map directly connected to joint node v_ti output by the spatio-temporal position modeling unit and the spatial configuration obtained by the partition, running the single-layer spatio-temporal graph convolution operation on each frame image to obtain the skeleton action-feature classification;
then, after multiple layers of the convolution operations of the spatio-temporal position modeling unit and the behavior classification unit, outputting the final behavior classification result through a Softmax function;
a behavior prediction unit, configured to encode, for the classified output behavior sequence f_out corresponding to the joint-node position sets in the first T/2 frame images of the undirected graph G = (V, E), each behavior in the sequence into a unified vector x_uni, input x_uni into an LSTM network for training, and output the behavior prediction result LSTM_action;
a fusion unit, configured to fuse the original behavior result STC_action recognized by the spatio-temporal graph convolutional network with the behavior result LSTM_action predicted by the LSTM network according to equation (4):
Result = STC_action, if |STC_action - LSTM_action| < H_thre; (STC_action + LSTM_action)/2, otherwise    (4)
in equation (4), if the difference between the two results is smaller than the set threshold, the final result is the original spatio-temporal graph convolution result; if the difference is larger than the set threshold, the intermediate value of the two prediction results is taken.
9. The pedestrian abnormal behavior detection device suitable for the inspection vehicle according to claim 8, wherein the abnormal behavior detection network is preset in a cloud control platform, an on-board controller and/or an intelligent terminal.
10. The pedestrian abnormal behavior detection device according to claim 9, wherein the intelligent terminal is a cruise robot cart comprising an RGB-D camera in signal connection with the abnormal behavior detection network.
CN202011094116.8A 2020-10-14 2020-10-14 Pedestrian abnormal behavior detection method and device suitable for inspection vehicle Active CN112149618B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011094116.8A CN112149618B (en) 2020-10-14 2020-10-14 Pedestrian abnormal behavior detection method and device suitable for inspection vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011094116.8A CN112149618B (en) 2020-10-14 2020-10-14 Pedestrian abnormal behavior detection method and device suitable for inspection vehicle

Publications (2)

Publication Number Publication Date
CN112149618A (en) 2020-12-29
CN112149618B (en) 2022-09-09

Family

ID=73951779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011094116.8A Active CN112149618B (en) 2020-10-14 2020-10-14 Pedestrian abnormal behavior detection method and device suitable for inspection vehicle

Country Status (1)

Country Link
CN (1) CN112149618B (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1963824A (en) * 2006-11-17 2007-05-16 东华大学 Method for forecasting diameter of fibre of unwoven cloth based on extrusion method of polymer of NN
US20120314064A1 (en) * 2011-06-13 2012-12-13 Sony Corporation Abnormal behavior detecting apparatus and method thereof, and video monitoring system
CN110135319A (en) * 2019-05-09 2019-08-16 广州大学 A kind of anomaly detection method and its system
CN110852152A (en) * 2019-09-27 2020-02-28 中山大学 Deep hash pedestrian re-identification method based on data enhancement
CN110796110A (en) * 2019-11-05 2020-02-14 西安电子科技大学 Human behavior identification method and system based on graph convolution network
CN111597929A (en) * 2020-04-30 2020-08-28 青岛科技大学 Group behavior identification method based on channel information fusion and group relation space structured modeling
CN111723729A (en) * 2020-06-18 2020-09-29 成都颜禾曦科技有限公司 Intelligent identification method for dog posture and behavior of surveillance video based on knowledge graph
CN111723779A (en) * 2020-07-20 2020-09-29 浙江大学 Chinese sign language recognition system based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SIJIE YAN et al.: "Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition", arXiv:1801.07455v2 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801061A (en) * 2021-04-07 2021-05-14 南京百伦斯智能科技有限公司 Posture recognition method and system
CN113837306A (en) * 2021-09-29 2021-12-24 南京邮电大学 Abnormal behavior detection method based on human body key point space-time diagram model
CN113837306B (en) * 2021-09-29 2024-04-12 南京邮电大学 Abnormal behavior detection method based on human body key point space-time diagram model
CN115984787A (en) * 2023-03-20 2023-04-18 齐鲁云商数字科技股份有限公司 Intelligent vehicle-mounted real-time alarm method for industrial brain public transport

Also Published As

Publication number Publication date
CN112149618B (en) 2022-09-09

Similar Documents

Publication Publication Date Title
CN112149618B (en) Pedestrian abnormal behavior detection method and device suitable for inspection vehicle
US11037005B2 (en) Method and apparatus for identifying traffic light
KR102129893B1 (en) Ship tracking method and system based on deep learning network and average movement
KR101607224B1 (en) Dynamic object classification
Chen et al. A UAV-based forest fire detection algorithm using convolutional neural network
Qu et al. Moving vehicle detection with convolutional networks in UAV videos
CN111491093B (en) Method and device for adjusting field angle of camera
CN113228043A (en) System and method for obstacle detection and association of mobile platform based on neural network
CN110852222A (en) Campus corridor scene intelligent monitoring method based on target detection
Zhao et al. Fusion of Velodyne and camera data for scene parsing
CN115761537B (en) Power transmission line foreign matter intrusion identification method oriented to dynamic feature supplementing mechanism
CN114898322A (en) Driving environment identification method and device, vehicle and storage medium
CN115147745A (en) Small target detection method based on urban unmanned aerial vehicle image
CN116052082A (en) Power distribution station room anomaly detection method and device based on deep learning algorithm
CN116846059A (en) Edge detection system for power grid inspection and monitoring
CN115546742A (en) Rail foreign matter identification method and system based on monocular thermal infrared camera
CN113378638B (en) Method for identifying abnormal behavior of turbine operator based on human body joint point detection and D-GRU network
CN109740527A (en) Image processing method in a kind of video frame
KR20220153247A (en) System for monitoring black ice
CN117115752A (en) Expressway video monitoring method and system
CN110334703B (en) Ship detection and identification method in day and night image
CN117218380A (en) Dynamic target detection tracking method for unmanned ship remote sensing image
CN114821042A (en) R-FCN disconnecting link detection method combining local features and global features
KR20220067271A (en) Image acquisition apparatus and image acquisition method
Lulio et al. JSEG-based image segmentation in computer vision for agricultural mobile robot navigation

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant