CN114863352B

CN114863352B - Personnel group behavior monitoring method based on video analysis

Info

Publication number: CN114863352B
Application number: CN202210793997.5A
Authority: CN
Inventors: 刘驰; 范赐恩; 胡新礼; 李文航; 李露; 李继恒; 汪磊
Original assignee: Optical Valley Technology Co ltd
Current assignee: Optical Valley Technology Co ltd
Priority date: 2022-07-07
Filing date: 2022-07-07
Publication date: 2022-09-30
Anticipated expiration: 2042-07-07
Also published as: CN114863352A

Abstract

The invention relates to the technical field of data processing, in particular to a personnel group behavior monitoring method based on video analysis. The method comprises the following steps: acquiring a corresponding area image in a target place area; acquiring each key point and corresponding feature vector corresponding to each target person in the area image; obtaining a skeleton map corresponding to each target person in the region image according to each key point corresponding to each target person in the region image; acquiring an interaction domain corresponding to each target person in a target site area at the current acquisition time; splicing the skeleton map corresponding to each target person and the skeleton maps of the interactive persons corresponding to the interactive domains corresponding to the target persons to obtain the interactive maps corresponding to the target persons; and obtaining the action behaviors of the target personnel in each interactive graph according to the feature vectors corresponding to all key points in the interactive graph corresponding to each target personnel and the trained action behavior recognition network. The invention improves the accuracy of identifying the behavior of the person.

Description

Personnel group behavior monitoring method based on video analysis

Technical Field

The invention relates to the technical field of data processing, in particular to a personnel group behavior monitoring method based on video analysis.

Background

The monitoring camera is used for observing in a place with dense personnel, is a common public safety prevention and control means, and can find emergent emergencies in time, so that countermeasures for dealing with the relevant emergencies are made, and personnel are evacuated; the group behavior monitoring is a hot direction, the group behavior refers to the behavior of interaction generated by a plurality of persons, and is obviously different from the single behavior; the existing group behavior identification method is characterized in that the type of the important identification is trampling events, conflict events and other events which endanger the life safety of the masses, and the existing group behavior identification method is that monitoring professionals watch a monitoring screen in shifts to ensure real-time monitoring, and then group behaviors are analyzed and countermeasures are made manually.

In a monitoring picture, personnel are dense, the mutual shielding condition is easy to occur, and group actions have the characteristic of high complexity, so the identification difficulty is high; in addition, because the frequency of normal behaviors is high and the frequency of abnormal behaviors is low, the manual monitoring mode is easy to overlook, and the problems of strong subjectivity, easy fatigue and the like exist, so that the accuracy of identifying the behaviors of the personnel is relatively low.

Disclosure of Invention

In order to solve the problem that the accuracy of identifying the behaviors of the people based on a manual mode is relatively low in the prior art, the invention aims to provide a method for monitoring the behaviors of a group of people based on video analysis, and the adopted technical scheme is as follows:

the invention provides a personnel group behavior monitoring method based on video analysis, which comprises the following steps:

acquiring a region image corresponding to the current acquisition time in a target place region;

acquiring each key point corresponding to each target person in the area image and a feature vector corresponding to each key point; obtaining a skeleton map corresponding to each target person in the region image according to each key point corresponding to each target person in the region image; the feature vector is obtained by splicing a position vector corresponding to the key point and a body part vector;

acquiring an interaction domain corresponding to each target person in a target site area at the current acquisition time, wherein the interaction domain is an area within a preset range of the target person;

splicing the skeleton map corresponding to each target person and the skeleton maps of the interactive persons corresponding to the interactive domains corresponding to the target persons to obtain the interactive maps corresponding to the target persons;

and obtaining the action behaviors of the target personnel in each interactive graph according to the feature vectors corresponding to all key points in the interactive graph corresponding to each target personnel and the trained action behavior recognition network.

Preferably, the acquiring of each key point corresponding to each target person in the area image and the feature vector corresponding to each key point includes:

processing the area image by using a key point detection network to obtain each key point corresponding to each target person in the area image;

taking the vertex of the lower left corner of the area image as a coordinate origin, taking the horizontal direction as an x axis, and taking the vertical direction as a y axis, and obtaining position vectors corresponding to the key points corresponding to the target persons in the area image; the position vector comprises an abscissa corresponding to the key point, an ordinate corresponding to the key point and a depth value corresponding to the key point, and the depth value is obtained according to the area image;

for any target person: performing One-Hot coding on each key point corresponding to the target person to obtain body part vectors corresponding to each key point corresponding to the target person; and splicing the position vector corresponding to each key point corresponding to the target person with the body part vector to obtain the feature vector corresponding to each key point corresponding to the target person.

Preferably, the key points corresponding to the target persons in the region image are connected according to a preset connection rule to obtain a skeleton map corresponding to the target persons.

Preferably, the obtaining of the interaction domain corresponding to each target person in the target site area at the current acquisition time includes:

for any target person:

calculating to obtain a normal vector corresponding to the target person according to position vectors corresponding to the nose key point, the right eye key point and the left eye key point in the key points corresponding to the target person; the normal vector corresponding to the target person is as follows: calculating a plane normal vector constructed in the three-dimensional space by the nose key point, the right eye key point and the left eye key point by taking position vectors corresponding to the nose key point, the right eye key point and the left eye key point as coordinates in the three-dimensional space, and taking the plane normal vector as a normal vector corresponding to the target person;

taking a central point corresponding to the target person as a circular point in the target place area, and taking a first preset length as a radius to make a circle, so as to obtain a circular area corresponding to the target person; the central point corresponding to the target person is a nose key point corresponding to the target person;

taking a central point corresponding to the target person as a dot point, a normal vector corresponding to the target person as a central line, a preset angle as a fan-shaped angle and a second preset length as a radius in the target place area to obtain a fan-shaped area corresponding to the target person;

and taking the union region of the circular region and the fan-shaped region as an interaction region corresponding to the target person.

Preferably, the obtaining of the interaction map corresponding to each target person includes:

for any target person:

counting the number of other target persons not including the target person in the interaction domain corresponding to the target person, and recording as the interaction number;

if the number of the interactions corresponding to the target person is not 0, marking other target persons existing in the interaction domain corresponding to the target person as interaction persons; for any interactive person corresponding to the interactive domain corresponding to the target person: splicing the skeleton map corresponding to the target person and the skeleton map corresponding to the interactive person according to the position vector of each key point corresponding to the target person and the position vector of each key point corresponding to the interactive person to obtain an interactive map corresponding to the target person;

if the number of the interactions corresponding to the target person is 0, constructing an occupancy skeleton diagram; the space-occupying skeleton map is a skeleton map of a virtual interactive person, and position vectors of all key points in the space-occupying skeleton map are (-1, -1, -1); taking the virtual interaction personnel as interaction personnel corresponding to the interaction domain corresponding to the target personnel; connecting a left-hand key point corresponding to the target person with a left-hand key point of an interactive person corresponding to the interactive domain, connecting a right-hand key point corresponding to the target person with a right-hand key point of the interactive person corresponding to the interactive domain, and splicing the skeleton diagram corresponding to the target person with the occupancy skeleton diagram to obtain an interactive diagram corresponding to the target person; the right-hand key point and the left-hand key point are two key points in the key points.

Preferably, obtaining the interaction map corresponding to the target person includes:

connecting a key point which is closest to the right-hand key point corresponding to the target person in the key points corresponding to the interactive persons with the right-hand key point of the target person; connecting the key point which is closest to the left-hand key point corresponding to the target person in the key points corresponding to the interactive person with the left-hand key point of the target person; connecting a key point which is closest to the right-hand key point corresponding to the interactive person in the key points corresponding to the target person with the right-hand key point of the interactive person; connecting a key point which is closest to the left-hand key point corresponding to the interactive person in the key points corresponding to the target person with the left-hand key point of the interactive person; splicing the skeleton map corresponding to the target person and the skeleton map corresponding to the interactive person to obtain an interactive map corresponding to the target person;

the distance is calculated according to the position vector.

Preferably, the obtaining of the action behavior of the target person in each interactive graph includes:

for any interaction graph corresponding to any target person:

sequencing all key points in the interactive graph according to a preset sequence;

calculating the weight between each key point in the interactive map and each key point in the corresponding neighborhood set according to the feature vector corresponding to each key point in the interactive map and the normal vector corresponding to the target person; the corresponding neighborhood set comprises all key points connected with the corresponding key points in the interaction graph;

performing preset times of aggregation operation on each key point in the interactive graph according to the weight between each key point in the interactive graph and each key point in the corresponding neighborhood set to obtain a target feature vector corresponding to each key point in the interactive graph;

splicing target characteristic vectors corresponding to the key points in the interactive map according to the arrangement sequence of the key points to obtain comprehensive vectors corresponding to the interactive map;

and inputting the comprehensive vector corresponding to the interaction diagram into a multilayer perceptron to obtain the action behavior of the target person in the interaction diagram.

Preferably, the formula for calculating the weight between each key point in the interactive graph and each key point in the corresponding neighborhood set is as follows:

wherein, the first and the second end of the pipe are connected with each other,

for the weight of the jth keypoint to the ith keypoint in the interactive graph,

the feature vector corresponding to the ith key point in the interactive map,

the feature vector corresponding to the jth key point in the interaction graph,

is the normal vector corresponding to the target person,

the feature vector of the central point corresponding to the target person,

in order to perform the vector splicing operation,

in order to activate the function(s),

is a first one of the vectors of similarity, and,

is a second similarity degree vector, and is,

is the transpose of the first similarity vector,

is a transpose of the second similarity vector,

is a first weight matrix of the weight data set,

is a second weight matrix of the plurality of weight matrices,

is an exponential function with e as the base,

and (4) a neighborhood set corresponding to the ith key point in the interactive graph.

Preferably, the obtaining of the target feature vector corresponding to each key point in the interactive graph includes:

the formula for performing the first aggregation operation on each key point in the interactive graph is as follows:

wherein the content of the first and second substances,

for the aggregated updated feature vector corresponding to the ith key point in the interaction graph,

for the ith key point in the interaction graphThe number of interactions of the person corresponding to the skeletal map,

the interaction number of the person corresponding to the skeleton diagram to which the jth key point belongs in the interaction diagram is obtained; the personnel comprise the target personnel and the interactive personnel; the interaction quantity is the quantity of other target persons which do not comprise the target person and exist in the interaction domain corresponding to the target person;

and by analogy, continuously executing the aggregation operation by using the aggregated and updated feature vectors corresponding to the key points in the interactive graph, and recording the aggregated and updated feature vectors corresponding to the key points after the last aggregation as target feature vectors.

The invention has the following beneficial effects:

firstly, acquiring a region image corresponding to the current acquisition time in a region of a target place, key points corresponding to target people in the region image and feature vectors corresponding to the key points, and then acquiring a skeleton map corresponding to the target people in the region image according to the key points corresponding to the target people in the region image; then acquiring the interaction domain corresponding to each target person in the target place area at the current acquisition time, and splicing the skeleton map corresponding to each target person with the skeleton maps of the interaction persons corresponding to the interaction domains corresponding to each target person to obtain the interaction maps corresponding to each target person; and finally, according to the feature vectors corresponding to all key points in the interactive graphs corresponding to the target personnel and the trained action behavior recognition network, obtaining the action behaviors of the target personnel in the interactive graphs. The invention identifies the action behaviors of each person in monitoring in an automatic mode, overcomes the problems of strong subjectivity, easy fatigue and the like in a manual monitoring mode, and improves the accuracy of identifying the behaviors of the persons.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a flow chart of a method for monitoring group behavior of people based on video analysis according to the present invention;

fig. 2 is a skeleton diagram corresponding to the target person.

Detailed Description

In order to further explain the technical means and functional effects of the present invention adopted to achieve the predetermined invention purpose, the following describes in detail a method for monitoring group behaviors of people based on video analysis according to the present invention with reference to the accompanying drawings and preferred embodiments.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following specifically describes a specific scheme of the personnel group behavior monitoring method based on video analysis, which is provided by the invention, with reference to the accompanying drawings.

The embodiment of the personnel group behavior monitoring method based on video analysis comprises the following steps:

as shown in fig. 1, the method for monitoring the behavior of the group of people based on video analysis of the present embodiment includes the following steps:

and step S1, acquiring the area image corresponding to the current acquisition time in the target place area.

When the treading event, the collision event and other events endangering the life safety of the masses occur, corresponding countermeasures should be made in time to evacuate people so as to ensure the life safety of the masses; therefore, it is very important to monitor and identify the behaviors of each person in the group, identify abnormal behaviors in time, and make a corresponding plan. The embodiment provides a personnel group behavior monitoring method based on video analysis, which is used for monitoring people in a set place and identifying the action behaviors of the people at different moments.

In the embodiment, the public places of indoor general people when disasters occur are taken as examples (namely, places which are easy to cause people confusion), and monitoring group behaviors in the places has important significance for guaranteeing the safety of the people.

In this embodiment, a 3D simulator is established by using the universal Engine 5, and a simulation experiment is performed, so that the corresponding situation of the real environment of the final landing is as follows: the method comprises the steps of collecting the sizes of important ground objects such as building internal structures, building internal facilities, roads outside buildings, enclosing walls, green plants and the like in public places on site, and restoring a real scene by 1: 1; collecting relevant maps of real public places and baking the maps to ensure that the material maps of the environmental model are highly vivid; making sky in different time (morning, noon and evening) and different seasons (spring, summer, autumn and winter) and corresponding environment model maps to realize the switching of environment models in different time and different seasons; and (4) making environment models with three different fineness grades, namely low, medium and high, and making corresponding LOD models for optimal scheduling during virtual VR roaming and virtual training.

And modeling the 3D avatar models of 20 typical man and woman common masses and the avatar of the manager, wherein the avatar conforms to the common manager avatar and the common mass avatar. The present embodiment thus far performs scene modeling.

Setting personnel behaviors, common people can randomly execute the following behaviors, including: collision, talking, hugging, walking, jogging, sitting up, sprinting, jumping, squatting, stooping, climbing, creeping, falling down, injury, etc.

Setting an emergency disaster event, wherein an explosion accident caused by the sparks generated in the production process is assumed to be caused by flour production or other inflammable and explosive dust released into the air in a factory; and manufacturing a light and shadow effect, a dust effect and the like when an emergency disaster event occurs through a template in the non-real Engine 5, and taking the light and shadow effect, the dust effect and the like as an identification interference item of a monitoring camera, so that the robustness of a subsequent network is improved.

The modeling process can be completed by a 3D simulator designer and can be implemented by using known techniques, and the detailed description of the process is omitted.

The present embodiment corresponds the monitored condition of the real site to the position and the angle of view in the simulator to capture the picture in the simulator with the frame number of 30 FPS. The input of the monitoring data preprocessing module is the video shot by the monitoring visual angle and the environmental parameters of the 3D simulator. The present embodiment represents a simulated public place as a target place area.

In this embodiment, an image in the target location area is acquired at intervals and recorded as an area image, and then the action behaviors of each person in the target location area at the current acquisition time are identified by using the area image (that is, an area image is acquired at intervals of a preset time period, which is set as required). In this embodiment, the camera for monitoring is a depth camera, and depth information of each pixel point in the image can be acquired.

The method includes the steps that an area image corresponding to the current acquisition time in a target place area is obtained, and then the action behaviors of each person in the area image are recognized by taking the area image as an example; the present embodiment takes the general public in the area image as the target person.

Step S2, acquiring each key point corresponding to each target person in the area image and a feature vector corresponding to each key point; obtaining a skeleton map corresponding to each target person in the region image according to each key point corresponding to each target person in the region image; the feature vector is obtained by splicing a position vector corresponding to the key point and a body part vector.

In order to identify the action behavior of the target person in the image, in this embodiment, first, the key point detection network is used to process the area image to obtain each key point corresponding to each target person in the area image, and for each key point corresponding to any target person, as shown in fig. 2, where 1 is a nose key point

2 is the key point of the neck

And 3 is the right shoulderKey points

And 4 is the key point of the right elbow

5 is the key point of the right hand

6 is the key point of the left shoulder

7 is the key point of the left elbow

8 is the key point of the left hand

And 9 is the key point of right crotch

10 is the key point of the right knee

And 11 is the key point of the right foot

12 is the key point of the left crotch

13 is the key point of the left knee

14 is the key point of the left foot

And 15 is the key point of the right eye

And 16 is the key point of the left eye

17 is the key point of the right ear

18 is the key point of the left ear

19 is the center point of the hand held object

19 key points (namely, one target person corresponds to 19 key points), wherein the sequence is the arrangement sequence of the key points; in this embodiment, the key point detection network is an openpos key point detection model. In this embodiment, the key point detection network and the training process thereof are prior art, and will not be described herein again.

Thus, the embodiment obtains each key point corresponding to each target person in the area image.

In the embodiment, a rectangular coordinate system is constructed by taking the vertex of the lower left corner of the regional image as the origin of coordinates, taking the horizontal direction as the x axis and taking the vertical direction as the y axis; acquiring position vectors corresponding to the key points corresponding to the target persons in the region image according to the rectangular coordinate system; the position vector corresponding to any key point corresponding to any target person is

Wherein

The abscissa of the kth key point corresponding to the target person,

is the ordinate of the kth key point corresponding to the target person,

correspond to the target personThe depth value of the kth keypoint of (1); the depth value is obtained from the area image.

For any target person: performing One-Hot coding on each key point corresponding to the target person to obtain body part vector corresponding to each key point corresponding to the target person (i.e. obtaining body part vector corresponding to each key point according to the code corresponding to each key point), such as nose

The position code of (1,0,0, …, 0); the body part vector can reflect which body part the corresponding key point belongs to, so that the key point connection can be carried out subsequently. The One-Hot coding in this embodiment is the prior art, and is not described herein again.

For the occluded key points, the corresponding position vectors are set as

Its corresponding body part vector is unchanged. In this embodiment, the position vector and the body part vector corresponding to each key point are spliced to obtain the feature vector corresponding to each key point. Thus, in this embodiment, according to the above process, the feature vector corresponding to each key point corresponding to each target person in the area image is obtained.

According to each key point corresponding to each target person, constructing a skeleton diagram corresponding to each target person, and for any target person: in this embodiment, the key points corresponding to the target person are connected according to a preset connection rule (i.e., the key points are connected according to positions of different parts of the human body), that is, the preset connection rule is as follows:

，

，

，

，

，

，

，

，

，

，

，

，

，

，

，

，

，

，

(ii) a Connecting the key points corresponding to the target person to obtain a skeleton diagram corresponding to the target person, as shown in fig. 2; each key point in the skeleton map corresponds to a feature vector.

Thus, in this embodiment, a skeleton map corresponding to each target person is obtained according to the above process (the skeleton map includes each key point corresponding to the target person). In this embodiment, the nose key point corresponding to the target person is used as the central point of the corresponding skeleton map (i.e., the central point corresponding to the target person), so that the simulation environment and the real environment can be associated to determine the position of the target person in the target place area, thereby achieving the same recognition effect.

And step S3, obtaining the interaction domain corresponding to each target person in the target place area at the current acquisition time, wherein the interaction domain is an area within a preset range of the target person.

Considering that the distance between the target person and the person who interacts with the target person is not too far, the embodiment obtains the interaction domain corresponding to each target person in the target place area at the acquisition time, where the interaction domain is an area within a preset range of the target person (that is, the target person can interact with the person within the preset range); the process of acquiring the interaction domain corresponding to any target person in this embodiment specifically includes:

firstly, acquiring a position vector corresponding to a nose key point corresponding to the target person, a position vector corresponding to a right eye key point and a position vector corresponding to a left eye key point; in this embodiment, the position vector is used as a coordinate in a three-dimensional space, that is, the positions of the nose key point, the right eye key point and the left eye key point in the three-dimensional space are determined according to the position vector corresponding to the nose key point, the position vector corresponding to the right eye key point and the position vector corresponding to the left eye key point; because three points can uniquely determine a plane (that is, a plane can be determined by the nose key point, the right eye key point and the left eye key point in a three-dimensional space), in this embodiment, a normal vector of the plane is calculated according to a position vector corresponding to the nose key point corresponding to the target person, a position vector corresponding to the right eye key point and a position vector corresponding to the left eye key point, and a modulus of the normal vector is set to be 1 (that is, a normal vector corresponding to the target person); the present embodiment utilizes the normal vector to simulate the direction of the eye sight of the target person. The process of calculating the algorithm vector in this embodiment is the prior art, and will not be described herein again.

Then, in the embodiment, a central point (namely, a nose key point) corresponding to the target person is taken as a circular point, and a first preset length is taken as a radius to make a circle, so that a circular area corresponding to the target person is obtained; similarly, taking the central point corresponding to the target person as a dot, taking the normal vector corresponding to the target person as a central line, taking a preset angle as a fan-shaped angle, and taking a second preset length as a radius to be a fan-shaped area corresponding to the target person; taking the union region of the circular region and the fan-shaped region as an interaction region corresponding to the target person; considering that the interaction is generally performed by selecting a person closer or looking at the interaction, the visual field of the person is generally about 120 degrees, and therefore the preset angle should not be too different from 120 degrees. In this embodiment, values of the first preset length, the second preset length, and the preset angle are set according to actual needs.

So far, the embodiment obtains the interaction domain corresponding to each target person according to the above process.

And step S4, splicing the skeleton map corresponding to each target person and the skeleton maps of the interactive persons corresponding to the interactive domains corresponding to the target persons to obtain the interactive maps corresponding to the target persons.

For any interaction domain corresponding to the target person:

and counting the number of other target persons which do not include the target person and exist in the interaction domain corresponding to the target person, and recording the number as the interaction number.

If the number of interactions corresponding to the target person is not 0, namely other target persons exist in the interaction domain corresponding to the target person, marking other target persons existing in the interaction domain corresponding to the target person as interaction persons, namely interaction persons corresponding to the interaction domain corresponding to the target person; the number of the corresponding interactive personnel can be more than one and can also be 1. In this embodiment, according to the position vector of each key point corresponding to the target person and the position vector of each key point of each interactive person corresponding to the interactive domain corresponding to the target person, the skeleton map corresponding to the target person is respectively spliced with the skeleton maps corresponding to the interactive persons, so as to obtain the interactive map corresponding to the target person (if there are a plurality of interactive persons corresponding to the target person, there are a plurality of corresponding interactive maps; if there are 1 interactive person corresponding to the target person, there are only 1 corresponding interactive map); for any interactive person corresponding to the interactive domain corresponding to the target person:

firstly, the implementation connects the right-hand key point corresponding to the target person with the interactive person; respectively calculating the distance between the right-hand key point corresponding to the target person and each key point corresponding to the interactive person according to the position vector of the right-hand key point corresponding to the target person and the position vector of each key point corresponding to the interactive person, wherein the specific formula is as follows:

the distance between the right-hand key point corresponding to the target person and the kth key point corresponding to the interactive person,

the abscissa of the right-hand key point corresponding to the target person,

is the ordinate of the right-hand key point corresponding to the target person,

the depth value of the right-hand key point corresponding to the target person,

the abscissa of the kth key point corresponding to the interactive person,

the ordinate of the kth key point corresponding to the interactive person,

the depth value of the k-th key point corresponding to the interactive person.

Therefore, the distance between the right-hand key point corresponding to the target person and each key point corresponding to the interactive person is obtained in the embodiment; and selecting the key point which is closest to the right-hand key point corresponding to the target person from the key points corresponding to the interactive person, and connecting the key point with the right-hand key point corresponding to the target person.

Similarly, connecting the key point which is closest to the left-hand key point corresponding to the target person in the key points corresponding to the interactive person with the left-hand key point of the target person; connecting the key point which is closest to the right-hand key point corresponding to the interactive person in the key points corresponding to the target person with the right-hand key point of the interactive person; connecting the key point which is closest to the left-hand key point corresponding to the interactive person in the key points corresponding to the target person with the left-hand key point of the interactive person; and further splicing the skeleton graph corresponding to the target person and the skeleton graph of the interactive person to obtain an interactive graph corresponding to the target person. For an interactive graph, the interactive graph comprises key points corresponding to two persons and feature vectors corresponding to the key points.

If the number of interactions corresponding to the target person is 0, that is, no interactive person exists in the interaction domain corresponding to the target person (that is, the target person does not have a corresponding interactive person), the embodiment constructs an occupancy skeleton map (that is, constructs a virtual interactive person), and takes the occupancy skeleton map as a skeleton map of the virtual interactive person; taking the virtual interactive personnel as the interactive personnel corresponding to the interactive domain corresponding to the target personnel; the position vectors of all key points in the space occupying skeleton map are (-1, -1, -1), and the body part vectors are unchanged. In this embodiment, the left-hand key point corresponding to the target person is connected with the left-hand key point of the interactive person corresponding to the interactive domain, and the right-hand key point corresponding to the target person is connected with the right-hand key point of the interactive person corresponding to the interactive domain, so as to obtain the interactive map corresponding to the target person; at this time, there is only one interactive map, and the number of interactions corresponding to the corresponding interactive person (i.e., the virtual interactive person) is 0.

And step S5, obtaining the action behaviors of the target person in each interactive graph according to the feature vectors corresponding to all the key points in the interactive graph corresponding to each target person and the trained action behavior recognition network.

In this embodiment, an interaction graph corresponding to each target person is obtained according to step S4 (the interaction graph corresponding to the target person may be one or multiple, and specifically relates to the number of the interactive persons). All key points in the interaction graph are sequenced, specifically: the interaction graph comprises key points of a target person and key points of an interaction person, and the key points corresponding to one person are ordered, in this embodiment, the key points of the interaction person are ordered next to the key points of the target person, that is, the key points corresponding to the target person are arranged to 19, and the key points of the interaction person are ordered from 20.

In order to identify the action behaviors of each target person in the area image, an action behavior identification network is constructed in the embodiment; in this embodiment, the interaction graph corresponding to each target person is input into the trained action behavior recognition network, and is output as the action behavior of each target person in the corresponding interaction graph. The action behavior recognition network comprises an attention mechanism and a multi-layer perceptron.

The action behavior recognition network firstly utilizes an attention mechanism to carry out aggregation processing on the feature vectors of the key points corresponding to the input interactive graphs, and after aggregation is completed, the target feature vectors of the key points corresponding to the input interactive graphs are obtained.

For any interaction graph corresponding to any input target person:

in this embodiment, according to the feature vector corresponding to each key point in the interactive map and the normal vector corresponding to the target person, the weight between each key point in the interactive map and each key point in the corresponding neighborhood set is calculated; the neighborhood set corresponding to the key point is a set formed by all key points connected with the corresponding key point in the interactive graph, namely:

wherein the content of the first and second substances,

for the weight between the ith and jth keypoints in the interaction graph (i.e. the weight of the jth keypoint to the ith keypoint),

the feature vector corresponding to the ith key point in the interactive map,

the feature vector corresponding to the jth key point in the interactive graph,

is the normal vector corresponding to the target person,

the feature vector of the central point corresponding to the target person (i.e. the feature vector corresponding to the nose key point corresponding to the target person),

in order to perform the vector splicing operation,

in order to activate the function(s),

is the first similarity vector and the second similarity vector,

is a second similarity degree vector, and is,

is the transpose of the first similarity vector,

is a transpose of the second similarity vector,

is a first weight matrix of the weight data set,

is a second weight matrix of the plurality of weight matrices,

is an exponential function with e as the base,

and a neighborhood set corresponding to the ith key point in the interaction graph.

According to the above-mentioned formula,

is to measure two vectors

And

of (2), wherein

Of dimension and

are the same in dimension; in the same way

Is to measure two vectors

And

of (2), wherein

Dimension of and

are the same. The first weight matrix and the second weight matrix in this embodiment are used for performing linear transformation to implement dimension reduction, that is, vectors are subjected to different linear transformations to implement dimension reduction processing; the values of the first similarity vector, the second similarity vector, the first weight matrix and the second weight matrix can be obtained by training of a neural network, and are not repeated herein.

According to the above process, the weight between each key point in the interactive graph and each key point in the corresponding neighborhood set can be obtained.

Further, performing aggregation operation on each key point in the interactive graph according to the weight between each key point in the interactive graph and each key point in the corresponding neighborhood set to obtain an aggregated and updated feature vector corresponding to each key point corresponding to the interactive graph; for the first aggregation operation on each key point in the interactive graph, the formula for obtaining the aggregated updated feature vector corresponding to each key point is as follows:

for the aggregated updated feature vector corresponding to the ith key point in the interactive graph,

the number of interaction of the person corresponding to the skeleton diagram to which the ith key point belongs in the interaction diagram,

the number of interactions for the person corresponding to the skeleton diagram to which the jth key point belongs in the interaction diagram (that is, the ith key point and the jth key point may belong to the skeleton diagram corresponding to the target person or may belong to the skeleton diagram corresponding to the interaction person, and the persons include the target person and the interaction person).

The function of (1) is to measure the degree of visual attention when

The larger the number of people around the screen, the larger the number of people who touch the screen, the more likely the action is due to congestion and the more unintentional the action, and therefore the size of the weight is appropriately reduced.

After each key point in the interactive graph is updated once, repeating the steps, and performing second aggregation operation by using the aggregated updated feature vector corresponding to each key point so as to ensure that each key point fully senses neighborhood information and improve the identification accuracy, wherein the specific aggregation times can be set according to actual needs; and when the feature vectors corresponding to the key points in the interactive graph are updated through all aggregation, obtaining target feature vectors corresponding to the key points in the interactive graph (namely, the feature vectors updated through aggregation corresponding to the key points after the last aggregation are recorded as the target feature vectors).

In this embodiment, target feature vectors corresponding to the key points in the interactive map are spliced according to the arrangement sequence of the key points, so as to obtain a comprehensive vector corresponding to the interactive map; then, the comprehensive vector corresponding to the interaction diagram is input into a multi-layer perceptron MLP, and a recognition result of the action behavior of the target person in the interaction diagram is obtained (the action behavior of the target person in the interaction diagram is relative to the interaction person). The behavior actions comprise daily behaviors: walking, sitting up, talking, queuing, jogging and stooping; the non-daily behaviors are as follows: conflict, hug, crowd, climb, crawl, fall, injury, jump, squat, sprint, others; the specific action behaviors can be set according to actual needs.

In the embodiment, the loss function of the training action and behavior recognition network is a cross entropy loss function, and the RMSProp is used for an optimization algorithm; the specific training method is the prior art, and is not described herein again.

Thus, the embodiment obtains the action behaviors of each target person in the corresponding interaction graph in the area image so as to reflect the group behaviors in the target place area at the current acquisition time; if the emergency event is judged to occur through the action behavior of each target person, the management person needs to go to a specific position to control through a disordered scene; in order to enable the manager to reach the designated location more quickly, the route is planned in the embodiment, specifically:

according to the action behavior recognition result, marking the position where the emergency event occurs as a blocked area, and further obtaining an emergency event distribution map of the target place area; in the embodiment, the blocked area in the emergency distribution map is regarded as an impassable area and needs to be bypassed, so that the ant colony algorithm is used for processing the emergency distribution map and planning an optimal path to the blocked area; the ant colony algorithm has the advantages of a positive feedback mechanism, distributed computation, strong robustness and the like, the candidate solution construction process is similar to the path planning process, and the shortest foraging path can be found without prior knowledge; the ant colony algorithm is the prior art, and is not described herein again.

The method includes the steps that firstly, an area image corresponding to the current acquisition time in an area of a target place, key points corresponding to target people in the area image and feature vectors corresponding to the key points are obtained, and then a skeleton map corresponding to the target people in the area image is obtained according to the key points corresponding to the target people in the area image; then acquiring the interaction domain corresponding to each target person in the target place area at the current acquisition time, and splicing the skeleton map corresponding to each target person with the skeleton maps of the interaction persons corresponding to the interaction domains corresponding to each target person to obtain the interaction maps corresponding to each target person; and finally, according to the feature vectors corresponding to all key points in the interactive graphs corresponding to the target personnel and the trained action behavior recognition network, obtaining the action behaviors of the target personnel in the interactive graphs. The embodiment identifies the action behaviors of each person in the monitoring in an automatic mode, overcomes the problems of strong subjectivity, easy fatigue and the like in a manual monitoring mode, and improves the accuracy of identifying the behaviors of the persons.

It should be noted that: the above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A personnel group behavior monitoring method based on video analysis is characterized by comprising the following steps:

acquiring each key point corresponding to each target person in the area image and a feature vector corresponding to each key point; obtaining a skeleton map corresponding to each target person in the region image according to each key point corresponding to each target person in the region image; the characteristic vector is obtained by splicing a position vector corresponding to the key point and a body part vector;

splicing the skeleton map corresponding to each target person and the skeleton maps of the interactive persons corresponding to the interactive domains corresponding to the target persons to obtain interactive maps corresponding to the target persons;

according to the feature vectors corresponding to all key points in the interactive graphs corresponding to the target personnel and the trained action behavior recognition network, obtaining the action behaviors of the target personnel in the interactive graphs;

the obtaining of the action behaviors of the target person in each interactive graph includes:

for any interaction graph corresponding to any target person:

calculating the weight between each key point in the interactive map and each key point in the corresponding neighborhood set according to the feature vector corresponding to each key point in the interactive map and the normal vector corresponding to the target person; the corresponding neighborhood set comprises all key points connected with the corresponding key points in the interactive graph;

inputting the comprehensive vector corresponding to the interactive map into a multilayer perceptron to obtain the action behavior of the target person in the interactive map;

the formula for calculating the weight between each key point in the interactive graph and each key point in the corresponding neighborhood set is as follows:

for the weight of the jth key point to the ith key point in the interactive graph，

The feature vector corresponding to the ith key point in the interaction graph,

the feature vector corresponding to the jth key point in the interaction graph,

is the normal vector corresponding to the target person,

the feature vector of the central point corresponding to the target person,

in order to perform the vector splicing operation,

in order to activate the function(s),

is the first similarity vector and the second similarity vector,

is the second similarity vector, and is the second similarity vector,

is a transpose of the first similarity vector,

is a transpose of the second similarity vector,

is a first weight matrix of the weight data set,

is a second weight matrix that is a function of,

is an exponential function with e as the base,

2. The method for monitoring the group behaviors of people based on video analysis according to claim 1, wherein the obtaining of the key points corresponding to the target people in the area image and the feature vectors corresponding to the key points comprises:

taking the vertex of the lower left corner of the area image as a coordinate origin, taking the horizontal direction as an x axis and taking the vertical direction as a y axis, and obtaining a position vector corresponding to each key point corresponding to each target person in the area image; the position vector comprises an abscissa corresponding to the key point, an ordinate corresponding to the key point and a depth value corresponding to the key point, and the depth value is obtained according to the area image;

for any target person: performing One-Hot coding on each key point corresponding to the target person to obtain body part vectors corresponding to the key points corresponding to the target person; and splicing the position vector and the body part vector corresponding to each key point corresponding to the target person to obtain the feature vector corresponding to each key point corresponding to the target person.

3. The method for monitoring the group behaviors of people based on video analysis according to claim 1, wherein the key points corresponding to the target people in the region image are connected according to a preset connection rule to obtain a skeleton map corresponding to the target people.

4. The method for monitoring the group behaviors of people based on video analysis according to claim 1, wherein the step of obtaining the interaction domain corresponding to each target person in the target site area at the current acquisition time comprises the following steps:

for any target person:

calculating to obtain a normal vector corresponding to the target person according to position vectors corresponding to the nose key point, the right-eye key point and the left-eye key point in the key points corresponding to the target person; the normal vector corresponding to the target person is as follows: calculating a plane normal vector constructed in the three-dimensional space by the nose key point, the right eye key point and the left eye key point by taking position vectors corresponding to the nose key point, the right eye key point and the left eye key point as coordinates in the three-dimensional space, and taking the plane normal vector as a normal vector corresponding to the target person;

5. The method for monitoring the group behaviors of people based on video analysis according to claim 1, wherein the obtaining of the interaction graph corresponding to each target person comprises:

for any target person:

6. The method for monitoring the group behaviors of people based on video analysis according to claim 5, wherein obtaining the interaction graph corresponding to the target person comprises:

connecting a key point which is closest to the right-hand key point corresponding to the target person in the key points corresponding to the interactive persons with the right-hand key point of the target person; connecting a key point which is closest to the left-hand key point corresponding to the target person in the key points corresponding to the interactive persons with the left-hand key point of the target person; connecting a key point which is closest to the right-hand key point corresponding to the interactive person in the key points corresponding to the target person with the right-hand key point of the interactive person; connecting a key point which is closest to the left-hand key point corresponding to the interactive person in the key points corresponding to the target person with the left-hand key point of the interactive person; splicing the skeleton graph corresponding to the target person and the skeleton graph corresponding to the interactive person to obtain an interactive graph corresponding to the target person;

the distance is calculated according to the position vector.

7. The method for monitoring the behaviors of the people group based on the video analysis according to claim 1, wherein the obtaining of the target feature vector corresponding to each key point in the interactive map comprises:

wherein the content of the first and second substances,