CN115512293A - Spatial behavior intelligent analysis method and system based on computer vision


Info

Publication number
CN115512293A
Authority
CN
China
Prior art keywords: pedestrian, characteristic vector, expression, yolov5, behavior
Prior art date: 2022-09-15
Legal status
Pending
Application number
CN202211120406.4A
Other languages
Chinese (zh)
Inventor
许景科
田立新
罗娇娇
潘聪
刘亚臣
Current Assignee
Shenyang Jianzhu University
Original Assignee
Shenyang Jianzhu University
Priority date: 2022-09-15
Filing date: 2022-09-15
Publication date: 2022-12-23
Application filed by Shenyang Jianzhu University
Priority to CN202211120406.4A
Publication of CN115512293A
Legal status: Pending

Classifications

    • G06V 20/52 - Scenes; scene-specific elements: surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06N 3/02, 3/08 - Computing arrangements based on biological models: neural networks; learning methods
    • G06V 10/70, 10/82 - Image or video recognition or understanding using pattern recognition or machine learning: using neural networks
    • G06V 20/40, 20/41 - Scenes; scene-specific elements in video content: higher-level, semantic clustering, classification or understanding of video scenes
    • G06V 40/16, 40/172 - Human faces, e.g. facial parts, sketches or expressions: classification, e.g. identification
    • G06V 40/174 - Human faces: facial expression recognition
    • G06V 40/20 - Recognition of biometric, human-related or animal-related patterns: movements or behaviour, e.g. gesture recognition

Abstract

The invention discloses a computer-vision-based method and system for intelligent analysis of spatial behavior. The method comprises: training a YOLOv5-ArcFace model for face recognition and a YOLOv5-DeepSort-SlowFast model for human behavior detection and tracking; establishing databases that map pedestrians' facial and expression feature vectors, and their behavior feature vectors, to the categories they belong to; using the trained models to recognize, from frames of the current surveillance video, the feature vectors of pedestrians' faces, expressions, and body movements together with their categories; comparing each recognized feature vector with the corresponding feature vectors in the database to judge whether the recognition result is correct; inputting the verified feature vectors into a softmax classifier to classify potential risks; and judging, from the current surveillance video, whether crowding or other potentially risky behavior exists. The method and system effectively monitor potential risks in the application environment, issue timely early warnings, and improve the accuracy of anticipating the potential risks posed by pedestrians.

Description

Spatial behavior intelligent analysis method and system based on computer vision
Technical Field
The invention relates to the technical field of computer vision, and in particular to a computer-vision-based method and system for intelligent analysis of spatial behavior.
Background
With advances in science and technology, informatization is developing rapidly and changing how different environments are managed; video surveillance, for example, plays an active role in prisons. At present, pedestrian anomaly analysis is performed by administrators inspecting footage with the naked eye. In practice, naked-eye observation has serious shortcomings, with low accuracy of anticipation and a low rate of timely discovery.
Current methods have several disadvantages. Monitoring potential risks such as crowding and single- or multi-person behavior relies on human observation, which cannot be sustained continuously. When an observer's judgment is wrong, substantial manpower and material resources are wasted downstream, and irreversible, serious consequences may follow. Existing practice also lacks a quasi-executable standard for potential risks, so no specific countermeasure can be taken for a specific risk, and external conditions such as weather and lighting can impair an observer's judgment. Because such special situations occur rarely in most video-surveillance environments, the relevant data are hard to obtain and small in volume, and existing methods cannot build an accurate intelligent spatial-behavior analysis scheme from such small data.
Disclosure of Invention
To address the defects of the prior art, the invention discloses a computer-vision-based method and system for intelligent analysis of spatial behavior. The method and system effectively monitor crowding, potentially risky behavior, and similar conditions in the application environment, improve the accuracy of anticipating potential risks, and issue early warnings in advance.
To solve this technical problem, the invention provides a computer-vision-based intelligent spatial-behavior analysis method comprising the following steps:
S1: training, on a data set, a YOLOv5-ArcFace model for face recognition and a YOLOv5-DeepSort-SlowFast model for human behavior detection and tracking;
S2: establishing, for pedestrians, a database mapping facial and expression feature vectors to the categories they belong to, and a database mapping human behavior feature vectors to the categories they belong to;
S3: for the current surveillance video containing pedestrians, recognizing from the video frames the feature vectors of pedestrians' faces, expressions, and body movements together with their categories, using the trained YOLOv5-ArcFace and YOLOv5-DeepSort-SlowFast models, and automatically assigning each pedestrian an ID through the DeepSort tracking module of the YOLOv5-DeepSort-SlowFast model for real-time tracking;
S4: comparing the recognized face, expression, and behavior feature vectors of each pedestrian with the feature vectors of the same type recorded in the database; if the Euclidean distance between the two vectors is smaller than the set threshold, the recognition result is correct and the next step is entered;
S5: inputting the categories of the pedestrian's face, expression, and behavior feature vectors into a softmax classifier to classify the potential risks;
S6: dividing the frame of the current surveillance video into several subspaces, detecting, tracking, and recognizing the pedestrians in the frame, judging from the spatial position relationship between pedestrians and subspaces whether crowding behavior exists and, if so, issuing a crowding early warning; and comprehensively judging from the behavior, facial expressions, and spatial information of the pedestrians in the frame whether other potentially risky behavior exists and, if so, issuing the corresponding potential-risk warning and marking the type of potential risk.
Further, before the step S1, the method includes:
acquiring historical surveillance video of pedestrians, the video containing the pedestrians' faces, expressions, and body movements;
and building a training set from the video frames of the historical surveillance video.
Further, the step S2 includes:
S2-1: acquiring each pedestrian's past surveillance video from the historical surveillance video, the past video containing the pedestrian's face, expressions, and body movements;
S2-2: extracting the pedestrian's facial and expression feature vectors from the past surveillance video through the YOLOv5-ArcFace model, labeling the categories they belong to, and storing them to obtain the database;
S2-3: detecting and tracking pedestrians in the frames of the past surveillance video through the YOLOv5-DeepSort module of the YOLOv5-DeepSort-SlowFast model, passing the results to the SlowFast behavior-recognition module of the model to recognize human behavior, obtaining the behavior feature vectors, labeling the action categories they belong to, and storing them to obtain the database.
Further, the step S3 includes:
S3-1: acquiring the pixel matrix of the current video frame, detecting the human-body and face regions with the YOLOv5 network, and obtaining the position coordinates of the body and face regions;
S3-2: performing face recognition on the detected face regions with the trained YOLOv5-ArcFace model to obtain the face feature vectors and their categories and the expression feature vectors and their categories;
S3-3: tracking the pedestrians detected by the YOLOv5 network with the trained YOLOv5-DeepSort-SlowFast model, automatically assigning an ID to each tracked pedestrian, and performing behavior recognition on the frame sequence formed by the body regions detected and tracked in each frame to obtain the behavior feature vectors and their categories.
Further, the step S4 includes:
comparing the pedestrian's face feature vector with the face feature vectors recorded in the corresponding database; if the Euclidean distance between them is smaller than the set face threshold, judging that identity matching is successful; otherwise, storing the pedestrian's face feature vector in the corresponding database;
comparing the pedestrian's expression feature vector with the expression feature vectors recorded in the corresponding database; if the Euclidean distance between them is smaller than the set expression threshold, judging that the recognition result is correct;
and comparing the pedestrian's behavior feature vector with the behavior feature vectors recorded in the corresponding database; if the Euclidean distance between them is smaller than the set action threshold, judging that the recognition result is correct.
Further, the step S5 includes:
the computational expression of the softmax classifier is:

L_1 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{W_{y_i}^{T}x_i+b_{y_i}}}{\sum_{j}e^{W_{j}^{T}x_i+b_j}}

wherein L_1 represents the score value of each category, N the number of training samples, W the weight vector of each category, b the bias term, x the feature vector, and y the category;
for the classification of the pedestrian's face and expression feature vectors, a classifier improved on the basis of the softmax classifier is adopted, with the expression:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\cos(\theta_{y_i}+m)}}{e^{\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{\cos\theta_j}}

wherein L represents the score value of each category, N the number of training samples, θ the angle between W and x, and m the additive angular margin.
Further, the step S6 includes:
dividing the frame of the current surveillance video into several subspaces of equal size, setting the maximum number of people each subspace region is allowed to hold as the spatial threshold, detecting and tracking pedestrians through the detection-and-tracking module of the YOLOv5-DeepSort-SlowFast model, counting the number of people in each subspace, comparing it with the spatial threshold, and issuing an anomaly warning if the threshold is exceeded;
combining the potential-risk classification result with whether the number of people gathered in the same subspace region exceeds the subspace's allowed maximum to make a comprehensive judgment, classifying the result as normal or abnormal, and dividing abnormal results into low-level and high-level anomalies;
for the display of warnings, low-level anomalies are marked and displayed in the frame with a yellow warning, and high-level anomalies with a red warning.
The invention also discloses a computer-vision-based intelligent spatial-behavior analysis system, comprising:
a recognition-model training module, used to train, on a data set, a YOLOv5-ArcFace model for face recognition and a YOLOv5-DeepSort-SlowFast model for human behavior detection and tracking;
a database module, used to establish, for pedestrians, a database mapping facial and expression feature vectors to the categories they belong to and a database mapping human behavior feature vectors to the categories they belong to;
a detection module, used to recognize, from frames of the current surveillance video containing pedestrians, the feature vectors of the pedestrians' faces, expressions, and body movements together with their categories, using the trained YOLOv5-ArcFace and YOLOv5-DeepSort-SlowFast models, and to automatically assign each pedestrian an ID through the DeepSort tracking module of the YOLOv5-DeepSort-SlowFast model for real-time tracking;
an identification module, used to compare the recognized face, expression, and behavior feature vectors of each pedestrian with the feature vectors of the same type recorded in the database; if the Euclidean distance between them is smaller than the set threshold, the recognition result is correct;
a classifier module, used to input the categories of the pedestrian's face, expression, and behavior feature vectors into a softmax classifier to classify the potential risks;
a risk early-warning module, used to divide the frame of the current surveillance video into several subspaces, detect, track, and recognize the pedestrians in the frame, judge from the spatial position relationship between pedestrians and subspaces whether crowding behavior exists and, if so, issue a crowding early warning, and comprehensively judge from the behavior, facial expressions, and spatial information of the pedestrians in the frame whether other potentially risky behavior exists and, if so, issue the corresponding potential-risk warning and mark the type of potential risk.
Further, the system further comprises:
an information confirmation module, used to confirm the information of logged-in personnel and judge whether the user has permission to use the system.
Further, the system further comprises:
a storage module, used to store the current surveillance video content and automatically save the video locally at the preset time and interval.
Compared with the prior art, the invention has the following advantages:
the invention adds an SE attention mechanism, the problem that the concentration degree of a target is possibly subjected to false detection in the original YOLOv5 detection module, and the occurrence of false detection and missing detection can be effectively avoided when the model processes the detection of complex backgrounds such as a plurality of articles, animals and the like in an image environment by adding the SE attention mechanism. The network can pay more attention to the target to be detected, and the detection effect is improved.
The YOLOv5 backbone feature-extraction network uses a C3 structure, which brings a large parameter count and heavy computation, limiting deployment in practical scenarios such as mobile or embedded devices, especially where the application demands low latency or fast response. To solve the low detection speed caused by the oversized model, the YOLOv5 backbone is replaced with the lighter MobileNetV3 network, reducing the number of model parameters and greatly cutting computation, so that detection speed improves with accuracy unchanged. The model also runs much faster on a CPU, lowering the hardware requirements for deployment.
The method and system effectively monitor potential risks in the application environment, issue timely early warnings, and improve the accuracy of anticipating the potential risks posed by pedestrians.
Drawings
FIG. 1 is a block flow diagram of the computer-vision-based intelligent spatial-behavior analysis method.
FIG. 2 is a flow chart of an implementation of a preferred embodiment of the invention.
Detailed description of the invention
The invention is further described below with reference to the accompanying drawings. The following embodiments are intended only to illustrate the technical solutions of the invention more clearly and do not limit its scope of protection.
As shown in FIG. 1 and FIG. 2, the computer-vision-based intelligent spatial-behavior analysis method disclosed by the invention can be applied to various surveillance environments, including prisons, and comprises the following processes:
S1: training a YOLOv5-ArcFace model for face recognition and a YOLOv5-DeepSort-SlowFast model for human behavior detection and tracking on a data set.
Specifically, historical surveillance video of the pedestrians is acquired; the video images contain the pedestrians' behavior, faces, and facial expressions. The video frames serve as the data set for training the convolutional neural network models: the YOLOv5-ArcFace model is adopted for recognizing faces and facial expressions, and the YOLOv5-DeepSort-SlowFast model for recognizing human behavior.
The video-collection standard and the classification of faces, facial expressions, and behaviors are as follows: 5000 video segments of the various faces, facial expressions, and actions, each 2-5 seconds long at 25 frames per second, form the basic training data set. Facial expressions comprise 6 basic categories: happiness, anger, sadness, fear, surprise, and neutral. Human actions are divided into two main categories, single-person and two-person, each containing several subclasses: single-person actions include 8 kinds (sitting down, standing up, squatting, jumping, climbing, falling, pacing back and forth, and swinging fists), and two-person actions include 4 kinds (punching, kicking, pointing at each other in anger, and shoulder shoving).
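A minimal sketch of turning the labelled clips into a frame-level training set with OpenCV. The English label identifiers and the file layout are illustrative assumptions; the patent specifies only the categories themselves.

```python
import cv2
from pathlib import Path

# Assumed English identifiers for the categories listed above
EXPRESSIONS = ["happiness", "anger", "sadness", "fear", "surprise", "neutral"]
SINGLE_ACTIONS = ["sit_down", "stand_up", "squat", "jump", "climb",
                  "fall", "pace", "swing_fist"]
DOUBLE_ACTIONS = ["punch", "kick", "angry_point", "shoulder_shove"]

def extract_frames(clip_path: str, out_dir: str) -> int:
    """Dump every frame of one 2-5 s, 25 fps clip as a JPEG; return the count."""
    cap = cv2.VideoCapture(clip_path)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    n = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(f"{out_dir}/frame_{n:05d}.jpg", frame)
        n += 1
    cap.release()
    return n
```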
S2: establishing, for pedestrians, a database mapping facial and expression feature vectors to the categories they belong to, and a database mapping human behavior feature vectors to the categories they belong to.
Specifically, a database mapping the facial and expression feature vectors to the categories they belong to is established for each pedestrian, along with a database mapping the prisoners' behavior feature vectors to the categories they belong to. The specific process of building the face, expression, and behavior databases for each pedestrian in the prison is as follows:
S2-1: acquiring each pedestrian's past surveillance video, containing the pedestrian's face, facial expressions, and behavior;
S2-2: extracting the face and expression feature vectors from the video through the YOLOv5-ArcFace model, storing the feature vectors, and labeling the categories they belong to;
S2-3: detecting and tracking the pedestrians in the video frames through the YOLOv5-DeepSort module of the YOLOv5-DeepSort-SlowFast model, passing the results to the SlowFast module for behavior recognition, obtaining the behavior feature vectors, storing them, and labeling the action categories they belong to.
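A minimal sketch of how the feature databases of steps S2-2 and S2-3 might be stored; the .npz layout and the 512-dimensional embedding are assumptions, not details given in the patent.

```python
import numpy as np

class FeatureDB:
    """Store (feature vector, category label) pairs for one modality."""
    def __init__(self):
        self.vectors, self.labels = [], []

    def add(self, vec, label: str) -> None:
        self.vectors.append(np.asarray(vec, dtype=np.float32))
        self.labels.append(label)

    def save(self, path: str) -> None:
        np.savez(path, vectors=np.stack(self.vectors),
                 labels=np.array(self.labels))

# One database per modality: face, expression, behavior
face_db = FeatureDB()
face_db.add(np.random.rand(512), "pedestrian_01")  # placeholder 512-d face embedding
face_db.save("face_db.npz")
```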
S3: for the current monitoring video containing the pedestrian, feature vectors of the face, the expression and the human body behavior of the pedestrian and the corresponding categories of the feature vectors are recognized from a video frame by utilizing the trained YOLOv5-ArcFace model and the trained YOLOv5-DeppSort-SlowFast model, and the ID is automatically allocated to the pedestrian through a Deepsort tracking module of the YOLOv5-DeppSort-SlowFast model for real-time tracking. The process comprises the following steps:
The acquired current video stream is defined as

V = {v_1, v_2, ..., v_n}

where V is the set of video frames and v_i is the frame at time i, represented by an l × w pixel matrix, with l the number of rows and w the number of columns of the frame's pixel matrix. The video stream is real-time data collected by a surveillance camera in the actual application scenario, output in one-minute segments at 30 frames per second.
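A minimal OpenCV sketch of acquiring V: each successful read yields one pixel matrix (here l × w × 3 in BGR). The RTSP address is a placeholder for the actual camera stream.

```python
import cv2

cap = cv2.VideoCapture("rtsp://camera-address/stream")  # placeholder source
frames = []
while len(frames) < 30:              # e.g. one second of video at 30 fps
    ok, frame = cap.read()           # frame: numpy array of shape (l, w, 3)
    if not ok:
        break
    frames.append(frame)
cap.release()
```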
The pixel matrix of the current video frame is acquired, the YOLOv5 network detects the human-body and face regions, and the position coordinates of the top-left and bottom-right corners of the body and face detection boxes are obtained. Kalman filtering, Hungarian matching, and IOU matching are then carried out on the RGB color space to track the moving targets; each tracked target is automatically assigned an ID so that it is not lost in the frame.
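A minimal detection sketch using the public Ultralytics YOLOv5 hub model as a stand-in (the patent's modified, MobileNetV3-based network is not public); the DeepSort association step with Kalman filtering and Hungarian/IOU matching would wrap these boxes and is left to a tracking library.

```python
import torch

# Public YOLOv5 small model from the Ultralytics hub, used here as a stand-in
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

def detect_people(frame):
    """Return [x1, y1, x2, y2] person boxes for one BGR frame."""
    results = model(frame[..., ::-1].copy())      # hub model expects RGB input
    det = results.xyxy[0]                         # (n, 6): x1, y1, x2, y2, conf, cls
    return det[det[:, 5] == 0][:, :4].tolist()    # COCO class 0 = person
```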
Face recognition is carried out on the detected face regions with the trained YOLOv5-ArcFace model to obtain the face and expression feature vectors and the categories they belong to.
Human behavior recognition is performed on the frame sequence formed by each detected body region using the trained YOLOv5-DeepSort-SlowFast model: the video frame sequence is fed into the slow pathway and the fast pathway, and features are extracted by each. The features extracted by the fast pathway and the slow pathway are then fused to obtain the behavior feature vector and the category it belongs to.
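A minimal sketch of the two-rate sampling behind the slow and fast pathways; the stride alpha = 4 is the common SlowFast default and an assumption here, and the feature extraction and fusion themselves belong to the trained model.

```python
def split_pathways(frames, alpha: int = 4):
    """Fast pathway sees every frame; slow pathway sees 1 of every alpha frames."""
    fast = frames
    slow = frames[::alpha]
    return slow, fast

slow_clip, fast_clip = split_pathways(frames)  # `frames` from the capture loop above
```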
S4: The recognized face, expression, and behavior feature vectors of the pedestrian are compared with the feature vectors of the same type recorded in the database; if the Euclidean distance between the two vectors is smaller than the set threshold, the recognition result is correct and the next step is entered.
Specifically, the Euclidean distance is calculated between the recognized feature vector and the pedestrian's historical feature vectors. The distance threshold is set to 0.1: if the calculated Euclidean distance is smaller than 0.1, the recognition is considered correct; otherwise it is considered incorrect and the recognition is invalid. If the recognition is judged correct, the category of the matching database feature vector is taken as the category of the recognition result.
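A minimal sketch of this verification step: nearest-neighbour search over one of the S2 databases with the 0.1 threshold from the text.

```python
import numpy as np

def match(query, db_vectors, db_labels, threshold: float = 0.1):
    """Return the matched category label, or None if nothing is close enough."""
    dists = np.linalg.norm(db_vectors - query, axis=1)  # Euclidean distance to each entry
    i = int(np.argmin(dists))
    return db_labels[i] if dists[i] < threshold else None
```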
S5: The categories of the pedestrian's face, expression, and behavior feature vectors are input into a softmax classifier to classify the potential risks.
Specifically, the potential-risk classification uses softmax. The calculation formula (1) of the softmax classifier is:

L_1 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{W_{y_i}^{T}x_i+b_{y_i}}}{\sum_{j}e^{W_{j}^{T}x_i+b_j}}   (1)

wherein N represents the number of training samples, W the weight vector of each category, b the bias term, x the feature vector, and y the category. The output of the previous layer is taken as the input to the softmax classifier, which yields the probability value of each class; the class with the highest probability is the one the detected person currently belongs to.
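A minimal numpy sketch of the classification step of formula (1): class scores W x + b are normalised into probabilities and the argmax is taken. The class count, feature dimension, and random weights are placeholders for the trained classifier.

```python
import numpy as np

def softmax_classify(x, W, b):
    logits = W @ x + b
    logits -= logits.max()                       # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(np.argmax(probs)), probs          # most probable risk class

W = np.random.randn(3, 128)                      # placeholder: 3 risk classes, 128-d input
b = np.zeros(3)
risk_class, probs = softmax_classify(np.random.randn(128), W, b)
```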
For the human face and the facial expression, a classifier improved on the basis of the softmax classifier is adopted:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\cos(\theta_{y_i}+m)}}{e^{\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{\cos\theta_j}}   (2)

wherein L represents the score value of each category, N the number of training samples, θ the angle between W and x, and m the additive angular margin.
Its main working principle is to map the features into an angular feature space and enlarge the inter-class margin while setting the bias term b = 0, which achieves a higher recognition accuracy.
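A minimal sketch of the additive-angular-margin idea: with b = 0 and unit-normalised W and x, each logit is cos(θ), and the margin m is added to the target class's angle before the softmax. The margin and scale values are the usual ArcFace defaults, not values given in the patent.

```python
import numpy as np

def arcface_logits(x, W, target: int, m: float = 0.5, s: float = 64.0):
    x = x / np.linalg.norm(x)                          # unit-normalise the feature
    W = W / np.linalg.norm(W, axis=1, keepdims=True)   # unit-normalise class weights
    cos = W @ x                                        # cos(theta_j) for every class
    theta = np.arccos(np.clip(cos[target], -1.0, 1.0))
    cos[target] = np.cos(theta + m)                    # additive angular margin
    return s * cos                                     # scaled logits for the softmax
```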
S6: The frame of the current surveillance video is divided into several subspaces; the pedestrians in the frame are detected, tracked, and recognized; whether crowding behavior exists is judged from the spatial position relationship between pedestrians and subspaces and, if so, a crowding early warning is issued; and whether other potentially risky behavior exists is judged comprehensively from the behavior, facial expressions, and spatial information of the pedestrians in the frame and, if so, the corresponding potential-risk warning is issued and the type of potential risk is marked.
Specifically, the frame of the current surveillance video is divided into several subspaces using OpenCV. Based on the detection and tracking results of the YOLOv5-DeepSort model, the number of detected persons in each subspace is counted (if a pedestrian appears in several adjacent subspaces at once, the subspace the pedestrian belongs to is determined by calculating the area of the pedestrian's detection box falling into each subspace, and the count is made accordingly). Whether crowding behavior exists is judged from the detected persons' current position and spatial information, and whether potentially risky behavior exists is judged comprehensively from their behavior, facial expressions, and spatial information; if it exists, the warning corresponding to the potentially risky behavior is issued and the type of potential risk is marked.
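A minimal sketch of the subspace logic: the frame is cut into an equal grid, each person box is assigned to the cell it overlaps most (the area rule above), and over-threshold cells raise a crowding alert. The grid size and the threshold of 5 people are illustrative.

```python
import numpy as np

def count_per_cell(boxes, frame_w, frame_h, rows=3, cols=3):
    """Assign each [x1, y1, x2, y2] box to the grid cell it overlaps most."""
    counts = np.zeros((rows, cols), dtype=int)
    cw, ch = frame_w / cols, frame_h / rows
    for x1, y1, x2, y2 in boxes:
        best, best_area = None, 0.0
        for r in range(rows):
            for c in range(cols):
                ox = max(0.0, min(x2, (c + 1) * cw) - max(x1, c * cw))
                oy = max(0.0, min(y2, (r + 1) * ch) - max(y1, r * ch))
                if ox * oy > best_area:
                    best, best_area = (r, c), ox * oy
        if best is not None:
            counts[best] += 1
    return counts

boxes = detect_people(frames[0])                               # boxes from the sketch above
crowded = np.argwhere(count_per_cell(boxes, 1920, 1080) > 5)   # cells over the threshold
```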
After comprehensive analysis of this information, the analysis result is classified as normal or abnormal, and anomalies are divided into low-level anomalies and high-level anomalies.
Crowding anomalies are judged by dividing the surveillance frame into several equal-sized subspaces and setting the maximum number of people each subspace region in the prison is allowed to hold as the spatial threshold. Pedestrians are detected and tracked by the detection-and-tracking module of the YOLOv5-DeepSort-SlowFast model, the number of people in each subspace is counted and compared with the set spatial threshold, and an anomaly warning is issued if the threshold is exceeded.
During display, low-level anomalies such as individual anomalies are marked in the frame with a yellow warning, while group anomalies and mutual anomalies are high-level anomalies and are marked in the frame with a striking red alarm.
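A minimal OpenCV sketch of the colour-coded display: yellow boxes for low-level anomalies and red for high-level ones (colours in BGR); the boxes and labels are illustrative.

```python
import cv2

YELLOW, RED = (0, 255, 255), (0, 0, 255)   # BGR colour codes

def draw_alert(frame, box, level: str, label: str):
    """Draw one warning box: yellow for low-level, red for high-level anomalies."""
    color = RED if level == "high" else YELLOW
    x1, y1, x2, y2 = map(int, box)
    cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
    cv2.putText(frame, label, (x1, max(12, y1 - 6)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    return frame
```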
The disclosed method effectively improves the accuracy of anticipating crowding among prisoners and other abnormal risks under surveillance, and issues early warnings in advance.
Based on the same inventive concept as the method, an embodiment of the invention provides a computer-vision-based intelligent spatial-behavior analysis system, comprising:
a recognition-model training module, used to train, on a data set, a YOLOv5-ArcFace model for face recognition and a YOLOv5-DeepSort-SlowFast model for human behavior detection and tracking;
a database module, used to establish, for pedestrians, a database mapping facial and expression feature vectors to the categories they belong to and a database mapping human behavior feature vectors to the categories they belong to;
a detection module, used to recognize, from frames of the current surveillance video containing pedestrians, the feature vectors of the pedestrians' faces, expressions, and body movements together with their categories, using the trained YOLOv5-ArcFace and YOLOv5-DeepSort-SlowFast models, and to automatically assign each pedestrian an ID through the DeepSort tracking module of the YOLOv5-DeepSort-SlowFast model for real-time tracking;
an identification module, used to compare the recognized face, expression, and behavior feature vectors of each pedestrian with the feature vectors of the same type recorded in the database; if the Euclidean distance between them is smaller than the set threshold, the recognition result is correct;
a classifier module, used to input the categories of the pedestrian's face, expression, and behavior feature vectors into a softmax classifier to classify the potential risks;
a risk early-warning module, which divides the frame of the current surveillance video into several subspaces, detects, tracks, and recognizes the pedestrians in the frame, judges from the spatial position relationship between pedestrians and subspaces whether crowding behavior exists and, if so, issues a crowding early warning, and comprehensively judges from the behavior, facial expressions, and spatial information of the pedestrians in the frame whether other potentially risky behavior exists and, if so, issues the corresponding potential-risk warning and marks the type of potential risk.
Preferably, the system further comprises:
an information confirmation module, used to confirm the information of logged-in personnel and judge whether the user has permission to use the system;
and a storage module, used to store the current surveillance video content and automatically save the video locally at the preset time and interval.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the technical principle of the invention, and such improvements and modifications shall also fall within the scope of protection of the invention.

Claims (10)

1. A computer vision-based intelligent analysis method for spatial behaviors, characterized by comprising the following steps:
S1: training a YOLOv5-ArcFace model for face recognition and a YOLOv5-DeepSort-SlowFast model for human behavior detection and tracking based on a data set;
S2: establishing, for pedestrians, a database of the correspondence between facial and expression feature vectors and the categories they belong to, and a database of the correspondence between human behavior feature vectors and the categories they belong to;
S3: for a current surveillance video containing pedestrians, recognizing from the video frames the feature vectors of the pedestrians' faces, expressions, and body movements together with their categories, using the trained YOLOv5-ArcFace model and the trained YOLOv5-DeepSort-SlowFast model, and automatically assigning IDs to the pedestrians through the DeepSort tracking module in the YOLOv5-DeepSort-SlowFast model for real-time tracking;
S4: comparing the recognized face, expression, and behavior feature vectors of the pedestrian with the feature vectors of the same type recorded in the database; if the Euclidean distance between the two vectors is smaller than a set threshold, the recognition result is correct and the next step is entered;
S5: inputting the categories of the pedestrian's face, expression, and behavior feature vectors into a softmax classifier, and classifying the potential risks;
S6: dividing the frame of the current surveillance video into several subspaces, judging whether crowding behavior exists according to the spatial position relation between the pedestrians and the subspaces, and if so, issuing a crowding early warning; and comprehensively judging, according to the behavior, facial expressions, and spatial information of the pedestrians in the frame, whether other potentially risky behavior exists, and if so, issuing the corresponding potential-risk warning and marking the type of potential risk.
2. The computer vision-based intelligent analysis method for spatial behaviors according to claim 1, wherein the step S1 is preceded by:
acquiring a historical monitoring video of a pedestrian, wherein the historical monitoring video comprises the face, the expression and the human behavior of the pedestrian;
and making a training set based on the video frames of the historical monitoring videos.
3. The computer vision-based intelligent analysis method for spatial behaviors according to claim 1, wherein the step S2 comprises:
S2-1: acquiring each pedestrian's past surveillance video from the historical surveillance video, the past video containing the pedestrian's face, expressions, and body movements;
S2-2: extracting the pedestrian's face and expression feature vectors from the past surveillance video through the YOLOv5-ArcFace model, labeling the categories they belong to, and storing them to obtain the database;
S2-3: detecting and tracking pedestrians in the frames of the past surveillance video through the YOLOv5-DeepSort module in the YOLOv5-DeepSort-SlowFast model, transmitting the obtained results to the SlowFast behavior-recognition module in the YOLOv5-DeepSort-SlowFast model for recognizing human behavior, acquiring the behavior feature vectors, labeling the action categories, and storing them to obtain the database.
4. The computer vision-based intelligent analysis method for spatial behaviors according to claim 1, wherein the step S3 comprises:
S3-1: acquiring the pixel matrix of the current video frame, detecting the human-body and face regions by adopting the YOLOv5 network, and acquiring the position coordinates of the body and face regions;
S3-2: carrying out face recognition on the detected face regions by using the trained YOLOv5-ArcFace model to obtain the face feature vectors and their categories and the expression feature vectors and their categories;
S3-3: tracking the pedestrians detected by the YOLOv5 network through the trained YOLOv5-DeepSort-SlowFast model, automatically assigning IDs to the tracked pedestrians, and carrying out behavior recognition on the frame sequence formed by the body regions detected and tracked in each frame to obtain the behavior feature vectors and their categories.
5. The computer vision-based intelligent analysis method for spatial behaviors according to claim 1, wherein the step S4 comprises:
comparing the pedestrian's face feature vector with the face feature vectors recorded in the corresponding database; if the Euclidean distance between them is smaller than a set face threshold, judging that identity matching is successful; otherwise, storing the pedestrian's face feature vector in the corresponding database;
comparing the pedestrian's expression feature vector with the expression feature vectors recorded in the corresponding database; if the Euclidean distance between them is smaller than a set expression threshold, judging that the recognition result is correct;
and comparing the pedestrian's behavior feature vector with the behavior feature vectors recorded in the corresponding database; if the Euclidean distance between them is smaller than a set action threshold, judging that the recognition result is correct.
6. The computer vision-based intelligent analysis method for spatial behaviors according to claim 1, wherein the step S5 comprises:
the computational expression of the softmax classifier is:

L_1 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{W_{y_i}^{T}x_i+b_{y_i}}}{\sum_{j}e^{W_{j}^{T}x_i+b_j}}

wherein L_1 represents the score value of each category, N represents the number of training samples, W represents the weight vector of each category, b represents a bias term, x represents a feature vector, and y represents a category;
for the classification of the pedestrian's face and expression feature vectors, a classifier improved on the basis of the softmax classifier is adopted, with the expression:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\cos(\theta_{y_i}+m)}}{e^{\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{\cos\theta_j}}

wherein L represents the score value of each category, N represents the number of training samples, θ represents the angle between W and x, and m represents the additive angular margin.
7. The computer vision-based intelligent analysis method for spatial behaviors according to claim 1, wherein the step S6 comprises:
dividing the frame of the current surveillance video into several subspaces of equal size, setting the maximum number of people each subspace region is allowed to hold as a spatial threshold, detecting and tracking pedestrians through the detection-and-tracking module in the YOLOv5-DeepSort-SlowFast model, counting the number of people in each subspace, comparing it with the spatial threshold, and issuing an anomaly warning if the number exceeds the spatial threshold;
combining the potential-risk classification result with whether the number of people gathered in the same subspace region exceeds the subspace's allowed maximum to make a comprehensive judgment, classifying the judgment result into a normal category and an abnormal category, and dividing the abnormal category into low-level anomalies and high-level anomalies;
for the display of warnings, the low-level anomalies are marked and displayed in the frame with a yellow warning, and the high-level anomalies are marked and displayed in the frame with a red warning.
8. An intelligent analysis system for spatial behavior based on computer vision, comprising:
a recognition-model training module, used to train a YOLOv5-ArcFace model for face recognition and a YOLOv5-DeepSort-SlowFast model for human behavior detection and tracking based on a data set;
a database module, used to establish, for pedestrians, a database of the correspondence between facial and expression feature vectors and the categories they belong to and a database of the correspondence between human behavior feature vectors and the categories they belong to;
a detection module, used to recognize, from frames of the current surveillance video containing pedestrians, the feature vectors of the pedestrians' faces, expressions, and body movements together with their categories, using the trained YOLOv5-ArcFace model and the trained YOLOv5-DeepSort-SlowFast model, to automatically assign an ID to each pedestrian through the DeepSort tracking module of the YOLOv5-DeepSort-SlowFast model, and to track the pedestrians in real time;
an identification module, used to compare the recognized face, expression, and behavior feature vectors of the pedestrian with the feature vectors of the same type recorded in the database; if the Euclidean distance between them is smaller than a set threshold, the recognition result is correct;
a classifier module, used to input the categories of the pedestrian's face, expression, and behavior feature vectors into a softmax classifier and classify the potential risks;
a risk early-warning module, used to divide the frame of the current surveillance video into several subspaces, detect, track, and recognize the pedestrians in the frame, judge whether crowding behavior exists according to the spatial position relationship between the pedestrians and the subspaces and, if so, issue a crowding early warning; and comprehensively judge, according to the behavior, facial expressions, and spatial information of the pedestrians in the frame, whether other potentially risky behavior exists and, if so, issue the corresponding potential-risk warning and mark the type of potential risk.
9. The computer vision-based intelligent spatial behavior analysis system according to claim 8, further comprising:
an information confirmation module, used to confirm the information of logged-in personnel and judge whether the user has permission to use the system.
10. The computer vision-based intelligent spatial behavior analysis system according to claim 8, further comprising:
a storage module, used to store the current surveillance video content and automatically save the video locally at the preset time and interval.
CN202211120406.4A - filed 2022-09-15 - Spatial behavior intelligent analysis method and system based on computer vision - Pending - published as CN115512293A

Priority Applications (1)

Application Number: CN202211120406.4A · Priority/Filing Date: 2022-09-15 · Title: Spatial behavior intelligent analysis method and system based on computer vision

Publications (1)

Publication Number: CN115512293A · Publication Date: 2022-12-23

Family

ID=84504215

Family Applications (1)

Application Number: CN202211120406.4A · Status: Pending · Title: Spatial behavior intelligent analysis method and system based on computer vision

Country Status (1)

Country: CN · Publication: CN115512293A (en)

Legal Events

Code - Event
PB01 - Publication
SE01 - Entry into force of request for substantive examination