CN111582095B - Light-weight rapid detection method for abnormal behaviors of pedestrians - Google Patents

Light-weight rapid detection method for abnormal behaviors of pedestrians Download PDF

Info

Publication number
CN111582095B
CN111582095B (application CN202010346229.6A)
Authority
CN
China
Prior art keywords
convolution
network
module
depth
skeleton information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010346229.6A
Other languages
Chinese (zh)
Other versions
CN111582095A (en
Inventor
吴晓军
袁佳兴
原盛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202010346229.6A priority Critical patent/CN111582095B/en
Publication of CN111582095A publication Critical patent/CN111582095A/en
Application granted granted Critical
Publication of CN111582095B publication Critical patent/CN111582095B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING › G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data › G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING › G06F18/00 Pattern recognition › G06F18/20 Analysing › G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation › G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING › G06F18/00 Pattern recognition › G06F18/20 Analysing › G06F18/24 Classification techniques › G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches › G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology › G06N3/045 Combinations of networks
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology › G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/08 Learning methods
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING › G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data › G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands


Abstract

The invention provides a lightweight method for the rapid detection of pedestrian abnormal behavior, comprising the following steps: step 1, perform pedestrian detection on an image and frame each pedestrian with a detection box to obtain a pedestrian detection frame; step 2, extract human skeleton information from the pedestrian detection frame obtained in step 1 to obtain a human skeleton information picture; step 3, perform background-removal preprocessing on the picture obtained in step 2; step 4, rapidly detect pedestrian abnormal behavior on the preprocessed human skeleton information picture from step 3 using a lightweight multi-scale information fusion detection network based on depthwise separable convolution, obtaining a four-dimensional vector whose components correspond to the four classes of abnormal human behavior. By combining background-removed human skeleton information, multi-scale information and a lightweight network based on depthwise separable convolution, the invention enhances the robustness and real-time performance of the algorithm, markedly reduces the computation of the network model, lowers the algorithm's hardware requirements and reduces cost.

Description

Light-weight rapid detection method for abnormal behaviors of pedestrians
Technical Field
The invention belongs to the field of intelligent video surveillance and particularly relates to a lightweight method for the rapid detection of pedestrian abnormal behavior.
Background
Intelligent video surveillance is a computer vision application covering target detection, image classification, motion detection, deep learning and other technologies; what distinguishes it from a traditional surveillance system is its intelligence. At present, the judgment and monitoring of pedestrian abnormal behavior in video largely remains at the stage of manual identification. Unlike computers and monitoring equipment, which run continuously, a human monitor cannot accurately process a large volume of surveillance video in real time, and quickly extracting useful information from it is essentially an impossible task. A pedestrian abnormal-behavior detection algorithm can ignore the large amount of video data that is useless for security, overcomes the missed detections and difficulties of investigation and evidence collection that easily arise from manual monitoring, saves manpower, material and financial resources, yields economic benefits, and provides a stable guarantee for people's lives. With the progress of science and technology, how to apply modern technology to improve the safety of public areas is a valuable topic. When abnormal behaviors such as fighting, running or crowd gathering occur in public areas, detecting and identifying these behaviors in real time, so that they can be discovered promptly and stopped by raising an alarm, greatly reduces the possibility of injury and is a very effective safety measure.
At present, most human behavior recognition algorithms adopt the two-stream method or combine it with an LSTM. The motion vectors of all pixels in each frame of an image sequence are used to detect object motion: once an object moves, the optical flow of the corresponding pixels changes, enabling detection and recognition of the moving object's behavior. Such an algorithm processes spatial information from a single input frame and temporal information from a multi-frame dense optical-flow field, combines the two behavior-classification streams through multi-task training, and reduces overfitting to improve recognition accuracy. Because optical flow is computed for every pixel in the image to separate foreground from background, and the optical-flow field also involves fusing multi-frame information, the computation is clearly too large and too slow, and timeliness cannot be guaranteed. In real life, it is impractical to equip every surveillance camera with an expensive high-performance graphics card for pedestrian abnormal-behavior detection. Moreover, the algorithm places extremely high demands on the scene and is strongly affected by changes in brightness and scenery.
The 3D convolutional neural network extends the traditional 2D convolutional neural network to video behavior recognition: compared with an ordinary 2D network, its convolution kernel adds a temporal dimension to the convolution computation. Traditional 2D convolution takes a single RGB frame as input and outputs a two-dimensional feature map. 3D convolution takes consecutive RGB frames stacked into a cube; the 3D kernel extracts spatial and temporal information simultaneously, and the output is a cube of two-dimensional feature maps, each produced by convolving the input frames. The number of parameters per 3D convolution is large, as is the amount of computation.
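As a rough illustration of why 3D kernels are costly, the weight counts of a 2D and a 3D convolution layer can be compared directly. The kernel and channel sizes below are hypothetical, not figures from the patent:

```python
# Illustrative parameter counts for 2D vs. 3D convolution layers.
def conv2d_params(k, c_in, c_out):
    # one k x k spatial kernel per (input, output) channel pair, plus a bias per filter
    return k * k * c_in * c_out + c_out

def conv3d_params(k, t, c_in, c_out):
    # a 3D kernel adds a temporal extent t to every filter
    return k * k * t * c_in * c_out + c_out

p2 = conv2d_params(3, 64, 64)      # 36,928 parameters
p3 = conv3d_params(3, 3, 64, 64)   # 110,656 parameters, roughly 3x the 2D layer
```

With a temporal extent of 3, the 3D layer carries about three times the weights of the 2D layer, before even counting the extra multiply-accumulates over the frame stack.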
In practical applications, the pedestrian-behavior detection scene is extremely complex and may be affected by much noise; in addition, cost constraints make the timeliness requirements of practical applications hard to meet, which greatly limits the deployment of such algorithms. There are two ways to detect pedestrian abnormal behavior in public areas: first, connect the camera and the processing hardware, such as a GPU, into a single integrated detection device; second, upload the camera's video stream over a network to the cloud, where a server performs recognition and detection. Either way, timeliness is a crucial indicator: abnormal-behavior detection loses its meaning if it cannot detect in time. With today's mainstream detection algorithms, both implementations consume large computing and storage resources, and the second may place even higher demands on network transmission. Higher hardware and network requirements raise equipment cost, yet even then real-time operation is hard to achieve, seriously affecting the model's practical deployment.
Disclosure of Invention
The invention aims to provide a lightweight method for the rapid detection of pedestrian abnormal behavior that overcomes the poor timeliness and high cost of the prior art.
In order to achieve the purpose, the invention adopts the technical scheme that:
the invention provides a method for quickly detecting abnormal behaviors of lightweight pedestrians, which comprises the following steps:
step 1, carrying out pedestrian detection on an image, and framing by using a detection frame to obtain a pedestrian detection frame;
step 2, extracting human skeleton information from the pedestrian detection frame obtained in the step 1 to obtain a human skeleton information picture;
step 3, carrying out background removal pretreatment on the human body skeleton information picture obtained in the step 2;
and 4, detecting pedestrian abnormal behavior on the preprocessed human skeleton information picture from step 3 using a lightweight multi-scale information fusion detection network based on depthwise separable convolution, to obtain a four-dimensional vector whose components correspond to the four classes of abnormal human behavior.
Preferably, in step 1, pedestrian detection is performed on the image using the YOLOv3 object detection algorithm to obtain a pedestrian detection frame.
Preferably, in step 2, the RMPE framework is used to extract human skeleton information from the pedestrian detection frame obtained in step 1, so as to obtain a human skeleton information picture.
Preferably, in step 4, the lightweight multi-scale information fusion detection network based on depth separable convolution comprises a trunk residual error network module and two branch network modules, wherein the trunk residual error network module comprises an input layer, and an input end of the input layer is used for receiving the preprocessed human skeleton information picture; the output end of the input layer is sequentially connected with a first convolution module and a second convolution module, and the output end of the second convolution module is respectively connected with a branch network module and a third convolution module; the output end of the third convolution module is respectively connected with the other branch network module and the fourth convolution module; the fourth convolution module is connected with the fifth convolution module, the fifth convolution module is combined with the output ends of the two branch network modules, and multi-scale information is fused and transmitted to the full connection layer; the output layer is a softmax classifier.
Preferably, the first convolution module includes one convolution layer and one pooling layer; the second convolution module comprises three depth separable convolution sub-residual network units, and each depth separable convolution sub-residual network unit comprises four depth separable convolution layers; the third convolution module comprises four depth separable convolution sub-residual network elements, and each depth separable convolution sub-residual network element comprises four depth separable convolution layers; the fourth convolution module comprises six depth separable convolution sub-residual network elements, and each depth separable convolution sub-residual network element comprises four depth separable convolution layers; the fifth convolution module includes three depth-separable convolutional sub-residual network elements and one pooling layer, each depth-separable convolutional sub-residual network element including four depth-separable convolutional layers.
Preferably, one branch network module connected to the output end of the second convolution module includes three convolution layers and one pooling layer; the other branch network module connected with the output end of the third convolution module comprises three convolution layers and a pooling layer.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a method for rapidly detecting abnormal behaviors of lightweight pedestrians, which provides a detection mode different from the existing behavior recognition algorithm by innovating the thought of the detection mode and designing the lightweight detection network, and specifically comprises the following steps:
the method has the advantages that the method is used for identifying the abnormal behaviors of the pedestrian based on the human body skeleton information and the background removal mode, so that the use of time sequence information is avoided, the calculated amount is effectively reduced, and the real-time performance of the algorithm is improved;
secondly, the invention enhances the robustness of the algorithm by removing the interference of background information, so that the algorithm is not restricted by the application scene;
thirdly, the invention designs a detection network fusing multi-scale information, effectively utilizes the multi-scale information and improves the performance of the algorithm;
and fourthly, the invention replaces the traditional convolution kernel with the deep separable convolution kernel, redesigns the lightweight network, and solves the problem that the detection timeliness is too poor due to the fact that the deployed equipment has poor computing and storing capabilities in the actual application deployment process and is influenced by cost and environmental factors.
Experimental results show that the parameter count and computation of the network model are markedly reduced: the parameters are only 1/8, and the computation only 1/6, of the baseline algorithm's model, lowering the algorithm's hardware requirements.
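The reported reduction is consistent with the standard cost accounting for depthwise separable convolution, sketched below for one illustrative layer. The layer sizes are assumptions for illustration, not the patent's measured figures:

```python
# Multiply-accumulate cost of a standard vs. a depthwise-separable convolution layer.
def standard_conv_cost(k, c_in, c_out, h, w):
    # every output pixel mixes all input channels through a k x k kernel
    return k * k * c_in * c_out * h * w

def separable_conv_cost(k, c_in, c_out, h, w):
    depthwise = k * k * c_in * h * w   # one k x k filter per input channel
    pointwise = c_in * c_out * h * w   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise

std = standard_conv_cost(3, 64, 64, 56, 56)
sep = separable_conv_cost(3, 64, 64, 56, 56)
ratio = sep / std   # equals 1/c_out + 1/k^2, about 0.127 for this layer
```

The ratio 1/c_out + 1/k² shows why the savings grow with the channel count: for 3 × 3 kernels the separable layer costs a bit more than 1/9 of the standard one, in line with the order-of-magnitude reduction the patent reports.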
Drawings
FIG. 1 is a flow chart of lightweight pedestrian abnormal-behavior recognition;
FIG. 2 is a diagram of the lightweight multi-scale information fusion detection network architecture based on depthwise separable convolution;
FIG. 3 is a flow chart of the knowledge distillation algorithm;
FIG. 4 is a flow chart of human skeleton information extraction;
FIG. 5 is a running example based on human skeleton information;
FIG. 6 is a falling example based on human skeleton information;
FIG. 7 is a fighting example based on human skeleton information;
FIG. 8 is a walking example based on human skeleton information.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The invention provides a lightweight method for the rapid detection of pedestrian abnormal behavior that requires little computation, meets the timeliness requirements of actual deployment, resists interference from scene factors, and is highly robust. The recognized behaviors comprise four classes: running, fighting, falling and walking.
Specifically, as shown in fig. 1, the algorithm for rapidly detecting the abnormal behavior of the lightweight pedestrian provided by the invention comprises the following steps:
Step 1, first, the YOLOv3 object detection algorithm is used to detect the pedestrians in an image, and each pedestrian is framed with a detection box to obtain a pedestrian detection frame.
Step 2, the RMPE framework is used to extract human skeleton information from the pedestrian detection frame of step 1, obtaining a human skeleton information picture.
Step 3, background-removal preprocessing is performed on the picture obtained in step 2. Specifically, the original input image is multiplied by 0, setting it to black, and only the extracted human skeleton information is kept, thereby removing the background.
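A minimal sketch of this masking step with NumPy, assuming a hypothetical boolean `skeleton_mask` marking the skeleton pixels produced by the pose extractor:

```python
import numpy as np

def remove_background(frame, skeleton_mask):
    # frame * 0 -> all-black image; then redraw only the skeleton pixels
    out = np.zeros_like(frame)
    out[skeleton_mask] = frame[skeleton_mask]
    return out

# Tiny synthetic example: a 4x4 RGB frame with one skeleton pixel at (1, 1).
frame = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1, 1] = True
result = remove_background(frame, mask)
```

In practice the mask would come from rasterizing the extracted skeleton joints and limbs; everything outside it is zeroed, which is exactly the "multiply by 0, keep the skeleton" operation described above.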
The background-removed human skeleton information pictures are saved to build a pedestrian abnormal-behavior data set based on human skeleton information, covering the four behaviors of running, fighting, falling and walking. The data set is divided into a training set, used to train the subsequent recognition network, and a test set, used to verify the recognition network's accuracy.
Step 4, the lightweight multi-scale information fusion detection network (DSN) based on depthwise separable convolution designed by the invention is used to rapidly detect pedestrian abnormal behavior. The network consists of 3 parts: a backbone residual network module based on depthwise separable convolution and two depthwise-separable branch network modules responsible for multi-scale information; finally, the outputs of the three modules are concatenated for the final prediction. The network structure is shown in fig. 2:
1) The network input is an RGB image from the pedestrian abnormal-behavior data set. Before training, the images in the data set are preprocessed: each is uniformly resized to 224 × 224 × 3 pixels and randomly flipped for data augmentation.
2) The first convolution module conv1 applies 64 convolution kernels of size 7 × 7 with stride 2, followed by a pooling operation with stride 2; the output feature-map dimension is (56, 56, 64).
3) The second convolution module conv2 comprises 3 depthwise-separable-convolution sub-residual network units, each containing 4 convolution layers: two pairs of a 3 × 3 depthwise convolution and a 1 × 1 pointwise convolution, each with 64 kernels. The module has 12 convolution layers in total, and the output feature-map dimension is (56, 56, 64).
4) The left branch network conv_left consists of three convolution layers and one pooling layer. The convolution kernels are a 1 × 1 ordinary convolution, a 3 × 3 depthwise convolution and a 1 × 1 pointwise convolution, numbering 32, 32 and 16 respectively. The first and third 1 × 1 kernels change the numbers of input and output channels: the first 1 × 1 ordinary convolution reduces the dimensionality of the input to cut the computation of the depthwise convolution, while the third 1 × 1 pointwise convolution integrates the depthwise convolution's output features into a 16-channel feature map. Because small-scale information occupies a small proportion of the image, its share in the final classification decision is correspondingly reduced. The pooling layer uses average pooling with a 14 × 14 filter and stride 14, and the branch's output feature-map dimension is (4, 4, 16).
5) The third convolution module conv3 contains 4 depthwise-separable-convolution sub-residual network units, each containing 4 convolution layers: two pairs of a 3 × 3 depthwise convolution and a 1 × 1 pointwise convolution, with 64 depthwise kernels at first and 128 thereafter. The module has 16 convolution layers in total, and the output feature-map dimension is (28, 28, 128).
6) The right branch network conv_right consists of three convolution layers and one pooling layer: a 1 × 1 ordinary convolution, a 3 × 3 depthwise convolution and a 1 × 1 pointwise convolution, numbering 64, 64 and 32 respectively. The pooling layer uses average pooling with a 14 × 14 filter and stride 14, and the branch's output feature-map dimension is (2, 2, 32).
7) The fourth convolution module conv4 contains 6 depthwise-separable-convolution sub-residual network units, each containing 4 convolution layers: two pairs of a 3 × 3 depthwise convolution and a 1 × 1 pointwise convolution, with 128 depthwise kernels at first and 256 thereafter. The module has 24 convolution layers in total, and the output feature-map dimension is (14, 14, 256).
8) The fifth convolution module conv5 contains 3 depthwise-separable-convolution sub-residual network units, each containing 4 convolution layers: two pairs of a 3 × 3 depthwise convolution and a 1 × 1 pointwise convolution, with 256 depthwise kernels at first and 512 thereafter. The module has 12 convolution layers in total, and the output feature-map dimension is (7, 7, 512). Global average pooling then reduces the parameter count and computation of the fully connected layer, giving an output feature-map dimension of (1, 1, 512).
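The spatial sizes quoted for conv2 through conv5 — 56, 28, 14 and 7 — follow from repeatedly halving the 224 × 224 input at the stride-2 stages, as this small arithmetic check shows:

```python
# Arithmetic check of the feature-map sizes quoted above: each stride-2
# stage halves the spatial resolution of the 224x224 input.
def downsample(size, halvings):
    for _ in range(halvings):
        size //= 2
    return size

# conv2 output: 224 -> 112 (conv1, stride 2) -> 56 (pooling, stride 2);
# each later module halves once more: conv3 -> 28, conv4 -> 14, conv5 -> 7.
sizes = [downsample(224, n) for n in range(2, 6)]
print(sizes)  # [56, 28, 14, 7]
```

This also explains the 14 × 14, stride-14 average pooling in the branches: it collapses the 56 × 56 and 28 × 28 branch inputs to 4 × 4 and 2 × 2 respectively.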
9) The outputs of the backbone network and the two branches are then combined, and the fused multi-scale information is fed into a fully connected layer.
10) The output layer uses a softmax classifier whose output is a four-dimensional vector, each component corresponding to one of the four action classes in abnormal-behavior detection. The formula is:
$$p^{(i)} = \frac{e^{z_i}}{\sum_{j=1}^{4} e^{z_j}}$$
where p^{(i)} is the probability of the i-th action class (a scalar) and z is the 4-dimensional vector input to softmax. The loss function used is the cross-entropy loss, whose expression is:
$$L = -\sum_{i=1}^{4} y_i \log p^{(i)}$$
where y_i equals 0 or 1: it is 1 if the i-th action class is the correct one and 0 otherwise. The more accurate the prediction, the smaller the value of the loss function. The ReLU function is the activation function used in the network.
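The two formulas above can be written out as a small pure-Python sketch; the logit values are hypothetical:

```python
import math

def softmax(z):
    # p^(i) = exp(z_i) / sum_j exp(z_j)
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, p):
    # L = -sum_i y_i * log(p^(i)), with y a one-hot label over the 4 classes
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

z = [2.0, 0.5, 0.1, -1.0]        # hypothetical logits for the four action classes
p = softmax(z)                    # probabilities summing to 1
loss = cross_entropy([1, 0, 0, 0], p)
```

A correct, confident prediction drives p^{(i)} toward 1 for the true class and the loss toward 0, which is exactly the behavior described in the text.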
Because the number of network layers and parameters is reduced, the network's ability to extract features and fit data distributions is weakened, and its accuracy is inferior to that of conventional, deeper convolutional networks. Therefore, a knowledge distillation optimization algorithm is used to improve the performance of the lightweight network; the flow of the knowledge distillation algorithm is shown in fig. 3:
the algorithm process is as follows:
1) First, a large complex teacher network is trained on the pedestrian abnormal-behavior data set based on human skeleton information, and the trained network model is saved.
2) The input image is fed simultaneously into the teacher network and the student network. The teacher network generates the soft target: its logits are divided by a parameter T and then passed through a softmax classifier to output the soft target.
$$p_i = \frac{\exp(t_i / T)}{\sum_j \exp(t_j / T)}$$
where p_i is the soft target output by the teacher network, t_i is the teacher network's value before the softmax operation, and T is the set temperature parameter.
3) The student network is trained with the teacher network's soft target. The student network's output is processed in two ways to obtain two losses. One is the loss L_soft, obtained using the soft target as the target value; the student network's output is likewise divided by T before its softmax classification. The expression is:
$$q_i = \frac{\exp(s_i / T)}{\sum_j \exp(s_j / T)}$$
where s_i is the student network's value before the softmax operation and T is the set temperature parameter.
L_soft is calculated as:
$$L_{\text{soft}} = -\sum_i p_i \log q_i$$
where q_i is the output of the student network and p_i is the teacher network's soft target. The other loss, denoted L_hard, is obtained using the hard target as the target value; this term uses the ordinary softmax classification loss, whose expression is:
$$L_{\text{hard}} = -\sum_i y_i \log q_i$$
where y_i is the true label value of the data and q_i is the output of the student network.
4) The two loss terms are weighted and summed to obtain the total loss, denoted L_total, whose expression is:
$$L_{\text{total}} = \alpha L_{\text{soft}} + (1 - \alpha) L_{\text{hard}}$$
T and α are hyper-parameters. T ranges from 1 to 20; a larger T is set early in training and then gradually reduced, so that the student network's output information is fully used and the ability to extract feature information is better learned. α is initially set to 0.95, increasing the weight of the teacher network so that the student network learns from it preferentially; the weight is then gradually reduced to balance the real labels against the soft targets, so that the student network performs better on the specific task.
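Putting steps 2) through 4) together, a pure-Python sketch of the distillation loss might look like the following. The exact weighting convention and the logit values are assumptions for illustration, not taken verbatim from the patent:

```python
import math

def softmax_T(logits, T):
    # tempered softmax: divide logits by temperature T before normalizing
    exps = [math.exp(v / T) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, y_true, T=10.0, alpha=0.95):
    p = softmax_T(teacher_logits, T)          # teacher soft target
    q = softmax_T(student_logits, T)          # student soft prediction
    l_soft = -sum(pi * math.log(qi) for pi, qi in zip(p, q))
    q_hard = softmax_T(student_logits, 1.0)   # ordinary softmax for the hard loss
    l_hard = -sum(yi * math.log(qi) for yi, qi in zip(y_true, q_hard))
    return alpha * l_soft + (1 - alpha) * l_hard

# Hypothetical teacher/student logits and a one-hot true label.
loss = distill_loss([4.0, 1.0, 0.5, 0.2], [3.0, 1.5, 0.3, 0.1], [1, 0, 0, 0])
```

A high T flattens both distributions so the student can absorb the teacher's inter-class similarity information; lowering T and α over training, as described above, shifts weight back to the real labels.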
Step 5, output the detection result.
Examples
The method for rapidly detecting pedestrian abnormal behavior using human-skeleton-based information specifically comprises the following steps:
step 1: and carrying out human body frame calibration of the pedestrian by using a YOLOv3 target detection algorithm on the input image.
Step 2: extract human skeleton information from the obtained human body frame using the RMPE framework to obtain a human skeleton information picture.
Step 3: perform background-removal preprocessing on the picture obtained in step 2. The human skeleton information extraction flow is shown in fig. 4, and examples of running, falling, fighting and walking are shown in figs. 5, 6, 7 and 8.
Step 4: rapidly detect pedestrian abnormal behavior on the background-removed image containing human skeleton information, using the lightweight multi-scale information fusion detection network based on depthwise separable convolution designed by the invention.
Step 5: output the detection result.
Experiments show that the amount of computation is markedly reduced while real-time performance and robustness are improved, achieving rapid detection.

Claims (4)

1. A method for rapidly detecting abnormal behaviors of lightweight pedestrians is characterized by comprising the following steps:
step 1, carrying out pedestrian detection on an image, and framing by using a detection frame to obtain a pedestrian detection frame;
step 2, extracting human skeleton information from the pedestrian detection frame obtained in the step 1 to obtain a human skeleton information picture;
step 3, carrying out background removal pretreatment on the human body skeleton information picture obtained in the step 2;
step 4, carrying out pedestrian abnormal behavior detection on the preprocessed human body skeleton information picture in the step 3 by using a lightweight multi-scale information fusion detection network based on depth separable convolution to obtain a four-dimensional vector which respectively corresponds to four types of actions of human body abnormal behaviors;
in step 4, the lightweight multi-scale information fusion detection network based on depth separable convolution comprises a trunk residual network module and two branch network modules; the trunk residual network module comprises an input layer whose input end receives the preprocessed human skeleton information picture; the output end of the input layer is connected in sequence to a first convolution module and a second convolution module; the output end of the second convolution module is connected both to one branch network module and to a third convolution module; the output end of the third convolution module is connected both to the other branch network module and to a fourth convolution module; the fourth convolution module is connected to a fifth convolution module; the output of the fifth convolution module is combined with the outputs of the two branch network modules, and the fused multi-scale information is fed to a fully connected layer; the output layer is a softmax classifier;
the first convolution module comprises a convolution layer and a pooling layer; the second, third, and fourth convolution modules comprise three, four, and six depth separable convolution sub-residual network units, respectively; the fifth convolution module comprises three depth separable convolution sub-residual network units and a pooling layer; each depth separable convolution sub-residual network unit comprises four depth separable convolution layers.
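The lightweight property claimed above comes from factoring each convolution into a depthwise stage (one spatial kernel per channel) and a pointwise 1×1 stage that mixes channels. A minimal NumPy sketch of one such layer, together with the parameter saving it buys; the channel counts and the 3×3 kernel size below are illustrative assumptions, not values fixed by the claim:

```python
import numpy as np

def depthwise_separable_conv(x, depthwise_k, pointwise_w):
    """Depthwise separable convolution: per-channel spatial filtering
    followed by a 1x1 pointwise mix across channels.
    x: (C, H, W); depthwise_k: (C, k, k); pointwise_w: (C_out, C)."""
    c, h, w = x.shape
    k = depthwise_k.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    # Depthwise stage: each input channel convolved with its own k x k kernel.
    dw = np.zeros_like(x, dtype=float)
    for ch in range(c):
        for i in range(h):
            for j in range(w):
                dw[ch, i, j] = np.sum(xp[ch, i:i + k, j:j + k] * depthwise_k[ch])
    # Pointwise stage: 1x1 convolution mixing channels into C_out outputs.
    return np.tensordot(pointwise_w, dw, axes=([1], [0]))

def param_counts(c_in, c_out, k):
    """Parameter count of a standard conv vs. its depthwise separable split."""
    standard = c_in * c_out * k * k
    separable = c_in * k * k + c_in * c_out
    return standard, separable

# e.g. 64 -> 128 channels with 3x3 kernels
std, sep = param_counts(64, 128, 3)
print(std, sep)  # → 73728 8768
```

For 64→128 channels with 3×3 kernels the separable form needs 8,768 parameters versus 73,728 for a standard convolution, roughly an 8× reduction, which is the kind of saving that makes the detection network above "lightweight".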
2. The method according to claim 1, wherein in step 1 the YOLOv3 object detection algorithm is used to perform pedestrian detection on the image to obtain the pedestrian detection frame.
3. The method according to claim 1, wherein in step 2 the RMPE framework is used to extract human skeleton information from the pedestrian detection frame obtained in step 1, so as to obtain the human skeleton information picture.
4. The lightweight method for rapidly detecting abnormal pedestrian behaviors according to claim 1, wherein the branch network module connected to the output end of the second convolution module comprises three convolution layers and a pooling layer, and the other branch network module connected to the output end of the third convolution module likewise comprises three convolution layers and a pooling layer.
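The two branch modules in claims 1 and 4 tap the trunk at different depths, so the features combined before the fully connected layer describe the skeleton picture at several spatial scales. A hedged NumPy sketch of that fusion step follows; the feature-map shapes and channel widths are made-up placeholders, and only the four-way softmax output follows the claim:

```python
import numpy as np

def global_avg_pool(feat):
    """Collapse a (C, H, W) feature map to a length-C vector."""
    return feat.mean(axis=(1, 2))

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def fuse_and_classify(trunk_feat, branch_feats, fc_w, fc_b):
    """Pool trunk and branch features, concatenate them (multi-scale
    fusion), then apply a fully connected layer and a softmax over the
    four abnormal-behavior classes."""
    pooled = [global_avg_pool(trunk_feat)]
    pooled += [global_avg_pool(f) for f in branch_feats]
    fused = np.concatenate(pooled)
    return softmax(fc_w @ fused + fc_b)  # four-dimensional output

rng = np.random.default_rng(0)
trunk = rng.standard_normal((8, 2, 2))       # deepest features, smallest map
branches = [rng.standard_normal((4, 8, 8)),  # shallow branch, larger map
            rng.standard_normal((6, 4, 4))]  # mid-depth branch
fc_w = rng.standard_normal((4, 8 + 4 + 6))   # 4 behavior classes
fc_b = np.zeros(4)
probs = fuse_and_classify(trunk, branches, fc_w, fc_b)
print(probs)  # four class probabilities summing to 1
```

Global pooling before concatenation lets maps of different spatial sizes be fused into one fixed-length vector, which is why the branch and trunk outputs can be combined despite coming from different network depths.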
CN202010346229.6A 2020-04-27 2020-04-27 Light-weight rapid detection method for abnormal behaviors of pedestrians Active CN111582095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010346229.6A CN111582095B (en) 2020-04-27 2020-04-27 Light-weight rapid detection method for abnormal behaviors of pedestrians

Publications (2)

Publication Number Publication Date
CN111582095A CN111582095A (en) 2020-08-25
CN111582095B (en) 2022-02-01

Family

ID=72111802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010346229.6A Active CN111582095B (en) 2020-04-27 2020-04-27 Light-weight rapid detection method for abnormal behaviors of pedestrians

Country Status (1)

Country Link
CN (1) CN111582095B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364804B (en) * 2020-11-20 2023-08-25 大连大学 Pedestrian detection method based on depth separable convolution and standard convolution
CN112998697B (en) * 2021-02-22 2022-06-14 电子科技大学 Tumble injury degree prediction method and system based on skeleton data and terminal
CN113011322B (en) * 2021-03-17 2023-09-05 贵州安防工程技术研究中心有限公司 Detection model training method and detection method for monitoring specific abnormal behavior of video
CN113486706B (en) * 2021-05-21 2022-11-15 天津大学 Online action recognition method based on human body posture estimation and historical information
CN113361370B (en) * 2021-06-02 2023-06-23 南京工业大学 Abnormal behavior detection method based on deep learning
CN116204830B (en) * 2023-04-28 2023-07-11 苏芯物联技术(南京)有限公司 Welding abnormality real-time detection method based on path aggregation network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635790A (en) * 2019-01-28 2019-04-16 Hangzhou Dianzi University Pedestrian abnormal behavior recognition method based on 3D convolution
CN109828251A (en) * 2019-03-07 2019-05-31 Naval Aviation University of the PLA Radar target recognition method based on a feature-pyramid lightweight convolutional neural network
CN109886209A (en) * 2019-02-25 2019-06-14 Chengdu Kuangshi Jinzhi Technology Co., Ltd. Abnormal behavior detection method and device, and vehicle-mounted equipment
CN110472500A (en) * 2019-07-09 2019-11-19 Beijing Institute of Technology Fast detection algorithm for water-surface visual targets based on a high-speed unmanned boat
CN110633624A (en) * 2019-07-26 2019-12-31 Beijing University of Technology Machine-vision human abnormal behavior recognition method based on multi-feature fusion
CN110660046A (en) * 2019-08-30 2020-01-07 Taiyuan University of Science and Technology Industrial product defect image classification method based on a lightweight deep neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11734545B2 (en) * 2017-11-14 2023-08-22 Google Llc Highly efficient convolutional neural networks


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Gaihua Wang, "An multi-scale learning network with depthwise separable convolutions," IPSJ Transactions on Computer Vision and Applications, Jul. 2018, pp. 1-8 *
Jeff Donahue, "Long-Term Recurrent Convolutional Networks for Visual Recognition and Description," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, Sep. 2016, pp. 677-691 *
Jiliang Yan, "Multi-Scale Depthwise Separable Convolutional Neural Network for Hyperspectral Image Classification," International Forum on Digital TV and Wireless Multimedia Communications, Feb. 2020, pp. 171-185 *
Francois Chollet, "Xception: Deep Learning with Depthwise Separable Convolutions," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nov. 2017, pp. 1800-1807 *
Ji Xunsheng, "Detection of Abnormal Behaviors on an Escalator Based on Deep Neural Networks," Laser & Optoelectronics Progress, vol. 57, no. 6, Mar. 2020, pp. 1-10 *


Similar Documents

Publication Publication Date Title
CN111582095B (en) Light-weight rapid detection method for abnormal behaviors of pedestrians
Chakma et al. Image-based air quality analysis using deep convolutional neural network
Wang et al. Research on face recognition based on deep learning
CN106919920B (en) Scene recognition method based on convolution characteristics and space vision bag-of-words model
Pan et al. Deepfake detection through deep learning
Rahmon et al. Motion U-Net: Multi-cue encoder-decoder network for motion segmentation
CN111582092B (en) Pedestrian abnormal behavior detection method based on human skeleton
CN109948709B (en) Multitask attribute identification system of target object
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
WO2021073311A1 (en) Image recognition method and apparatus, computer-readable storage medium and chip
Zhang et al. CNN cloud detection algorithm based on channel and spatial attention and probabilistic upsampling for remote sensing image
CN110008793A (en) Face identification method, device and equipment
CN110765960B (en) Pedestrian re-identification method for adaptive multi-task deep learning
Yang et al. Anomaly detection in moving crowds through spatiotemporal autoencoding and additional attention
CN116343330A (en) Abnormal behavior identification method for infrared-visible light image fusion
CN114821014A (en) Multi-mode and counterstudy-based multi-task target detection and identification method and device
CN116052212A (en) Semi-supervised cross-mode pedestrian re-recognition method based on dual self-supervised learning
CN113255602A (en) Dynamic gesture recognition method based on multi-modal data
CN114332911A (en) Head posture detection method and device and computer equipment
Sun et al. IRDCLNet: Instance segmentation of ship images based on interference reduction and dynamic contour learning in foggy scenes
Tao et al. An adaptive frame selection network with enhanced dilated convolution for video smoke recognition
Yandouzi et al. Investigation of combining deep learning object recognition with drones for forest fire detection and monitoring
CN114360073A (en) Image identification method and related device
CN112800979B (en) Dynamic expression recognition method and system based on characterization flow embedded network
Ahmad et al. Embedded deep vision in smart cameras for multi-view objects representation and retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant