CN109753906B - Method for detecting abnormal behaviors in public places based on domain migration - Google Patents

Method for detecting abnormal behaviors in public places based on domain migration

Info

Publication number
CN109753906B
Authority
CN
China
Prior art keywords
abnormal
network
data
video
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811594841.4A
Other languages
Chinese (zh)
Other versions
CN109753906A (en
Inventor
王琦 (Wang Qi)
李学龙 (Li Xuelong)
林维 (Lin Wei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201811594841.4A priority Critical patent/CN109753906B/en
Publication of CN109753906A publication Critical patent/CN109753906A/en
Application granted granted Critical
Publication of CN109753906B publication Critical patent/CN109753906B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a method for detecting abnormal behaviors in public places based on domain migration. Simulation in a virtual world is used to create a large number of virtual abnormal event videos, which addresses the problem that abnormal events are highly diverse while real data are scarce. A domain migration method then transfers the virtual data to the real domain, which improves the adaptability of the classification and detection network to real surveillance videos and effectively improves the usability of the trained network.

Description

Method for detecting abnormal behaviors in public places based on domain migration
Technical Field
The invention belongs to the field of computer vision and video monitoring. It detects abnormal behaviors, such as fighting and fleeing, in videos of public places under video surveillance.
Background
Nowadays, cameras in public areas throughout cities generate countless surveillance videos at all times. If abnormal behaviors in the collected videos can be detected by an automatic method, this has a very strong preventive effect on public safety incidents. However, detecting abnormal events is very difficult, because abnormal behavior occurs far less frequently than normal behavior and is highly diverse.
At present, there are two kinds of methods for detecting abnormal behaviors in public places. The first is the social force model-based method proposed by R. Mehran et al. in "R. Mehran, A. Oyama, and M. Shah, Abnormal crowd behavior detection using social force model, Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 935-942, 2009", which treats pedestrians as individual moving particles, models human-human interactions as forces between the particles, and detects abnormal behavior in a video by finding abnormal particle movements.
The second kind is based on optical flow, such as the method proposed in "Y. Yu, W. Shen, H. Huang, and Z. Zhang, Abnormal event detection in crowded scenes using two sparse dictionaries with saliency, Journal of Electronic Imaging, vol. 26, no. 3, p. 033013, 2017", which combines multi-scale histograms of optical flow and multi-scale histograms of gradient to obtain the appearance and motion features of pedestrians, and adds abnormal features to the traditional sparse model containing only normal features to construct a dictionary. In addition, the saliency of a test sample is combined with its sparse reconstruction cost on the normal and abnormal dictionaries to measure how normal the sample is.
These methods have limitations: the particle model cannot capture the motion characteristics of individual persons, and a feature dictionary based on optical flow cannot guarantee that all abnormal behaviors are represented in the dictionary.
Disclosure of Invention
Technical problem to be solved
In order to avoid the defects of the prior art, the invention provides a method for detecting abnormal behaviors in public places based on domain migration.
Technical scheme
A method for detecting abnormal behaviors in public places based on domain migration is characterized by comprising the following steps:
Step 1: generating virtual abnormal data by using existing virtual imagery products, wherein the virtual abnormal data comprises data of different abnormal categories and of the normal category, and the amount of data in each category is the same;
Step 2: training a video classification network with the virtual abnormal data generated in step 1 to obtain a virtual abnormal data classification network;
Step 3: training a domain migration network with the generated virtual abnormal data and collected real data to obtain real-domain video data corresponding to the virtual abnormal video data; the domain migration network is an improved cycle-GAN, improved as follows: all 2D convolution structures in the cycle-GAN network are changed into 3D convolution structures oriented to video data, and the 3D convolution structure is computed as:
$$V_{ij}^{xyz} = b_{ij} + \sum_{m}\sum_{p=0}^{P-1}\sum_{q=0}^{Q-1}\sum_{r=0}^{R-1} W_{ijm}^{pqr}\, V_{(i-1)m}^{(x+p)(y+q)(z+r)}$$
wherein P, Q, R respectively denote the length, width and height of the feature maps output by the previous layer of the network, and m denotes the number of feature maps output by that layer; under the convolution kernel W, the corresponding feature map V in the next layer of the network is obtained by this calculation; b is the bias, the subscripts i and j denote the j-th 3D convolution kernel of the i-th layer, and x, y, z are the coordinate values along the length, width and height dimensions;
Step 4: performing further classification training on the virtual abnormal data classification network obtained in step 2 with the real-domain abnormal data obtained in step 3, the training process being the same as in step 2, so as to obtain an abnormal video classification network for the real domain;
Step 5: inputting the real abnormal data to be tested into the network model trained in step 4, obtaining the probability of the input video for each abnormal category with a softmax function, and taking the category with the maximum value as the abnormal type of the video.
The video classification network in step 2 is a 3D ResNet or a spatio-temporal two-stream video classification network.
Advantageous effects
The method for detecting abnormal behaviors in public places based on domain migration provided by the invention uses simulation in a virtual world to create a large number of virtual abnormal event videos, which addresses the problem that abnormal events are highly diverse while real data are scarce. By migrating the virtual data to the real domain with the domain migration method, the adaptability of the classification and detection network to real surveillance videos is improved, and the usability of the trained network is effectively improved.
Drawings
FIG. 1 is a model, data flow diagram of the present invention;
fig. 2 is a data flow diagram of a domain migration network.
Detailed Description
The invention will now be further described with reference to the following examples and drawings:
the invention provides a public scene abnormal behavior detection method based on domain migration, which aims to solve the difficulty of abnormal behavior detection caused by the phenomena of abnormal behavior diversity, low frequency and the like. The whole technical scheme comprises the following steps:
1. Existing virtual imagery products such as games and CG are used to create virtual scenes, characters, models and actions related to the abnormal events, and the abnormal behaviors are recorded in the virtual world.
2. After a large amount of recorded virtual video data is captured, the data is used to train a deep video classification neural network, which can effectively distinguish abnormal behavior categories (such as fighting and fleeing) from the normal condition in the virtual data set.
3. Some real-world surveillance videos are collected; these videos do not need to contain abnormal events. Using the mutual conversion relation between these videos and the existing virtual videos, a domain migration network is learned and unsupervised video domain migration is performed, so that the virtual videos are migrated into a lifelike real video domain that closely resembles real scenes, yielding a large number of surveillance videos containing abnormal behaviors.
4. The classification neural network obtained in step 2 is trained again with the migrated videos as a data set, so as to improve its adaptability across domains, i.e. in the real data domain, and to improve the detection capability of the network when applied to real video surveillance.
5. In actual use, surveillance video of a fixed duration can be fed into the trained neural network in real time, the classification probabilities of the captured short video for each abnormal category and the normal condition are obtained, and the category with the highest probability is taken as the category of the video. Whether abnormal behavior occurs under surveillance is then determined according to which abnormal or normal category the detection result belongs to.
The invention has the following concrete implementation steps:
step 1, first, an unsupervised domain migration network of the type "j.zhu, t.park, p.isola, and a.a.efros, unapplied image-to-image transformation using cycle-dependent adaptive networks, arXiv print,2017. In contrast, it should be modified somewhat so that it can process data of the video domain (cycle-GAN can only process images). The modified method is to change all 2D convolution structures in the cycle-GAN network into 3D convolution structures facing the video data. The calculation method of the 3D convolution structure comprises the following steps:
$$V_{ij}^{xyz} = b_{ij} + \sum_{m}\sum_{p=0}^{P-1}\sum_{q=0}^{Q-1}\sum_{r=0}^{R-1} W_{ijm}^{pqr}\, V_{(i-1)m}^{(x+p)(y+q)(z+r)}$$
wherein P, Q, R respectively denote the length, width and height of the feature maps output by the previous layer, and m denotes the number of feature maps output by that layer. Finally, under the convolution kernel W, the corresponding feature map V in the next layer of the network is obtained by this calculation. Meanwhile, videos of relevant abnormal events are simulated and recorded in the virtual world; they are represented as rounded square blocks in FIG. 1, i.e., the virtual abnormal video data. These data include data of different abnormal categories, such as fighting, chasing, fleeing, gunshots, running and arrests, as well as the normal category, and the total duration of data in each category is approximately the same. Finally, some real video surveillance data is needed to represent what surveillance video looks like in real scenes; these videos do not need to be labeled, and their content is not restricted.
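For illustration only, the following is a minimal PyTorch sketch of the modification described in this step: a cycle-GAN-style generator in which every 2D convolution is replaced by a 3D convolution so that it operates on video clips. The class names, layer widths and block count are assumptions made for this sketch and are not prescribed by the method.

```python
import torch
import torch.nn as nn

class ResBlock3D(nn.Module):
    """Residual block in which the usual 2D convolutions are replaced by 3D ones."""
    def __init__(self, channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm3d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # skip connection, as in the original cycle-GAN blocks

class Generator3D(nn.Module):
    """Toy video-to-video generator: 3D encoder, 3D residual blocks, 3D decoder."""
    def __init__(self, in_ch: int = 3, base: int = 32, n_blocks: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, base, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
            *[ResBlock3D(base) for _ in range(n_blocks)],
            nn.Conv3d(base, in_ch, kernel_size=7, padding=3),
            nn.Tanh(),
        )

    def forward(self, clip):  # clip: (batch, channels, frames, height, width)
        return self.net(clip)

# A 16-frame 112x112 RGB clip keeps its shape when passed through the generator.
if __name__ == "__main__":
    g = Generator3D()
    print(g(torch.randn(1, 3, 16, 112, 112)).shape)  # torch.Size([1, 3, 16, 112, 112])
```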
Step 2: a video classification network is initialized; the network may be a 3D ResNet, a spatio-temporal two-stream video classification network, or another existing video classification network. Here the existing 3D ResNet is used, which comes from "K. Hara, H. Kataoka, and Y. Satoh, Learning spatio-temporal features with 3D residual networks for action recognition, Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, vol. 2, no. 3, p. 4, 2017". This network is an improved version of the ResNet structure proposed in 2015, improved in the same way as set forth in step 1, i.e., by changing the 2D convolution structures into 3D convolution structures.
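As an illustrative sketch of this step, the snippet below initializes a 3D ResNet video classifier and runs a standard cross-entropy training loop on (clip, label) batches of the virtual abnormal data. It uses torchvision's r3d_18 merely as a convenient stand-in for the 3D ResNet of Hara et al. (and therefore assumes a recent torchvision rather than the PyTorch 0.4.1 environment listed in the simulation conditions); the number of categories and the data loader are likewise assumptions.

```python
import torch.nn as nn
from torchvision.models.video import r3d_18  # stand-in for the 3D ResNet of Hara et al.

NUM_CLASSES = 7  # e.g. fighting, chasing, fleeing, gunshot, running, arrest, normal (assumed)

def build_classifier(num_classes: int = NUM_CLASSES) -> nn.Module:
    model = r3d_18()                                          # randomly initialized 3D ResNet-18
    model.fc = nn.Linear(model.fc.in_features, num_classes)   # new head for the abnormal categories
    return model

def train_one_epoch(model, loader, optimizer, device="cuda"):
    """One epoch of cross-entropy training on (clip, label) batches; the loader is assumed."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    for clips, labels in loader:                  # clips: (B, 3, T, H, W)
        clips, labels = clips.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(clips), labels)
        loss.backward()
        optimizer.step()
```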
Step 3: the domain migration network is trained with the collected virtual abnormal data and arbitrary real data, and real-domain video data corresponding to the virtual abnormal video data is obtained. As shown in FIG. 2, let S_real and R_real denote the collected virtual abnormal data and the arbitrary real data, respectively. They are fed into the generator networks G_StoR and G_RtoS to obtain R_fake and S_fake, which are then fed into G_RtoS and G_StoR respectively to obtain the videos corresponding to S_real and R_real. Consistency comparison and the discriminators D_R and D_S are used to improve the fidelity of the videos after domain migration.
The whole process can be represented by the following formula:
$$G_{StoR}^{*},\, G_{RtoS}^{*} = \arg\min_{G_{StoR},\,G_{RtoS}}\;\max_{D_R,\,D_S}\; \mathcal{L}_{GAN}(G_{StoR}, D_R) + \mathcal{L}_{GAN}(G_{RtoS}, D_S) + \lambda\, \mathcal{L}_{cyc}(G_{StoR}, G_{RtoS})$$
That is, during training of the generators, one strives to minimize the adversarial score assigned by the discriminators while maximizing cycle consistency; during training of the discriminators, their score is maximized. The R_fake obtained at the end can be regarded as the real-domain video data corresponding to the virtual abnormal videos in FIG. 1.
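The following sketch illustrates, under the same illustrative assumptions as above, one generator update of the video cycle-GAN in this step: the adversarial terms push R_fake and S_fake toward the real and virtual domains, and the L1 cycle-consistency term implements the consistency comparison. A least-squares GAN loss is assumed, as in the original cycle-GAN; the discriminator architecture and the weight lam are illustrative, not part of the claimed method.

```python
import torch
import torch.nn as nn

class Discriminator3D(nn.Module):
    """Small 3D patch discriminator (layer sizes are illustrative)."""
    def __init__(self, in_ch: int = 3, base: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv3d(base, base * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv3d(base * 2, 1, 3, padding=1),  # per-patch real/fake score map
        )

    def forward(self, clip):
        return self.net(clip)

def generator_step(G_StoR, G_RtoS, D_R, D_S, s_real, r_real, opt_G, lam=10.0):
    """One generator update: adversarial loss plus cycle-consistency (L1) loss."""
    mse, l1 = nn.MSELoss(), nn.L1Loss()
    r_fake, s_fake = G_StoR(s_real), G_RtoS(r_real)   # virtual->real and real->virtual
    s_rec, r_rec = G_RtoS(r_fake), G_StoR(s_fake)     # cycle back to the source domain
    pred_r, pred_s = D_R(r_fake), D_S(s_fake)
    adv = mse(pred_r, torch.ones_like(pred_r)) + mse(pred_s, torch.ones_like(pred_s))
    cyc = l1(s_rec, s_real) + l1(r_rec, r_real)       # consistency comparison
    loss = adv + lam * cyc
    opt_G.zero_grad()
    loss.backward()
    opt_G.step()
    return loss.item()
```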
Step 4: further classification training is performed on the network obtained in step 2 with the real-domain abnormal data obtained in step 3; the process is the same as in step 2, so an abnormal video classification network for the real domain is obtained.
Step 5: in actual testing, the real abnormal data is input into the network model trained in step 4, the probability of the input video for each abnormal category is obtained with a softmax function, and the category with the maximum value is taken as the abnormal type of the video.
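A minimal sketch of this test procedure, assuming the category list below (the patent does not fix the label order): the trained model scores a fixed-length clip, softmax turns the scores into per-category probabilities, and the most probable category is returned.

```python
import torch
import torch.nn.functional as F

# Assumed category order; the patent does not specify the label indices.
CLASSES = ["fighting", "chasing", "fleeing", "gunshot", "running", "arrest", "normal"]

@torch.no_grad()
def classify_clip(model, clip):
    """clip: tensor of shape (1, 3, T, H, W), a surveillance segment of fixed duration."""
    model.eval()
    probs = F.softmax(model(clip), dim=1)   # probability of each abnormal/normal category
    conf, idx = probs.max(dim=1)            # take the category with the maximum probability
    return CLASSES[idx.item()], conf.item()
```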
The effects of the present invention can be further explained by the following simulation experiments.
1. Simulation conditions
The invention was implemented on four GeForce GTX 1080 Ti GPUs as the hardware basis, using the Python 3.5.4 programming language on a 64-bit Ubuntu 16.04 LTS system, with PyTorch 0.4.1 and CUDA 9.2 as the software environment.
2. Simulation content
First, training was carried out according to FIG. 1 with the virtual video data set obtained by simulation and video data taken from some existing video data sets, finally obtaining the real-domain abnormal video classification network. The results of the models trained without and with domain-migrated data were then compared, for both the invention and the 3D ResNet of "K. Hara, H. Kataoka, and Y. Satoh, Learning spatio-temporal features with 3D residual networks for action recognition, Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, vol. 2, no. 3, p. 4, 2017". Two evaluation criteria are used: the video classification accuracy and the misclassification severity (MISE). The latter ranks the abnormality categories by their severity and then calculates the severity incurred by misclassification. The results are as follows:
Table 1: test results of four models on a real data set
Accuracy (%)              3D ResNet    The invention
Before domain migration   19.51        17.07
After domain migration    21.14        26.02
As can be seen from Table 1, the classification accuracy of the network of the present invention on the real data set is significantly improved after domain migration. The domain migration technique provided by the invention also improves the performance of the 3D ResNet to some extent, and therefore gives higher prediction accuracy for abnormal behavior detection in public places.
Table 2: misclassification severity of four models on a real dataset
MISE                      3D ResNet    The invention
Before domain migration   3.48         3.45
After domain migration    3.45         2.74
As shown in Table 2, our method also has the lowest misclassification severity, which further confirms that the invention yields less severe misclassifications when detecting abnormal behaviors in public places.
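The patent does not give an explicit formula for MISE. Purely as an illustration of the idea of ranking categories by severity and scoring misclassifications, the sketch below assumes each category has an integer severity rank and averages the severity of the true class over the misclassified samples; this assumed definition is not necessarily the one used to produce Table 2.

```python
from typing import Dict, List

# Hypothetical severity ranks (higher = more severe); purely illustrative values.
SEVERITY: Dict[str, int] = {
    "normal": 0, "running": 1, "chasing": 2, "fleeing": 3,
    "arrest": 4, "fighting": 5, "gunshot": 6,
}

def misclassification_severity(y_true: List[str], y_pred: List[str]) -> float:
    """Average severity of the true class over misclassified samples (assumed definition)."""
    errors = [SEVERITY[t] for t, p in zip(y_true, y_pred) if t != p]
    return sum(errors) / len(errors) if errors else 0.0

# Example: two of the three clips are misclassified, so the score is (0 + 6) / 2 = 3.0.
print(misclassification_severity(
    ["fighting", "normal", "gunshot"],
    ["fighting", "fleeing", "running"],
))
```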

Claims (2)

1. A method for detecting abnormal behaviors in public places based on domain migration is characterized by comprising the following steps:
Step 1: generating virtual abnormal data by using existing virtual imagery products, wherein the virtual abnormal data comprises data of different abnormal categories and of the normal category, and the amount of data in each category is the same;
Step 2: training a video classification network with the virtual abnormal data generated in step 1 to obtain a virtual abnormal data classification network;
Step 3: training a domain migration network with the generated virtual abnormal data and collected real data to obtain real-domain video data corresponding to the virtual abnormal video data; the domain migration network is an improved cycle-GAN, improved as follows: all 2D convolution structures in the cycle-GAN network are changed into 3D convolution structures oriented to video data, and the 3D convolution structure is computed as:
$$V_{ij}^{xyz} = b_{ij} + \sum_{m}\sum_{p=0}^{P-1}\sum_{q=0}^{Q-1}\sum_{r=0}^{R-1} W_{ijm}^{pqr}\, V_{(i-1)m}^{(x+p)(y+q)(z+r)}$$
wherein P, Q, R respectively denote the length, width and height of the feature maps output by the previous layer of the network, and m denotes the number of feature maps output by that layer; finally, under the convolution kernel W, the corresponding feature map V in the next layer of the network is obtained by calculation, where b is the bias, i and j denote the j-th 3D convolution kernel of the i-th layer, and x, y, z are the coordinate values along the length, width and height dimensions;
Step 4: performing further classification training on the virtual abnormal data classification network obtained in step 2 with the real-domain abnormal data obtained in step 3, the training process being the same as in step 2, so as to obtain an abnormal video classification network for the real domain;
Step 5: inputting the real abnormal data to be tested into the network model trained in step 4, obtaining the probability of the input video for each abnormal category with a softmax function, and taking the category with the maximum value as the abnormal type of the video.
2. The method according to claim 1, wherein the video classification network in step 2 is a 3D ResNet or a spatio-temporal two-stream video classification network.
CN201811594841.4A 2018-12-25 2018-12-25 Method for detecting abnormal behaviors in public places based on domain migration Active CN109753906B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811594841.4A CN109753906B (en) 2018-12-25 2018-12-25 Method for detecting abnormal behaviors in public places based on domain migration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811594841.4A CN109753906B (en) 2018-12-25 2018-12-25 Method for detecting abnormal behaviors in public places based on domain migration

Publications (2)

Publication Number Publication Date
CN109753906A CN109753906A (en) 2019-05-14
CN109753906B true CN109753906B (en) 2022-06-07

Family

ID=66403930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811594841.4A Active CN109753906B (en) 2018-12-25 2018-12-25 Method for detecting abnormal behaviors in public places based on domain migration

Country Status (1)

Country Link
CN (1) CN109753906B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490078B (en) * 2019-07-18 2024-05-03 平安科技(深圳)有限公司 Monitoring video processing method, device, computer equipment and storage medium
CN111027594B (en) * 2019-11-18 2022-08-12 西北工业大学 Step-by-step anomaly detection method based on dictionary representation
CN111401149B (en) * 2020-02-27 2022-05-13 西北工业大学 Lightweight video behavior identification method based on long-short-term time domain modeling algorithm
CN111666852A (en) * 2020-05-28 2020-09-15 天津大学 Micro-expression double-flow network identification method based on convolutional neural network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107437083A (en) * 2017-08-16 2017-12-05 上海荷福人工智能科技(集团)有限公司 A kind of video behavior recognition methods of adaptive pool
CN107563431A (en) * 2017-08-28 2018-01-09 西南交通大学 A kind of image abnormity detection method of combination CNN transfer learnings and SVDD
CN108140075A (en) * 2015-07-27 2018-06-08 皮沃塔尔软件公司 User behavior is classified as exception
CN108334832A (en) * 2018-01-26 2018-07-27 深圳市唯特视科技有限公司 A kind of gaze estimation method based on generation confrontation network
CN108446667A (en) * 2018-04-04 2018-08-24 北京航空航天大学 Based on the facial expression recognizing method and device for generating confrontation network data enhancing
CN108664922A (en) * 2018-05-10 2018-10-16 东华大学 A kind of infrared video Human bodys' response method based on personal safety
CN108805978A (en) * 2018-06-12 2018-11-13 江西师范大学 A kind of automatically generating device and method based on deep learning threedimensional model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9613277B2 (en) * 2013-08-26 2017-04-04 International Business Machines Corporation Role-based tracking and surveillance
CN108345869B (en) * 2018-03-09 2022-04-08 南京理工大学 Driver posture recognition method based on depth image and virtual data

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108140075A (en) * 2015-07-27 2018-06-08 皮沃塔尔软件公司 User behavior is classified as exception
CN107437083A (en) * 2017-08-16 2017-12-05 上海荷福人工智能科技(集团)有限公司 A kind of video behavior recognition methods of adaptive pool
CN107563431A (en) * 2017-08-28 2018-01-09 西南交通大学 A kind of image abnormity detection method of combination CNN transfer learnings and SVDD
CN108334832A (en) * 2018-01-26 2018-07-27 深圳市唯特视科技有限公司 A kind of gaze estimation method based on generation confrontation network
CN108446667A (en) * 2018-04-04 2018-08-24 北京航空航天大学 Based on the facial expression recognizing method and device for generating confrontation network data enhancing
CN108664922A (en) * 2018-05-10 2018-10-16 东华大学 A kind of infrared video Human bodys' response method based on personal safety
CN108805978A (en) * 2018-06-12 2018-11-13 江西师范大学 A kind of automatically generating device and method based on deep learning threedimensional model

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Action recognition using spatial-optical data organization and sequential learning framework; Yuan Yuan et al.; Neurocomputing; 2018-07-17; vol. 315; pp. 221-233 *
Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition; Kensho Hara et al.; 2017 IEEE International Conference on Computer Vision Workshops; 2017-12-31; pp. 3154-3160 *
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks; Jun-Yan Zhu et al.; 2017 IEEE International Conference on Computer Vision; 2017-12-31; pp. 2242-2251 *
Abnormal behavior detection of small and medium-sized crowds based on intelligent surveillance (基于智能监控的中小人群异常行为检测); He Chuanyang et al.; Journal of Computer Applications (计算机应用); 2016-06-10; vol. 36, no. 6; pp. 1724-1729 *
Human abnormal behavior recognition in video surveillance (视频监控中人体异常行为识别); Zhao Renfeng; Journal of Suzhou University (宿州学院学报); 2018-11-30; vol. 33, no. 11; pp. 111-115 *

Also Published As

Publication number Publication date
CN109753906A (en) 2019-05-14

Similar Documents

Publication Publication Date Title
CN109753906B (en) Method for detecting abnormal behaviors in public places based on domain migration
CN109255322B (en) A kind of human face in-vivo detection method and device
CN109118479B (en) Capsule network-based insulator defect identification and positioning device and method
US20210158023A1 (en) System and Method for Generating Image Landmarks
Bansal et al. People counting in high density crowds from still images
WO2018065158A1 (en) Computer device for training a deep neural network
Ranjan et al. Improved generalizability of deep-fakes detection using transfer learning based CNN framework
Nallaivarothayan et al. An MRF based abnormal event detection approach using motion and appearance features
CN105938559A (en) Digital image processing using convolutional neural networks
CN110287870A (en) Crowd's anomaly detection method based on comprehensive Optical-flow Feature descriptor and track
CN105095905A (en) Target recognition method and target recognition device
CN104680554B (en) Compression tracking and system based on SURF
CN112036381B (en) Visual tracking method, video monitoring method and terminal equipment
CN109635791A (en) A kind of video evidence collecting method based on deep learning
JP2024513596A (en) Image processing method and apparatus and computer readable storage medium
CN103699874A (en) Crowd abnormal behavior identification method based on SURF (Speed-Up Robust Feature) stream and LLE (Locally Linear Embedding) sparse representation
de Oliveira Silva et al. Human action recognition based on a two-stream convolutional network classifier
CN110348434A (en) Camera source discrimination method, system, storage medium and calculating equipment
CN114724218A (en) Video detection method, device, equipment and medium
Bhowmick et al. Automatic detection and damage quantification of multiple cracks on concrete surface from video
Xu et al. Tackling small data challenges in visual fire detection: a deep convolutional generative adversarial network approach
Lamba et al. A texture based mani-fold approach for crowd density estimation using Gaussian Markov Random Field
CN117011648A (en) Haptic image dataset expansion method and device based on single real sample
CN115862056A (en) Physical law-based human body abnormal behavior detection method
Wang et al. Research on an effective human action recognition model based on 3D CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant