CN109753906B - Method for detecting abnormal behaviors in public places based on domain migration - Google Patents
Publication number: CN109753906B · Application: CN201811594841.4A · Legal status: Active
Abstract
The invention relates to a method for detecting abnormal behaviors in public places based on domain migration. Simulation in a virtual world is used to create a large number of virtual abnormal-event videos, addressing the problem that abnormal events are highly diverse yet real training data are scarce. A domain-migration method then transfers the virtual data to the real domain, which improves the adaptability of the classification network to real surveillance videos and effectively improves the usability of the trained network.
Description
Technical Field
The invention belongs to the fields of computer vision and video surveillance. It detects abnormal behaviors such as fighting and fleeing in surveillance videos of public places.
Background
Nowadays, cameras in public areas throughout cities generate countless surveillance videos around the clock. If abnormal behaviors in the collected videos could be detected automatically, surveillance would have a strong preventive effect against public-safety incidents. However, detecting abnormal events is very difficult, because abnormal behavior occurs far less frequently than normal behavior and is highly diverse.
At present, there are two main approaches to detecting abnormal behaviors in public places. The first is the social-force-model-based method proposed by R. Mehran et al. in "R. Mehran, A. Oyama, and M. Shah, Abnormal crowd behavior detection using social force model, Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 935-942, 2009", which treats pedestrians as individual moving particles, models human-human interactions as forces between particles, and detects abnormal behavior in a video by finding abnormal particle movements.
The second approach is based on optical flow, such as the method proposed in "Y. Yu, W. Shen, H. Huang, and Z. Zhang, Abnormal event detection in crowded scenes using two sparse dictionaries with saliency prior, Journal of Electronic Imaging, vol. 26, no. 3, p. 033013, 2017", which combines multi-scale optical-flow histograms and multi-scale gradient histograms to obtain appearance and motion features of pedestrians, and adds abnormal features to the traditional sparse model (which contains only normal features) to construct a dictionary. The saliency of a test sample is then combined with its sparse reconstruction cost on the normal and abnormal dictionaries to measure how normal the sample is.
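As an illustration of the sparse-dictionary idea (a sketch of the general technique, not the cited authors' exact pipeline), the normality of a test feature can be scored by its reconstruction cost on a dictionary; the dictionary and test vector below are random placeholders:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)

# Hypothetical dictionary: columns are atoms learned from normal features.
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)  # unit-norm atoms

x = rng.standard_normal(64)  # a test feature vector (placeholder)

# Sparse code with at most 10 non-zero coefficients (orthogonal matching pursuit).
coef = orthogonal_mp(D, x, n_nonzero_coefs=10)

# Sparse reconstruction cost: a high cost suggests the sample is not well
# explained by the normal dictionary, i.e. it is potentially abnormal.
cost = np.linalg.norm(x - D @ coef)
```

In a real system the threshold on `cost` (and the saliency weighting the paper describes) would be tuned on held-out data.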
These methods have limitations: the particle model cannot capture the motion characteristics of individual persons, and an optical-flow-based feature dictionary cannot guarantee that all abnormal behaviors are represented in the dictionary.
Disclosure of Invention
Technical problem to be solved
In order to avoid the defects of the prior art, the invention provides a method for detecting abnormal behaviors in public places based on domain migration.
Technical scheme
A method for detecting abnormal behaviors in public places based on domain migration is characterized by comprising the following steps:
Step 1: generate virtual abnormal data using existing virtual imagery products; the virtual abnormal data comprise different abnormal categories and a normal category, with the same amount of data in each category;
Step 2: train a video classification network with the virtual abnormal data generated in step 1 to obtain a virtual abnormal data classification network;
Step 3: train a domain migration network with the generated virtual abnormal data and the acquired real data to obtain real-domain video data corresponding to the virtual abnormal video data; the domain migration network is a modified cycle-GAN, the modification being that all 2D convolution structures in the cycle-GAN network are changed into 3D convolution structures oriented to video data, the 3D convolution structure being computed as

v_{ij}^{xyz} = b_{ij} + \sum_{m} \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} \sum_{r=0}^{R-1} w_{ijm}^{pqr} \, v_{(i-1)m}^{(x+p)(y+q)(z+r)}

where P, Q, R denote the length, width, and height of the 3D convolution kernel applied to the feature maps output by the previous layer, and m indexes those feature maps; finally, under the convolution module W, the corresponding feature map V in the next layer is computed, where b is the offset, subscripts i and j denote the j-th 3D convolution structure of the i-th layer, and x, y, z are the length, width, and height coordinates;
Step 4: further train, for classification, the virtual abnormal data classification network obtained in step 2 using the real-domain abnormal data obtained in step 3, the training process being the same as in step 2, thereby obtaining a real-domain abnormal video classification network;
Step 5: input the real abnormal data to be tested into the network model trained in step 4, obtain the probability of the input video for each abnormal category using a softmax function, and take the category with the maximum value as the abnormal type of the video.
The video classification network in step 2 is 3DResNet or a spatio-temporal two-stream video classification network.
Advantageous effects
The proposed method for detecting abnormal behaviors in public places based on domain migration creates a large number of virtual abnormal-event videos through simulation in a virtual world, addressing the problem that abnormal events are highly diverse yet real data are scarce. By migrating the virtual data to the real domain with the domain-migration method, the adaptability of the classification network to real surveillance videos is improved, and the usability of the trained network is effectively improved.
Drawings
FIG. 1 is a model, data flow diagram of the present invention;
fig. 2 is a data flow diagram of a domain migration network.
Detailed Description
The invention will now be further described with reference to the following examples and drawings:
the invention provides a public scene abnormal behavior detection method based on domain migration, which aims to solve the difficulty of abnormal behavior detection caused by the phenomena of abnormal behavior diversity, low frequency and the like. The whole technical scheme comprises the following steps:
1. the existing virtual image products such as games, CG and the like are used for creating virtual scenes, tasks, models and actions related to the abnormity, and recording abnormal behaviors in the virtual world.
2. After recording a large amount of virtual video data, use these data to train a video classification deep neural network that can effectively distinguish abnormal behavior categories (such as fighting and fleeing) from normal situations in the virtual data set.
3. Collect some real-world surveillance videos; these videos need not contain abnormal events. Using the mutual conversion between these videos and the existing virtual videos, learn a domain migration network that performs unsupervised video domain migration, transferring the virtual videos into a lifelike real video domain that closely resembles real scenes, thereby obtaining a large number of surveillance-style videos containing abnormal behaviors.
4. Train the classification network obtained in step 2 again with the migrated videos as the data set, improving its adaptability across domains, i.e., in the real data domain, and raising the detection capability of the network when applied to real video surveillance.
5. In actual use, surveillance video of a fixed duration is fed into the trained neural network in real time; the classification probabilities of the captured short video for each abnormal category and the normal case are obtained, and the category with the highest probability is taken as the category of the video. Whether the detected category is abnormal or normal determines whether abnormal behavior has occurred under surveillance.
The invention has the following concrete implementation steps:
step 1, first, an unsupervised domain migration network of the type "j.zhu, t.park, p.isola, and a.a.efros, unapplied image-to-image transformation using cycle-dependent adaptive networks, arXiv print,2017. In contrast, it should be modified somewhat so that it can process data of the video domain (cycle-GAN can only process images). The modified method is to change all 2D convolution structures in the cycle-GAN network into 3D convolution structures facing the video data. The calculation method of the 3D convolution structure comprises the following steps:
v_{ij}^{xyz} = b_{ij} + \sum_{m} \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} \sum_{r=0}^{R-1} w_{ijm}^{pqr} \, v_{(i-1)m}^{(x+p)(y+q)(z+r)}

where P, Q, R respectively denote the length, width, and height of the 3D convolution kernel applied to the feature maps output by the previous layer, and m indexes those feature maps. Finally, under the convolution module W, the corresponding feature map V in the next layer is obtained. Meanwhile, related abnormal-event video data are simulated and recorded in the virtual world, represented as rounded blocks in FIG. 1, i.e., the virtual abnormal video data. These data cover different abnormal categories such as fighting, chasing, fleeing, gunshots, running, and arrests, as well as a normal category, with approximately the same total duration of data per category. Finally, a portion of real video surveillance data is needed to represent what surveillance video looks like in real scenes; these data need not be labeled, and the video content is not restricted.
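The 3D-convolution computation above can be transcribed directly in NumPy for one output feature map (a minimal illustrative sketch; a practical implementation would use `nn.Conv3d` in the PyTorch environment the patent describes):

```python
import numpy as np

def conv3d_feature_map(v_prev, w, b):
    """Valid 3D convolution producing one output feature map.

    v_prev: (M, X, Y, Z) -- the M feature maps output by the previous layer
    w:      (M, P, Q, R) -- one 3D kernel of the module W (P, Q, R are its
                            length, width and height)
    b:      scalar offset
    """
    M, P, Q, R = w.shape
    _, X, Y, Z = v_prev.shape
    out = np.zeros((X - P + 1, Y - Q + 1, Z - R + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            for z in range(out.shape[2]):
                # v^{xyz} = b + sum_m sum_{p,q,r} w_m^{pqr} * v_prev_m^{(x+p)(y+q)(z+r)}
                out[x, y, z] = b + np.sum(w * v_prev[:, x:x + P, y:y + Q, z:z + R])
    return out

# Tiny check: all-ones input and kernel -> every output equals b + M*P*Q*R.
V = np.ones((2, 4, 4, 4))
W = np.ones((2, 3, 3, 3))
out = conv3d_feature_map(V, W, b=1.0)
print(out.shape)     # (2, 2, 2)
print(out[0, 0, 0])  # 55.0  (1 + 2*3*3*3)
```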
Step 2: initialize a video classification network, which can be 3DResNet, a spatio-temporal two-stream video classification network, or another existing video classification network. Here we use the existing 3DResNet from "K. Hara, H. Kataoka, and Y. Satoh, Learning spatio-temporal features with 3D residual networks for action recognition, Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, vol. 2, no. 3, p. 4, 2017". This network is an improved version of ResNet, the network structure proposed in 2015, improved in the same way as set forth in step 1, i.e., by changing the 2D convolution structures to 3D convolution structures.
Step 3: train the domain migration network with the collected virtual abnormal data and arbitrary real data to obtain real-domain video data corresponding to the virtual abnormal video data. As shown in FIG. 2, let S_real and R_real denote the collected virtual abnormal data and arbitrary real data, respectively. They are fed into the generator networks G_StoR and G_RtoS to obtain R_fake and S_fake, which are then fed into G_RtoS and G_StoR, respectively, to recover the videos corresponding to S_real and R_real. Consistency comparison and the discriminators D_R and D_S improve the fidelity of the videos after domain migration.
The whole process can be represented by the cycle-GAN objective

L(G_StoR, G_RtoS, D_R, D_S) = L_GAN(G_StoR, D_R, S, R) + L_GAN(G_RtoS, D_S, R, S) + λ L_cyc(G_StoR, G_RtoS)

That is, while training the generators we minimize the adversarial terms against the discriminators and maximize cycle consistency (i.e., minimize L_cyc); while training the discriminators we maximize their adversarial terms. The R_fake obtained at the end can be regarded as the real-domain video data corresponding to the virtual abnormal videos in FIG. 1.
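The loss terms in this objective can be sketched as follows, using the least-squares GAN formulation that cycle-GAN adopts (NumPy stand-ins for videos and discriminator outputs; all names and values here are illustrative placeholders, not the patent's exact implementation):

```python
import numpy as np

def lsgan_g_loss(d_fake):
    """Generator's adversarial term: push D's score on fakes toward 1."""
    return np.mean((d_fake - 1.0) ** 2)

def lsgan_d_loss(d_real, d_fake):
    """Discriminator's term: score real samples toward 1, fakes toward 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def cycle_consistency(s_real, s_cyc, r_real, r_cyc, lam=10.0):
    """L1 cycle loss: G_RtoS(G_StoR(S)) should reproduce S, and vice versa."""
    return lam * (np.mean(np.abs(s_real - s_cyc)) + np.mean(np.abs(r_real - r_cyc)))

# Placeholder tensors standing in for video batches / discriminator score maps.
s = np.zeros((2, 3, 8, 32, 32))
r = np.zeros_like(s)
g_total = lsgan_g_loss(d_fake=np.full((2, 4, 4), 0.5)) \
        + cycle_consistency(s, s + 0.1, r, r + 0.1)
print(round(g_total, 4))  # 0.25 adversarial + 10 * (0.1 + 0.1) cycle = 2.25
```

In training, `g_total` would be minimized over the generator parameters while `lsgan_d_loss` is minimized over each discriminator's parameters, alternating between the two.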
Step 4: further train, for classification, the network obtained in step 2 using the real-domain abnormal data obtained in step 3; the process is the same as in step 2, yielding the real-domain abnormal video classification network.
Step 5: in the actual test process, input the real abnormal data into the network model trained in step 4, obtain the probability of the input video for each abnormal category using a softmax function, and take the category with the maximum value as the abnormal type of the video.
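Step 5 amounts to a softmax over the network's output logits followed by an argmax; a minimal sketch, with hypothetical category names and logit values:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical abnormal/normal categories and network outputs for one clip.
classes = ["normal", "fight", "chase", "flee", "gunshot"]
logits = np.array([0.2, 2.1, 0.5, -0.3, 0.1])

probs = softmax(logits)
pred = classes[int(np.argmax(probs))]
print(pred)  # fight
```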
The effects of the present invention can be further explained by the following simulation experiments.
1. Simulation conditions
The experiments were carried out on four GeForce GTX 1080 Ti GPUs, using Python 3.5.4 on a 64-bit Ubuntu 16.04 LTS system, with PyTorch 0.4.1 and CUDA 9.2 as the software environment.
2. Simulation content
First, the virtual video data set obtained by simulation and video data taken from several existing video data sets are used for training according to FIG. 1, finally yielding the real-domain abnormal video classification network. The results of our model with and without domain-migration training data are compared with those of the 3DResNet from "K. Hara, H. Kataoka, and Y. Satoh, Learning spatio-temporal features with 3D residual networks for action recognition, Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, vol. 2, no. 3, p. 4, 2017" with and without domain-migration training data. Two criteria are used: the classification accuracy of the videos and the misclassification severity (MISE). The latter ranks the abnormality categories by severity and then computes the severity of misclassifications. The results are as follows:
Table 1: test results of four models on a real data set
Accuracy (%) | 3D ResNet | The invention
---|---|---
Before domain migration | 19.51 | 17.07
After domain migration | 21.14 | 26.02
As can be seen from Table 1, the classification accuracy of the network of the present invention on the real data set improves significantly after domain migration. The proposed domain-migration technique also improves the performance of 3DResNet to some extent, so it yields higher prediction accuracy for abnormal-behavior detection in public places.
Table 2: misclassification severity of four models on a real dataset
MISE | 3D ResNet | The invention
---|---|---
Before domain migration | 3.48 | 3.45
After domain migration | 3.45 | 2.74
As shown in Table 2, our method also achieves the lowest misclassification severity, which confirms that the present invention yields less severe misclassifications when detecting abnormal behaviors in public places.
Claims (2)
1. A method for detecting abnormal behaviors in public places based on domain migration is characterized by comprising the following steps:
Step 1: generating virtual abnormal data by using existing virtual imagery products, wherein the virtual abnormal data comprise different abnormal categories and a normal category, with the same amount of data in each category;
Step 2: training a video classification network with the virtual abnormal data generated in step 1 to obtain a virtual abnormal data classification network;
Step 3: training a domain migration network with the generated virtual abnormal data and the acquired real data to obtain real-domain video data corresponding to the virtual abnormal video data, wherein the domain migration network is a modified cycle-GAN, the modification being that all 2D convolution structures in the cycle-GAN network are changed into 3D convolution structures oriented to video data, the 3D convolution structure being computed as

v_{ij}^{xyz} = b_{ij} + \sum_{m} \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} \sum_{r=0}^{R-1} w_{ijm}^{pqr} \, v_{(i-1)m}^{(x+p)(y+q)(z+r)}

where P, Q, R respectively denote the length, width, and height of the 3D convolution kernel applied to the feature maps output by the previous layer, and m denotes the number of feature maps output by the previous layer; finally, under the convolution module W, the corresponding feature map V in the next layer is computed, where b is the offset, i and j denote the j-th 3D convolution structure of the i-th layer, and x, y, z are the length, width, and height coordinates;
Step 4: further training, for classification, the virtual abnormal data classification network obtained in step 2 using the real-domain abnormal data obtained in step 3, the training process being the same as in step 2, thereby obtaining a real-domain abnormal video classification network;
Step 5: inputting the real abnormal data to be tested into the network model trained in step 4, obtaining the probability of the input video for each abnormal category using a softmax function, and taking the category with the maximum value as the abnormal type of the video.
2. The method according to claim 1, wherein the video classification network in step 2 is 3DResNet or a spatio-temporal two-stream video classification network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811594841.4A CN109753906B (en) | 2018-12-25 | 2018-12-25 | Method for detecting abnormal behaviors in public places based on domain migration |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109753906A CN109753906A (en) | 2019-05-14 |
CN109753906B true CN109753906B (en) | 2022-06-07 |
Family
ID=66403930
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110490078B (en) * | 2019-07-18 | 2024-05-03 | 平安科技(深圳)有限公司 | Monitoring video processing method, device, computer equipment and storage medium |
CN111027594B (en) * | 2019-11-18 | 2022-08-12 | 西北工业大学 | Step-by-step anomaly detection method based on dictionary representation |
CN111401149B (en) * | 2020-02-27 | 2022-05-13 | 西北工业大学 | Lightweight video behavior identification method based on long-short-term time domain modeling algorithm |
CN111666852A (en) * | 2020-05-28 | 2020-09-15 | 天津大学 | Micro-expression double-flow network identification method based on convolutional neural network |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107437083A (en) * | 2017-08-16 | 2017-12-05 | Shanghai Hefu Artificial Intelligence Technology (Group) Co., Ltd. | Video behavior recognition method with adaptive pooling |
CN107563431A (en) * | 2017-08-28 | 2018-01-09 | Southwest Jiaotong University | Image anomaly detection method combining CNN transfer learning and SVDD |
CN108140075A (en) * | 2015-07-27 | 2018-06-08 | Pivotal Software, Inc. | Classifying user behavior as anomalous |
CN108334832A (en) * | 2018-01-26 | 2018-07-27 | Shenzhen Weiteshi Technology Co., Ltd. | Gaze estimation method based on generative adversarial networks |
CN108446667A (en) * | 2018-04-04 | 2018-08-24 | Beihang University | Facial expression recognition method and device based on generative-adversarial-network data augmentation |
CN108664922A (en) * | 2018-05-10 | 2018-10-16 | Donghua University | Infrared-video human behavior recognition method for personal safety |
CN108805978A (en) * | 2018-06-12 | 2018-11-13 | Jiangxi Normal University | Automatic 3D model generation device and method based on deep learning |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9613277B2 (en) * | 2013-08-26 | 2017-04-04 | International Business Machines Corporation | Role-based tracking and surveillance |
CN108345869B (en) * | 2018-03-09 | 2022-04-08 | 南京理工大学 | Driver posture recognition method based on depth image and virtual data |
Non-Patent Citations (5)
Title |
---|
Action recognition using spatial-optical data organization and sequential learning framework; Yuan Yuan et al.; Neurocomputing; 2018-07-17; vol. 315; 221-233 *
Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition; Kensho Hara et al.; 2017 IEEE International Conference on Computer Vision Workshops; 2017-12-31; 3154-3160 *
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks; Jun-Yan Zhu et al.; 2017 IEEE International Conference on Computer Vision; 2017-12-31; 2242-2251 *
Abnormal behavior detection of small and medium crowds based on intelligent surveillance; He Chuanyang et al.; Journal of Computer Applications; 2016-06-10; vol. 36, no. 6; 1724-1729 *
Human abnormal behavior recognition in video surveillance; Zhao Renfeng; Journal of Suzhou University; Nov. 2018; vol. 33, no. 11; 111-115 *
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |