CN114581738A - Behavior prediction network training method and system and behavior anomaly detection method and system - Google Patents

Behavior prediction network training method and system and behavior anomaly detection method and system

Info

Publication number
CN114581738A
CN114581738A
Authority
CN
China
Prior art keywords
network
frame
key
heterogeneous
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210285382.1A
Other languages
Chinese (zh)
Inventor
Li Hongjun
Sun Xiaohu
Li Chaobo
Chen Junjie
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University filed Critical Nantong University
Priority to CN202210285382.1A
Publication of CN114581738A
Legal status: Withdrawn


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a behavior prediction network training method and system and a behavior anomaly detection method and system, and relates to the technical field of video anomaly detection. The training method comprises the following steps: constructing a heterogeneous twin network based on a convolutional network and a U-Net network; acquiring a training video, where the training video comprises a plurality of temporally consecutive RGB video frames and optical flow frames containing normal behavior; training the convolutional network and the U-Net network with the RGB video frames and the optical flow frames respectively; determining an apparent loss function and a motion loss function; determining a multi-constraint loss function according to the apparent loss function and the motion loss function; and adjusting the weights in the convolutional network and the U-Net network according to the multi-constraint loss function to train the heterogeneous twin network and obtain the trained heterogeneous twin network. The method can effectively predict the behavior of fast-moving objects with similar appearance in complex scenes, and can therefore effectively detect abnormal behavior.

Description

Behavior prediction network training method and system and behavior anomaly detection method and system
Technical Field
The invention relates to the technical field of video anomaly detection, in particular to a behavior prediction network training method and system and a behavior anomaly detection method and system.
Background
With the popularization of surveillance equipment and the wide public attention to social safety, video anomaly detection has gradually become a research hotspot in the field of computer vision. Video anomaly detection aims to automatically detect and locate events that deviate from expected behavior in surveillance videos, using computer vision techniques combined with machine learning methods. However, video anomaly detection is a very challenging task, mainly in the following respects: (1) Scarcity: normal samples in the real world vastly outnumber abnormal samples, and acquiring abnormal samples is extremely expensive. (2) Ambiguity: there is no clear boundary between normal and abnormal behavior. For example, a skateboarder looks similar in appearance to a pedestrian, yet is regarded as an abnormal object prohibited from appearing on a sidewalk.
Most existing methods assume that all regions of the scene (including the stationary background and the moving foreground objects) contribute equally. This assumption is not ideal, because it can be found empirically that the primary elements in anomaly detection are moving objects and people, not the stationary background. Most existing work uses a "twin network" to extract features from different information streams separately. For moving objects in non-complex scenes, such a network can largely balance real-time performance and accuracy. However, for moving objects with fast motion and similar appearance in complex scenes, the performance of the "twin network" may degrade. Feature extraction for fast-moving objects with similar appearance in complex scenes therefore cannot meet the requirements.
Disclosure of Invention
The invention aims to provide a behavior prediction network training method and system and a behavior anomaly detection method and system that can effectively predict the behavior of fast-moving objects with similar appearance in complex scenes, and can therefore effectively detect abnormal behavior.
To achieve the above object, the invention provides the following scheme:
A behavior prediction network training method, comprising the following steps:
constructing a heterogeneous twin network based on a convolutional network and a U-Net network;
acquiring a training video, where the training video comprises a plurality of temporally consecutive RGB video frames and optical flow frames containing normal behavior;
inputting any temporally consecutive RGB video frames into the convolutional network of the heterogeneous twin network, and inputting any temporally consecutive optical flow frames into the U-Net network of the heterogeneous twin network;
determining an apparent loss function according to the output of the convolutional network of the heterogeneous twin network and the RGB video frame at the moment following the temporally consecutive RGB video frames;
determining a motion loss function according to the output of the U-Net network of the heterogeneous twin network and the optical flow frame at the moment following the temporally consecutive optical flow frames;
determining a multi-constraint loss function according to the apparent loss function and the motion loss function;
and adjusting the weights in the convolutional network and the U-Net network according to the multi-constraint loss function so as to train the heterogeneous twin network and obtain the trained heterogeneous twin network.
The invention also provides a behavior anomaly detection method, comprising the following steps:
acquiring a target video real frame, where the target video real frame comprises a plurality of temporally consecutive RGB real video frames and optical flow real frames containing normal behavior;
inputting the target video real frame into a heterogeneous twin network to obtain a target video predicted frame, where the heterogeneous twin network is a network trained according to the above behavior prediction network training method;
calculating the peak signal-to-noise ratio between the target video predicted frame and the target video real frame;
calculating a regularity score according to the peak signal-to-noise ratio, where the regularity score is used to judge the degree of normality of the target video real frame;
judging whether the regularity score is lower than a preset threshold;
if so, abnormal behavior exists in the target video real frame;
if not, no abnormal behavior exists in the target video real frame.
The invention also provides a behavior prediction network training system, which comprises:
the heterogeneous twin network construction unit is used for constructing a heterogeneous twin network based on the convolution network and the U-Net network;
the training video acquisition unit is used for acquiring a training video, where the training video comprises a plurality of temporally consecutive RGB video frames and optical flow frames containing normal behavior;
the input unit is used for inputting any temporally consecutive RGB video frames into the convolutional network of the heterogeneous twin network and inputting any temporally consecutive optical flow frames into the U-Net network of the heterogeneous twin network;
an apparent loss function determining unit, configured to determine an apparent loss function according to the output of the convolutional network of the heterogeneous twin network and the RGB video frame at the moment following the temporally consecutive RGB video frames;
a motion loss function determining unit, configured to determine a motion loss function according to the output of the U-Net network of the heterogeneous twin network and the optical flow frame at the moment following the temporally consecutive optical flow frames;
a multi-constraint loss function determination unit for determining a multi-constraint loss function from the apparent loss function and the motion loss function;
and the heterogeneous twin network training unit is used for adjusting the weights in the convolution network and the U-Net network according to the multi-constraint loss function so as to train the heterogeneous twin network and obtain the trained heterogeneous twin network.
The present invention also provides a behavior anomaly detection system, which includes:
the target video real frame acquisition unit is used for acquiring a target video real frame, where the target video real frame comprises a plurality of temporally consecutive RGB real video frames and optical flow real frames containing normal behavior;
a target video prediction frame obtaining unit, configured to input the target video real frame into a heterogeneous twin network to obtain a target video prediction frame, where the heterogeneous twin network is a network trained according to the behavior prediction network training method;
the peak signal-to-noise ratio calculation unit is used for calculating the peak signal-to-noise ratio of the target video prediction frame and the target video real frame;
the regularity score calculating unit is used for calculating a regularity score according to the peak signal-to-noise ratio, and the regularity score is used for judging the normal degree of the real frame of the target video;
the judging unit is used for judging whether the regularity score is lower than a preset threshold value or not;
if so, abnormal behaviors exist in the real frame of the target video;
if not, the abnormal behavior does not exist in the real frame of the target video.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides a behavior prediction network training method and system and a behavior anomaly detection method and system, wherein the behavior prediction network training method comprises the following steps: constructing an isomeric twin network based on a convolution network and a U-Net network; acquiring a training video, wherein the training video comprises a plurality of time-continuous RGB video frames containing normal behaviors and optical flow frames; inputting RGB video frames continuous in any time into a convolution network of the heterogeneous twin network, and inputting optical flow frames continuous in any time into a U-Net network of the heterogeneous twin network; determining an apparent loss function according to the output of the convolution network of the heterogeneous twin network and the RGB video frame at the next moment of the RGB video frame continuous in any time; determining a motion loss function according to the output of the U-Net network of the heterogeneous twin network and the optical flow frame at the next moment of the optical flow frame at any time; determining a multi-constraint loss function according to the apparent loss function and the motion loss function; and adjusting weights in the convolution network and the U-Net network according to the multi-constraint loss function so as to train the heterogeneous twin network and obtain the trained heterogeneous twin network. Compared with a twin network in the prior art, the heterogeneous twin network is composed of a convolution network and a U-Net network, the convolution network can be suitable for feature extraction of moving objects with similar appearances, the U-Net network can be suitable for feature extraction of moving objects with rapid movement, behavior prediction of the moving objects with rapid movement and similar appearances in complex scenes can be effectively achieved through the method, and abnormal behaviors can be effectively detected according to behavior prediction results.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a behavior prediction network training method according to embodiment 1 of the present invention;
FIG. 2 is a diagram of a heterogeneous twin network architecture;
FIG. 3 is an exemplary diagram of detection performance of a convolutional network and a U-Net network in a complex environment;
FIG. 4 is a schematic view of a "chain" manifold distribution;
FIG. 5 is a key-value module diagram;
fig. 6 is a flowchart of a behavior anomaly detection method according to embodiment 2 of the present invention;
FIG. 7 is a heterogeneous twin video anomaly detection framework based on key-value modules;
fig. 8 is a block diagram of a behavior prediction network training system according to embodiment 3 of the present invention;
fig. 9 is a block diagram of a behavior anomaly detection system according to embodiment 4 of the present invention;
FIG. 10 is a frame-level ROC plot of different methods on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets;
FIG. 11 is a graph comparing EER for different methods;
FIG. 12 is a graph of the regularity scores of #002, #004 videos in the UCSD Ped2 dataset;
FIG. 13 is a graph of the regularity scores of the #004 and #015 videos in the CUHK Avenue dataset;
FIG. 14 is a graph of the regularity scores of #01_0029 and #03_0032 videos in the ShanghaiTech dataset;
FIG. 15 is a graph of AUC and EER for different abnormal behavior on the ShanghaiTech data set;
FIG. 16 is a graph of feature distribution in a key-value module visualized from different angles;
FIG. 17 is a visualization of t-SNE in MNIST (upper) and ShanghaiTech (lower) datasets;
FIG. 18 is a diagram of different anomalous behavior in a similar complex environment;
FIG. 19 is an analysis of the optimal placement of different loss coefficients on the Ped2 data set.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention aims to provide a behavior prediction network training method and system and a behavior anomaly detection method and system that can effectively predict the behavior of fast-moving objects with similar appearance in complex scenes, and can therefore effectively detect abnormal behavior.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Example 1:
referring to fig. 1, the present invention provides a behavior prediction network training method, which includes the following steps:
A1: constructing a heterogeneous twin network based on a convolutional network and a U-Net network. In the field of video anomaly detection, it is very important to judge anomalies in surveillance video by making full use of the complementary appearance and motion information of the target object. In view of the insufficient detection capability of the twin network in complex scenes, the invention adopts a heterogeneous twin network to encode appearance information such as the shape and position of the target object together with motion information such as its speed. The network consists of two independent processing streams (appearance and motion); each stream uses a different encoder, and the inputs of the two streams differ, as shown in FIG. 2;
A2: acquiring a training video, where the training video comprises a plurality of temporally consecutive RGB video frames and optical flow frames containing normal behavior;
A3: inputting any temporally consecutive RGB video frames into the convolutional network of the heterogeneous twin network, and inputting any temporally consecutive optical flow frames into the U-Net network of the heterogeneous twin network;
A4: determining an apparent loss function according to the output of the convolutional network of the heterogeneous twin network and the RGB video frame at the moment following the temporally consecutive RGB video frames;
A5: determining a motion loss function according to the output of the U-Net network of the heterogeneous twin network and the optical flow frame at the moment following the temporally consecutive optical flow frames;
A6: determining a multi-constraint loss function according to the apparent loss function and the motion loss function;
A7: adjusting the weights in the convolutional network and the U-Net network according to the multi-constraint loss function so as to train the heterogeneous twin network and obtain the trained heterogeneous twin network.
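For concreteness, the following is a minimal PyTorch sketch of how such a heterogeneous twin network could be assembled for steps A1–A3. The class names, channel widths, clip length and input resolution are illustrative assumptions, not the configuration fixed by the patent, and the key-value retrieval between encoder and decoder (introduced below) is omitted here.

```python
# Minimal sketch of a heterogeneous twin network: a plain convolutional
# encoder-decoder for the appearance (RGB) stream and a U-Net-style
# encoder-decoder with a skip connection for the motion (optical flow) stream.
import torch
import torch.nn as nn

class AppearanceCAE(nn.Module):
    """Convolutional encoder-decoder for the RGB appearance stream."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU(inplace=True))
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1))

    def forward(self, x):
        z_a = self.enc(x)            # first coding sequence z_a
        return self.dec(z_a), z_a    # predicted RGB frame, latent code

class MotionUNet(nn.Module):
    """U-Net-style stream: same-resolution skip connection fuses
    high-level semantic and low-level detail features."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 64, 3, 2, 1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU(inplace=True))
        self.up = nn.ConvTranspose2d(128, 64, 4, 2, 1)
        self.dec = nn.Sequential(
            nn.Conv2d(128, 64, 3, 1, 1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 2, 4, 2, 1))

    def forward(self, x):
        e1 = self.enc1(x)
        z_m = self.enc2(e1)                        # second coding sequence z_m
        d = torch.cat([self.up(z_m), e1], dim=1)   # skip connection (fusion)
        return self.dec(d), z_m                    # predicted flow frame, code

class HeterogeneousTwin(nn.Module):
    """Two independent streams with different encoders and different inputs."""
    def __init__(self, T: int):
        super().__init__()
        self.appearance = AppearanceCAE(in_ch=3 * T)  # T stacked RGB frames
        self.motion = MotionUNet(in_ch=2 * T)         # T stacked flow fields

    def forward(self, rgb_seq, flow_seq):
        pred_rgb, z_a = self.appearance(rgb_seq)
        pred_flow, z_m = self.motion(flow_seq)
        return pred_rgb, pred_flow, z_a, z_m

# Example usage: T = 4 past frames predict the next RGB and flow frames.
twin = HeterogeneousTwin(T=4)
rgb = torch.randn(1, 12, 64, 64)   # 4 RGB frames, 3 channels each
flow = torch.randn(1, 8, 64, 64)   # 4 flow fields, 2 channels each
pred_rgb, pred_flow, z_a, z_m = twin(rgb, flow)
```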
As shown in fig. 3, different networks differ in their performance at detecting abnormal motion. The heterogeneous twin network is composed of two sub-networks with different structures and is thus better suited to the two different kinds of input information. The invention effectively overcomes the limited feature extraction capability of the twin network in complex scenes, making the network more targeted and adaptive.
After step A3 and before step A4, the method further includes:
A8: converting and encoding a plurality of temporally consecutive RGB video frames with the encoder of the convolutional network to obtain a plurality of first coding sequences;
A9: determining the weight of each first coding sequence to obtain a plurality of first addressing probabilities;
A10: determining, according to the first addressing probability corresponding to each first coding sequence, the key-value pair corresponding to each first coding sequence;
A11: determining the similarity between the key-value pairs corresponding to the first coding sequences, and merging the first coding sequences in key-value pairs whose similarity is greater than a first preset threshold into the same key-value pair;
A12: decoding the first coding sequence in each key-value pair with the decoder of the convolutional network to obtain an RGB video predicted frame, where the RGB video predicted frame is the output of the convolutional network of the heterogeneous twin network.
When training the convolutional network, its encoder detects whether an anomaly exists mainly by learning the common apparent features of the static scene and of target objects of interest, such as trucks and bicycles. The input to the encoder of the convolutional network is a video sequence of length $T$, $I_{t-T},\dots,I_{t-1}$, and the decoder of the convolutional network outputs the first predicted frame. Specifically, the encoder of the convolutional network converts the consecutive video frames into a first coding sequence, the first coding sequence is stored in the key-value module to obtain a first latent vector, and the decoder of the convolutional network decodes the first latent vector to obtain the first predicted frame.
$$z_a = E_a\!\left(I_{t-T},\dots,I_{t-1};\,\theta_{E_a}\right) \tag{1}$$

$$\hat{I}_t = D_a\!\left(\hat{z}_a;\,\theta_{D_a}\right) \tag{2}$$

where $E_a$ is the encoder of the convolutional network, $D_a$ is the decoder of the convolutional network, $\theta_{E_a}$ and $\theta_{D_a}$ are the parameters of the encoder and decoder of the convolutional network respectively, $z_a$ is the first coding sequence used to retrieve the features already stored in the key-value module, and $\hat{z}_a$ is the first latent vector obtained from the retrieved features (for a standard AE model, $\hat{z}_a = z_a$). $\hat{I}_t$ is the first predicted frame.
To force the predicted frame $\hat{I}_t$ in image space to be closer to its real frame, the invention adds an apparent loss function as an appearance penalty, which guarantees the consistency of all pixels in RGB space. The apparent loss function is shown in equation (3):

$$l_a = \sum_{t=1}^{T} \left\| \hat{I}_t - I_t \right\|_2^2 \tag{3}$$

where $l_a$ is the apparent loss function, $I_t$ is the real RGB video frame, $\hat{I}_t$ is the predicted RGB video frame, $t$ indexes the video frame sequence, $T$ is the total length of the video frame sequence, and $\|\cdot\|_2$ is the Euclidean distance.
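As a minimal sketch of equation (3) in PyTorch, assuming the per-pixel squared error is summed per frame and averaged over the batch:

```python
# Apparent loss l_a of equation (3): squared L2 distance between the
# predicted RGB frame and the ground-truth next frame (batch-averaged).
import torch

def apparent_loss(pred_rgb: torch.Tensor, true_rgb: torch.Tensor) -> torch.Tensor:
    # sum of squared differences over all pixels, averaged over the batch
    return ((pred_rgb - true_rgb) ** 2).flatten(1).sum(dim=1).mean()
```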
After step A3 and before step A5, the method further includes:
A13: converting and encoding a plurality of temporally consecutive optical flow frames with the encoder of the U-Net network to obtain a plurality of second coding sequences;
A14: determining the weight of each second coding sequence to obtain a plurality of second addressing probabilities;
A15: determining, according to the second addressing probability corresponding to each second coding sequence, the key-value pair corresponding to each second coding sequence;
A16: determining the similarity between the key-value pairs corresponding to the second coding sequences, and merging the second coding sequences in key-value pairs whose similarity is greater than a second preset threshold into the same key-value pair;
A17: decoding the second coding sequence in each key-value pair with the decoder of the U-Net network to obtain an optical flow predicted frame, where the optical flow predicted frame is the output of the U-Net network of the heterogeneous twin network.
In addition to the apparent features of the target object, the motion state of a typical target is also very important for detecting anomalies in video. Unlike the convolutional network, the encoder of the U-Net network includes skip connections between high and low layers of the same resolution, realizing the combination of high-level semantic features and low-level detail features and learning the association between abnormal values and the corresponding motions. To better extract salient motion features, the invention uses optical flow as the motion-related feature, since it is sensitive to motion discontinuities.
Similarly, when training the U-Net network, the input to the encoder of the U-Net network is a video sequence of length $T$, $F_{t-T},\dots,F_{t-1}$, and the output of the decoder of the U-Net network is the predicted frame $\hat{F}_t$. Specifically, the encoder of the U-Net network converts the consecutive optical flow frames into a second coding sequence, the second coding sequence is stored in the key-value module to obtain a second latent vector, and the decoder of the U-Net network decodes the second latent vector to obtain the second predicted frame.
$$z_m = E_m\!\left(F_{t-T},\dots,F_{t-1};\,\theta_{E_m}\right) \tag{4}$$

$$\hat{F}_t = D_m\!\left(\hat{z}_m;\,\theta_{D_m}\right) \tag{5}$$

where $E_m$ is the encoder of the U-Net network, $D_m$ is the decoder of the U-Net network, $\theta_{E_m}$ and $\theta_{D_m}$ are the parameters of the encoder and decoder of the U-Net network, $z_m$ is the second coding sequence used to retrieve the features already stored in the key-value module, $\hat{z}_m$ is the second latent vector obtained from the retrieved features, and $\hat{F}_t$ is the second predicted frame.
Noisy regions may be amplified during the process of generating smooth optical flow. The invention therefore employs an $L_1$-distance loss, denoted the motion loss function, to reduce their influence when learning motion information. The motion loss function is shown in equation (6):

$$l_m = \sum_{t=1}^{T} \left\| \hat{F}_t - F_t \right\|_1 \tag{6}$$

where $l_m$ is the motion loss function, $F_t$ is the real optical flow frame, $\hat{F}_t$ is the predicted optical flow frame, and $\|\cdot\|_1$ is the $L_1$ distance.
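A matching sketch of the motion loss of equation (6), under the same batch-averaging assumption as the apparent loss above:

```python
# Motion loss l_m of equation (6): L1 distance between predicted and real
# optical flow frames, which damps the influence of amplified noisy regions.
import torch

def motion_loss(pred_flow: torch.Tensor, true_flow: torch.Tensor) -> torch.Tensor:
    return (pred_flow - true_flow).abs().flatten(1).sum(dim=1).mean()
```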
Because a clustering algorithm may cluster features into a chain in high-dimensional space, features of different classes may be "covered" together during the high- to low-dimensional mapping, affecting the determination of the distances between different features, as shown in fig. 4. Aiming at this limitation of clustering-based discrimination, the invention introduces a key-value module for the first time; its schematic diagram is shown in FIG. 5. Specifically, the appearance and motion features extracted by the heterogeneous twin network are stored in the key-value module, so that new samples can be reasoned over in the testing stage to judge whether an anomaly exists. Meanwhile, the features are updated through the update mechanism of the key-value module, so that features of different classes are stored in different key-value pairs, effectively avoiding chain-like clustering in the high-dimensional feature space. The continuous updating of the key-value module also alleviates the prediction differences of the same target object in normal samples under different contexts.
As shown in fig. 5, the key-value module consists of three main components: key addressing, value reading, and a write controller.
In the key-value module, the key-value pairs are defined as a vector $Z$ generated by key hashing, as shown in the following equation:

$$Z = \left\{ (k_{z_i}, v_{z_i}) \right\}_{i=1}^{N} \tag{7}$$

where $N$ is the maximum number of key-value pairs contained in the key-value module $Z$, $k_{z_i}$ is the key of the $i$-th key-value pair, and $v_{z_i}$ is the value of the $i$-th key-value pair.
During key addressing, each candidate $x$ is assigned a weight as its addressing probability to retrieve its associated items, where a candidate $x$ is a first or second coding sequence. Each addressing probability is defined as follows:

$$w_i = \frac{\exp\!\left(\Phi_K(x)^{\top}\Phi_K(z_i)\right)}{\sum_{j=1}^{N}\exp\!\left(\Phi_K(x)^{\top}\Phi_K(z_j)\right)} \tag{8}$$

where $w_i$ is the weight, $\Phi_K(x)$ is the key generated for the candidate $x$, $\Phi_K(z_i)$ is the generated key of the $i$-th key-value pair, and $\Phi_K(z_j)$ is the generated key of the $j$-th key-value pair.
In the value reading stage, the values of the key-value pairs are read by taking their weighted sum with the addressing probabilities and returning the output vector $\hat{v}$:

$$\hat{v} = \sum_{i=1}^{N} w_i\,\Phi_V(z_i) \tag{9}$$

where $\hat{v}$ is the output vector and $\Phi_V(z_i)$ is the extracted feature value.
To fully embody intra-class commonality and inter-class diversity beyond individual samples, the write controller is used to update the key-value pairs in the key-value module. The motivation is to store new similar features in the same key-value pair; specifically, the similarity between features is computed through residual similarity, which allows more relevant information to be collected for subsequent access. The rule is to update a key with a new candidate $x$ by integrating the output vector with $x$ and normalizing; the updated key is expressed as follows:

$$\hat{k}_{z_i} = \frac{\Phi_K(x) + \hat{v}_k}{\left\| \Phi_K(x) + \hat{v}_k \right\|_2} \tag{10}$$

where $\hat{k}_{z_i}$ is the updated key, $\Phi_K(x)$ is the key of the candidate $x$, and $\hat{v}_k$ is the output vector of the $k$ most similar features.
The key-value pairs are then repeatedly updated. Note that if no storage space is available, the key is updated by equation (10). The key addressing equation is transformed accordingly to update the query, as shown in equation (11):

$$\hat{w}_i = \frac{\exp\!\left(\Phi_K(\hat{x})^{\top}\Phi_K(z_i)\right)}{\sum_{j=1}^{N}\exp\!\left(\Phi_K(\hat{x})^{\top}\Phi_K(z_j)\right)} \tag{11}$$

where $\hat{w}_i$ is the updated addressing probability and $\hat{x}$ is the updated candidate.
As a possible implementation, the multi-constraint loss function of the heterogeneous twin network further comprises a feature compactness loss function and a feature separation loss function, where the feature compactness loss function represents the intra-class loss of the key-value pairs, and the feature separation loss function represents the inter-class loss of the key-value pairs and is penalized with the $L_2$-norm.
Specifically, the multi-constraint loss function is calculated as follows:

$$l_{total} = \eta_a l_a + \eta_m l_m + \eta_f l_f \tag{12}$$

where $\eta_a$ is the weight coefficient of the apparent loss, $\eta_m$ is the weight coefficient of the motion loss, $\eta_f$ is the weight coefficient of the feature loss, $l_{total}$ is the multi-constraint loss function, $l_a$ is the apparent loss function, and $l_f$ is the feature loss function.

$$l_f = l_c + l_s \tag{13}$$

where $l_c$ is the feature compactness loss function and $l_s$ is the feature separation loss function.

$$l_c = \sum_{i} \left\| \hat{q}_i - z_n \right\|_2^2 \tag{14}$$

$$n = \arg\max_{j \in N} w_j \tag{15}$$

where $\hat{q}_i$ is the updated key (query), $n$ is the key index of the item in the key-value module closest to the query, $z_n$ is the key in the key-value module closest to the query item, and $N$ is the total number of key-value pairs.

$$l_s = \sum_{i} \max\!\left(0,\ \left\| \hat{q}_i - z_n \right\|_2 - \left\| \hat{q}_i - z_m \right\|_2 + \alpha \right) \tag{16}$$

$$m = \arg\max_{j \in N,\, j \neq n} w_j \tag{17}$$

where $z_m$ is the key in the key-value module second-closest to the query item, $m$ is the key index of that second-closest item, $w_i$ is the weight, and $\alpha$ controls the confidence margin between key-value pairs.
Through the multi-constraint loss function, the learned normal-sample features are both compact and representative. The optimal configuration of the different loss parameters is analyzed on the video anomaly detection datasets.
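A sketch combining the pieces into the multi-constraint loss of equations (12)–(17); the reduction over the batch and the default weights ($\eta_a=2$, $\eta_m=1$, $\eta_f=0.1$, taken from the parameter analysis of FIG. 19) are assumptions:

```python
# Multi-constraint loss (12)-(17). The compactness term pulls each query
# toward its nearest key z_n (eq 14); the separation term is a hinge that
# pushes the second-nearest key z_m away by at least the margin alpha (eq 16).
import torch

def feature_losses(queries: torch.Tensor, keys: torch.Tensor, alpha: float = 1.0):
    d = torch.cdist(queries, keys)                       # (B, N) L2 distances
    two_nearest = d.topk(2, dim=1, largest=False).values
    d_n, d_m = two_nearest[:, 0], two_nearest[:, 1]      # nearest, 2nd-nearest
    l_c = (d_n ** 2).mean()                              # compactness, eq (14)
    l_s = torch.clamp(d_n - d_m + alpha, min=0).mean()   # separation, eq (16)
    return l_c, l_s

def multi_constraint_loss(l_a, l_m, l_c, l_s,
                          eta_a=2.0, eta_m=1.0, eta_f=0.1):
    # eta_a=2, eta_m=1, eta_f=0.1 follow the ablation reported in FIG. 19
    return eta_a * l_a + eta_m * l_m + eta_f * (l_c + l_s)  # eq (12)-(13)
```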
Example 2:
referring to fig. 6, the present invention provides a behavior anomaly detection method, including:
B1: acquiring a target video real frame, where the target video real frame comprises a plurality of temporally consecutive RGB real video frames and optical flow real frames containing normal behavior;
B2: inputting the target video real frame into a heterogeneous twin network to obtain a target video predicted frame, where the heterogeneous twin network is a network trained according to the behavior prediction network training method of embodiment 1;
B3: calculating the peak signal-to-noise ratio between the target video predicted frame and the target video real frame;
B4: calculating a regularity score according to the peak signal-to-noise ratio, where the regularity score is used to judge the degree of normality of the target video real frame;
B5: judging whether the regularity score is lower than a preset threshold;
B6: if so, abnormal behavior exists in the target video real frame;
B7: if not, no abnormal behavior exists in the target video real frame. A specific anomaly detection framework is shown in fig. 7.
In the testing phase, to evaluate the prediction quality on images from different datasets, the PSNR between the target video predicted frame and the target video real frame is calculated with equation (18):

$$\mathrm{PSNR}\!\left(L_t, \hat{L}_t\right) = 10 \log_{10} \frac{\left[\max\left(\hat{L}_t\right)\right]^2}{\frac{1}{M} \sum \left( L_t(r) - \hat{L}_t(l) \right)^2} \tag{18}$$

where PSNR is the peak signal-to-noise ratio, $L_t$ is the target video real frame, $\hat{L}_t$ is the target video predicted frame, $l$ is the spatial index over the target video predicted frame, $r$ is the corresponding spatial index over the target video real frame, and $M$ is the number of pixels.
Specifically, the regularity score is calculated as follows:

$$s(t) = \frac{\mathrm{PSNR}\!\left(L_t, \hat{L}_t\right) - \min_t \mathrm{PSNR}\!\left(L_t, \hat{L}_t\right)}{\max_t \mathrm{PSNR}\!\left(L_t, \hat{L}_t\right) - \min_t \mathrm{PSNR}\!\left(L_t, \hat{L}_t\right)} \tag{19}$$

where $s(t)$ is the regularity score, PSNR is the peak signal-to-noise ratio, $L_t$ is the target video real frame, and $\hat{L}_t$ is the target video predicted frame.
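A sketch of the test-time scoring of equations (18) and (19), assuming the min/max in (19) are taken over all frames of the test video:

```python
# PSNR (18), min-max normalized regularity score (19), and thresholding
# (steps B5-B7): frames whose score falls below the preset threshold are
# flagged as abnormal. The threshold value here is an assumption.
import numpy as np

def psnr(real: np.ndarray, pred: np.ndarray) -> float:
    mse = np.mean((real - pred) ** 2)            # (1/M) * sum of squared errors
    return 10.0 * np.log10(pred.max() ** 2 / mse)

def regularity_scores(psnr_per_frame: np.ndarray) -> np.ndarray:
    p = psnr_per_frame
    return (p - p.min()) / (p.max() - p.min())   # s(t) in [0, 1]

def detect(psnr_per_frame: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    return regularity_scores(psnr_per_frame) < threshold
```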
Example 3:
referring to fig. 8, the present invention provides a behavior prediction network training system, which includes:
the heterogeneous twin network construction unit 1 is used for constructing a heterogeneous twin network based on a convolution network and a U-Net network;
the training video acquisition unit 2 is used for acquiring a training video, where the training video comprises a plurality of temporally consecutive RGB video frames and optical flow frames containing normal behavior;
the input unit 3 is used for inputting any temporally consecutive RGB video frames into the convolutional network of the heterogeneous twin network and inputting any temporally consecutive optical flow frames into the U-Net network of the heterogeneous twin network;
an apparent loss function determining unit 4, configured to determine an apparent loss function according to the output of the convolutional network of the heterogeneous twin network and the RGB video frame at the moment following the temporally consecutive RGB video frames;
a motion loss function determining unit 5, configured to determine a motion loss function according to the output of the U-Net network of the heterogeneous twin network and the optical flow frame at the moment following the temporally consecutive optical flow frames;
a multi-constraint loss function determination unit 6 for determining a multi-constraint loss function from the apparent loss function and the motion loss function;
and the heterogeneous twin network training unit 7 is used for adjusting the weights in the convolution network and the U-Net network according to the multi-constraint loss function so as to train the heterogeneous twin network and obtain the trained heterogeneous twin network.
Example 4:
referring to fig. 9, the present invention provides a behavior anomaly detection system, which includes:
a target video real frame acquiring unit 8, configured to acquire a target video real frame, where the target video real frame comprises a plurality of temporally consecutive RGB real video frames and optical flow real frames containing normal behavior;
a target video prediction frame obtaining unit 9, configured to input the target video real frame into a heterogeneous twin network to obtain a target video prediction frame, where the heterogeneous twin network is a network trained according to the behavior prediction network training method;
a peak signal-to-noise ratio calculation unit 10, configured to calculate a peak signal-to-noise ratio between the target video predicted frame and the target video real frame;
a regularity score calculating unit 11, configured to calculate a regularity score according to the peak signal-to-noise ratio, where the regularity score is used to determine a normal degree of the real frame of the target video;
a judging unit 12, configured to judge whether the regularity score is lower than a preset threshold;
if so, abnormal behaviors exist in the real frame of the target video;
if not, the abnormal behavior does not exist in the real frame of the target video.
Example 5:
to verify the advantages of the present invention, the behavioral anomaly detection method of the present invention is now compared with advanced algorithms:
in this embodiment, the behavioral anomaly detection method of the present invention is compared to different advanced methods, including classification-based methods, reconstruction-based methods, and prediction-based methods. The AUC results of the different methods are shown in table 1.
TABLE 1. AUC results of different methods on the UCSD Ped2, CUHK Avenue and ShanghaiTech datasets
As can be seen from table 1, the behavior anomaly detection method of the present invention achieves better results on the baseline public datasets UCSD Ped2, CUHK Avenue and ShanghaiTech than the advanced methods. In the upper part of the table, the accuracy of the invention on the UCSD Ped2 dataset improves on the classification-based methods by at least 2.61% (94.10% vs 96.71%). This is mainly because most classification methods use traditional hand-crafted features, whose ability to mine representative features is limited compared with deep learning methods. In the middle part, the method of the invention also performs best on all three datasets compared with the reconstruction-based methods; in particular, its performance on the UCSD Ped2, CUHK Avenue and ShanghaiTech datasets improves on SNRR-AE by 4.51%, 3.20% and 4.28% respectively. In the lower part, compared with methods based on future frame prediction, the prediction task of the invention achieves the best results on the CUHK Avenue and ShanghaiTech datasets, with average AUC reaching 86.70% and 73.88%. This demonstrates the effectiveness of the anomaly detection method using a heterogeneous twin network based on key-value modules. The performance of the algorithm improves on Frame-Prediction by 1.31%, 1.60% and 1.08% on the three benchmark datasets respectively. Although Frame-Prediction also predicts future frames with the aid of optical flow, it uses the same U-Net network to extract both appearance information and optical flow motion information, which may to some extent introduce low-level information unrelated to the appearance features. The main reason the method of the invention is more effective is that the CAE model and the U-Net model are adopted to encode the appearance information and the motion information independently. The performance of AnoPCN on the UCSD Ped2 dataset is slightly better than that of the invention (96.80% vs 96.71%), mainly because the proportion of appearance information (pedestrians, etc.) in UCSD Ped2 is much higher than that of motion information (cycling, etc.). AnoPCN designs a deep neural network for frame prediction using a predictive coding mechanism and introduces an error refinement module to refine the coarse prediction, which favors the extraction of appearance information. The method of the invention weighs appearance and motion information jointly and is therefore only marginally behind on the UCSD Ped2 dataset. In contrast, on the relatively complex CUHK Avenue and ShanghaiTech datasets, where motion information and appearance information occur in comparable proportions, the algorithm of the invention outperforms AnoPCN, mainly because optical flow provides more discriminative motion cues.
FIG. 10 visualizes typical frame-level ROC curves of the method of the present invention and of different methods on the three datasets UCSD Ped2, CUHK Avenue and ShanghaiTech. The frame-level evaluation clearly shows that the method of the invention is superior to the other methods. In addition, the method achieves a lower EER on the three datasets, as shown in FIG. 11. The EER measures the error rate of an algorithm: the smaller the EER, the lower the error rate. The method of the invention obtains frame-level EERs of 0.104, 0.207 and 0.327 on the UCSD Ped2, CUHK Avenue and ShanghaiTech datasets respectively. It can be seen that the method of the invention again gives better experimental results than the others.
To qualitatively analyze the anomaly detection performance of the proposed model, the present invention visualizes the anomaly detection examples of the three reference data sets under the proposed framework, as shown in fig. 12, 13 and 14, respectively. The regularity score curve displays the abnormal scores of all the video frames in sequence, and can more intuitively reflect the performance of the proposed method. In each sub-graph, the regularity score represents the likelihood of normality, and the shaded portion in the video frame represents an anomaly in the real frame.
As can be seen from the left panel of fig. 12, the method of the present invention can detect anomalies well even in a crowded environment. The right panel of fig. 12 shows that the regularity curve drops immediately when a single anomaly (a car) appears, and gradually descends to its lowest point when multiple anomalies (cars and bicycles) appear. As shown in fig. 13, the regularity score drops significantly when an anomaly occurs and rises when the anomaly disappears, indicating that the method of the invention can detect the occurrence of anomalies. The regularity score curve of fig. 13 is rather rough owing to the noise carried by the CUHK Avenue dataset itself. Figure 14 shows the most challenging abnormal behaviors in the ShanghaiTech dataset, such as cycling and pushing, for which good regularity scores are still obtained, again indicating that the method of the invention can detect the occurrence of anomalies.
To assess how the different components affect the anomaly detection performance of the proposed method, ablation experiments were performed on the ShanghaiTech dataset; the AUC-based anomaly detection results are reported in table 2. As can be seen from table 2, prediction performance after adding the motion constraint is much higher than before, rising from 67.60% to 70.70%, because optical flow is more sensitive to fast-moving objects such as people running or cycling; this reflects the necessity of optical flow as additional information for improving anomaly detection performance. After adding the key-value module, performance on ShanghaiTech increases by a further 3.18% (70.70% vs 73.88%). On the one hand, the key-value module converts features of the high-dimensional space into a low-dimensional dynamic store, avoiding the influence of chain-like features. On the other hand, because the key-value module continuously updates contextual information, the strong similarity between the basic components of normal/abnormal samples and the generalization of the neural network to abnormal behaviors are alleviated, reducing the prediction error of the model.
Table 2 ablation test results on the ShanghaiTech data set, anomaly detection performance is reported in AUC (%) form
In addition, this example also performed a comparison between the heterogeneous twin network and the twin network. Since the skip connections in the U-Net network may fail to learn useful appearance information from video frames, two CAE networks with the same structure were used as the comparison. As can be seen from Table 2, the AUC of the method of the invention is about 1.4% higher than when a twin network is used. At the same time, the heterogeneous twin network contributes more than the twin network to the extraction of optical flow motion features (70.05% vs 68.75%), which fully verifies the superiority of adopting the heterogeneous twin network.
Case analysis on the ShanghaiTech dataset. Although the invention achieves advanced performance on the test datasets, the recognition ability of the heterogeneous twin network for some specific abnormal behaviors is still insufficient. This embodiment therefore conducts a case study on the ShanghaiTech dataset, mainly because it is the most challenging and realistic dataset, with the most scenes and abnormal behavior types. The test videos of the ShanghaiTech dataset are first classified into 15 classes; video anomaly detection is then performed on each video subset separately, and the AUC and EER are reported for each class, as shown in fig. 15.
To verify that the method of the present invention can alleviate the "chain-like" clustering phenomenon existing in high-dimensional space, this embodiment visually analyzes the feature distribution in the key-value module from different angles on the ShanghaiTech dataset. Fig. 16 (a) shows the feature distribution of a few samples of the Ped2 dataset in the key-value module, which looks rather cluttered. To reflect more intuitively that the key-value module can effectively resolve the "chain-like" manifold distribution phenomenon of the high-dimensional space, this embodiment visualizes it from different angles, as shown in (b) and (c) of fig. 16. As can be seen from (b) in fig. 16, the method of the invention stores the extracted features in the key-value module, thereby separating the features of different categories well, avoiding direct contact between different features, and effectively resolving the "chain-like" manifold distribution phenomenon existing in video anomaly detection. Fig. 16 (c) is a two-dimensional map of fig. 16 (b).
From fig. 17 it can be seen that, for a small dataset such as MNIST, the t-SNE method has a good classification effect, and the clustering effect improves as the number of iterations increases. But for datasets in which different abnormal behaviors may occur in the same complex environment and the abnormal target is small within the whole image (the ShanghaiTech dataset, etc.), a t-SNE-like method is not suitable. It can also be observed that a "chain" phenomenon appears as the number of iterations increases, because the environments of many different classes of abnormal behaviors are similar and the saliency of the abnormal object is low, as shown in fig. 18. Therefore, using a clustering method to classify abnormal behaviors may have drawbacks. The invention only needs to store the features of different classes and update similar features in time, thereby avoiding this phenomenon.
To explore the optimal configuration of the different loss coefficients $\eta_a$, $\eta_m$ and $\eta_f$, three sets of test experiments were performed on the proposed model. The relationship between two loss coefficients was analyzed while all other conditions were kept unchanged. FIG. 19 shows the AUC results on the UCSD Ped2 dataset, with parameter ranges of $[0, 10]$ or $[0, 1]$.
FIG. 19 (a) shows the influence of the parameters $\eta_a$ and $\eta_m$ on the average frame-level AUC. The method of the invention performs best when $\eta_a = 2$ and $\eta_m = 1$. However, when the value of $\eta_a$ exceeds 2, performance begins to decline, by up to about 2 percentage points. When the value of $\eta_m$ increases to 4 the performance degrades slightly, and when it increases to 7 the performance deteriorates markedly. FIG. 19 (b) shows the influence of $\eta_a$ and $\eta_f$ on the average frame-level AUC: performance is best at $\eta_f = 0.1$, and with increasing $\eta_a$ it first rises and then falls, reaching the optimum at $\eta_a = 2$. FIG. 19 (c) shows the relationship between $\eta_f$ and $\eta_m$: the AUC is best when $\eta_f = 0.1$, and then varies as the value of $\eta_m$ changes.
These observations show that the setting of the different hyper-parameters has a significant impact on the performance of the network.
This example was run on an NVIDIA Titan RTX GPU. The average running speed for video anomaly detection is about 20 fps. The running times of other methods are shown in table 3.
TABLE 3 average run time of different video anomaly detection methods
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principle and the embodiment of the present invention are explained by applying specific examples, and the above description of the embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (10)

1. A behavior prediction network training method is characterized by comprising the following steps:
constructing a heterogeneous twin network based on a convolution network and a U-Net network;
acquiring a training video, wherein the training video comprises a plurality of temporally consecutive RGB video frames and optical flow frames containing normal behavior;
inputting any temporally consecutive RGB video frames into the convolutional network of the heterogeneous twin network, and inputting any temporally consecutive optical flow frames into the U-Net network of the heterogeneous twin network;
determining an apparent loss function according to the output of the convolutional network of the heterogeneous twin network and the RGB video frame at the moment following the temporally consecutive RGB video frames;
determining a motion loss function according to the output of the U-Net network of the heterogeneous twin network and the optical flow frame at the moment following the temporally consecutive optical flow frames;
determining a multi-constraint loss function according to the apparent loss function and the motion loss function;
and adjusting weights in the convolution network and the U-Net network according to the multi-constraint loss function so as to train the heterogeneous twin network and obtain the trained heterogeneous twin network.
2. The behavior prediction network training method according to claim 1, further comprising, after the inputting of the arbitrary time-continuous RGB video frames into the convolution network of the heterogeneous twin network, before the determining the apparent loss function:
converting and encoding a plurality of temporally consecutive RGB video frames with the encoder of the convolutional network to obtain a plurality of first coding sequences;
determining the weight of each first coding sequence to obtain a plurality of first addressing probabilities;
determining the key-value pairs corresponding to the first coding sequences according to the first addressing probability corresponding to the first coding sequences to obtain the key-value pairs corresponding to the first coding sequences;
determining the similarity between the key-value pairs corresponding to the first coding sequences, and merging the first coding sequences in the key-value pairs with the similarity larger than a first preset threshold value into the same key-value pair;
and decoding the first coding sequence in each key-value pair by adopting a decoder of a convolutional network to obtain the RGB video prediction frame.
3. The behavior prediction network training method according to claim 1, further comprising, after the inputting of the arbitrary time-continuous optical flow frames into the U-Net network of the heterogeneous twin network, before the determining the motion loss function:
converting and encoding a plurality of temporally consecutive optical flow frames with the encoder of the U-Net network to obtain a plurality of second coding sequences;
determining the weight of each second coding sequence to obtain a plurality of second addressing probabilities;
determining the key-value pairs corresponding to the second coding sequences according to the second addressing probabilities corresponding to the second coding sequences to obtain the key-value pairs corresponding to the second coding sequences;
determining the similarity between the key-value pairs corresponding to the second coding sequences, and merging the second coding sequences in the key-value pairs with the similarity larger than a second preset threshold value into the same key-value pair;
and decoding the second coding sequence in each key-value pair by adopting a decoder of the U-Net network to obtain the optical flow prediction frame.
4. A behavior prediction network training method according to claim 2 or 3, characterized in that the multi-constraint loss function of the heterogeneous twin network further comprises: a characteristic compactness loss function to represent intra-class losses for key-value pairs, and a characteristic separation loss function to represent inter-class losses for key-value pairs.
5. The behavior prediction network training method of claim 4, wherein the multi-constraint loss function is calculated as follows:
$$\mathcal{L} = \eta_a\,\mathcal{L}_a + \eta_m\,\mathcal{L}_m + \eta_f\,\mathcal{L}_f$$

wherein $\mathcal{L}$ is the multi-constraint loss function, $\eta_a$ is the weight coefficient of the apparent loss, $\eta_m$ is the weight coefficient of the motion loss, and $\eta_f$ is the weight coefficient of the characteristic loss;

the apparent loss function is

$$\mathcal{L}_a = \frac{1}{T}\sum_{t=1}^{T}\left\lVert I_t - \hat{I}_t \right\rVert_2$$

wherein $I_t$ is the RGB video real frame, $\hat{I}_t$ is the RGB video prediction frame, $t$ is the index within the video frame sequence, $T$ is the total length of the video frame sequence, and $\lVert\cdot\rVert_2$ is the Euclidean distance;

the motion loss function is

$$\mathcal{L}_m = \frac{1}{T}\sum_{t=1}^{T}\left\lVert F_t - \hat{F}_t \right\rVert_1$$

wherein $F_t$ is the optical flow real frame, $\hat{F}_t$ is the optical flow prediction frame, and $\lVert\cdot\rVert_1$ is the $L_1$ distance;

the characteristic loss function is

$$\mathcal{L}_f = \mathcal{L}_{com} + \mathcal{L}_{sep}$$

wherein the characteristic compactness loss function is

$$\mathcal{L}_{com} = \sum_{i=1}^{N}\left\lVert q_i - z_n \right\rVert_2$$

and the characteristic separation loss function is

$$\mathcal{L}_{sep} = \sum_{i=1}^{N}\Big[\lVert q_i - z_n \rVert_2 - \lVert q_i - z_m \rVert_2 + \alpha\Big]_+$$

wherein $q_i$ is the $i$-th query term, $z_n$ is the key in the key-value module that is closest to the query term and $n$ is the key index of that closest entry, $z_m$ is the key second-closest to the query term in the key-value module and $m$ is its key index, $N$ is the total number of key-value pairs, and $\alpha$ controls the confidence between key-value pairs; the keys are updated as

$$\hat{z}_n = f\!\Big(z_n + \sum_{i} w_i\, q_i\Big)$$

wherein $\hat{z}_n$ is the updated key, $w_i$ is the weight assigned to query $q_i$, and $f(\cdot)$ is a normalization function.
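To make the formulas above concrete, here is a hedged PyTorch sketch of the multi-constraint loss. The weight values, the mean reductions, and the cdist-based nearest/second-nearest key lookup are assumptions rather than the patented computation.

```python
# Sketch of the multi-constraint loss (assumed reductions and weights).
import torch
import torch.nn.functional as F

def multi_constraint_loss(I, I_hat, Fr, Fr_hat, queries, keys,
                          eta=(1.0, 1.0, 0.1), alpha=1.0):
    eta_a, eta_m, eta_f = eta
    loss_a = F.mse_loss(I_hat, I)            # apparent term (Euclidean)
    loss_m = F.l1_loss(Fr_hat, Fr)           # motion term (L1)
    d = torch.cdist(queries, keys)           # (Q, N) query-to-key distances
    near2 = d.topk(2, dim=1, largest=False).values
    loss_com = near2[:, 0].mean()            # compactness: pull to nearest key
    loss_sep = F.relu(near2[:, 0] - near2[:, 1] + alpha).mean()  # margin alpha
    return eta_a * loss_a + eta_m * loss_m + eta_f * (loss_com + loss_sep)
```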
6. A behavior anomaly detection method, comprising:
acquiring target video real frames, wherein the target video real frames comprise a plurality of temporally continuous RGB real video frames and optical flow real frames;
inputting the target video real frames into a heterogeneous twin network to obtain a target video prediction frame, wherein the heterogeneous twin network is trained according to the behavior prediction network training method of claim 1;
calculating the peak signal-to-noise ratio between the target video prediction frame and the target video real frame;
calculating a regularity score from the peak signal-to-noise ratio, wherein the regularity score measures the degree of normality of the target video real frame;
judging whether the regularity score is lower than a preset threshold;
if so, determining that abnormal behavior exists in the target video real frame;
if not, determining that no abnormal behavior exists in the target video real frame.
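Read end to end, claim 6 amounts to the following flow. The `psnr` and `regularity_scores` helpers are the hypothetical functions sketched after claims 7 and 8, and the threshold value `tau` is an assumption.

```python
# Hedged sketch of the detection flow in claim 6; psnr() and
# regularity_scores() are defined in the sketches after claims 7 and 8.
import numpy as np

def detect_abnormal_frames(model, rgb_stacks, flow_stacks, real_frames, tau=0.5):
    preds = [model(r, f) for r, f in zip(rgb_stacks, flow_stacks)]
    p = np.array([psnr(gt, pr) for gt, pr in zip(real_frames, preds)])
    s = regularity_scores(p)       # per-frame degree of normality
    return s < tau                 # True where abnormal behavior is flagged
```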
7. The behavior anomaly detection method according to claim 6, wherein the peak signal-to-noise ratio is calculated as follows:

$$\mathrm{PSNR}\big(L_t, \hat{L}_t\big) = 10\,\log_{10}\frac{\big[\max(\hat{L}_t)\big]^2}{\frac{1}{M}\sum_{l,r}\big(L_t(l,r) - \hat{L}_t(l,r)\big)^2}$$

wherein $\mathrm{PSNR}$ is the peak signal-to-noise ratio, $L_t$ is the target video real frame, $\hat{L}_t$ is the target video prediction frame, $l$ and $r$ are the spatial indices of the target video prediction frame and the target video real frame respectively, and $M$ is the number of pixels.
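A direct transcription of the formula above into NumPy, under the assumption that the frames are same-shape arrays and that the peak is taken from the prediction frame as written:

```python
# PSNR per the reconstructed formula (assumption: same-shape float arrays).
import numpy as np

def psnr(real, pred):
    real = np.asarray(real, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    mse = np.mean((real - pred) ** 2)            # (1/M) * sum of squared errors
    return 10.0 * np.log10(pred.max() ** 2 / (mse + 1e-12))
```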
8. The behavior anomaly detection method according to claim 6, wherein the regularity score is calculated as follows:

$$s(t) = \frac{\mathrm{PSNR}\big(L_t,\hat{L}_t\big) - \min_{t}\mathrm{PSNR}\big(L_t,\hat{L}_t\big)}{\max_{t}\mathrm{PSNR}\big(L_t,\hat{L}_t\big) - \min_{t}\mathrm{PSNR}\big(L_t,\hat{L}_t\big)}$$

wherein $s(t)$ is the regularity score, $\mathrm{PSNR}$ is the peak signal-to-noise ratio, $L_t$ is the target video real frame, and $\hat{L}_t$ is the target video prediction frame.
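The regularity score is thus a min-max normalization of the per-frame PSNR over the test sequence, for example:

```python
# Min-max normalization of per-frame PSNR into s(t) in [0, 1].
import numpy as np

def regularity_scores(psnr_values):
    p = np.asarray(psnr_values, dtype=np.float64)
    return (p - p.min()) / (p.max() - p.min() + 1e-12)
```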
9. A behavior prediction network training system, comprising:
a heterogeneous twin network construction unit, used for constructing a heterogeneous twin network based on a convolutional network and a U-Net network;
a training video acquisition unit, used for acquiring a training video, the training video comprising a plurality of temporally continuous RGB video frames containing normal behaviors and optical flow frames;
an input unit, used for inputting the temporally continuous RGB video frames into the convolutional network of the heterogeneous twin network and inputting the temporally continuous optical flow frames into the U-Net network of the heterogeneous twin network;
an apparent loss function determining unit, used for determining an apparent loss function according to the output of the convolutional network of the heterogeneous twin network and the RGB video frame at the time instant following the temporally continuous RGB video frames;
a motion loss function determining unit, used for determining a motion loss function according to the output of the U-Net network of the heterogeneous twin network and the optical flow frame at the time instant following the temporally continuous optical flow frames;
a multi-constraint loss function determining unit, used for determining a multi-constraint loss function according to the apparent loss function and the motion loss function; and
a heterogeneous twin network training unit, used for adjusting the weights in the convolutional network and the U-Net network according to the multi-constraint loss function, so as to train the heterogeneous twin network and obtain the trained heterogeneous twin network.
10. A behavioral anomaly detection system, comprising:
a target video real frame acquisition unit, used for acquiring target video real frames, the target video real frames comprising a plurality of temporally continuous RGB real video frames and optical flow real frames;
a target video prediction frame obtaining unit, used for inputting the target video real frames into a heterogeneous twin network to obtain a target video prediction frame, wherein the heterogeneous twin network is trained according to the behavior prediction network training method of claim 1;
a peak signal-to-noise ratio calculation unit, used for calculating the peak signal-to-noise ratio between the target video prediction frame and the target video real frame;
a regularity score calculation unit, used for calculating a regularity score from the peak signal-to-noise ratio, the regularity score measuring the degree of normality of the target video real frame; and
a judging unit, used for judging whether the regularity score is lower than a preset threshold; if so, determining that abnormal behavior exists in the target video real frame; if not, determining that no abnormal behavior exists in the target video real frame.
CN202210285382.1A 2022-03-22 2022-03-22 Behavior prediction network training method and system and behavior anomaly detection method and system Withdrawn CN114581738A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210285382.1A CN114581738A (en) 2022-03-22 2022-03-22 Behavior prediction network training method and system and behavior anomaly detection method and system


Publications (1)

Publication Number Publication Date
CN114581738A true CN114581738A (en) 2022-06-03

Family

ID=81777628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210285382.1A Withdrawn CN114581738A (en) 2022-03-22 2022-03-22 Behavior prediction network training method and system and behavior anomaly detection method and system

Country Status (1)

Country Link
CN (1) CN114581738A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117409354A (en) * 2023-12-11 2024-01-16 山东建筑大学 Video anomaly detection method and system based on three paths of video streams and context awareness
CN117409354B (en) * 2023-12-11 2024-03-22 山东建筑大学 Video anomaly detection method and system based on three paths of video streams and context awareness

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20220603)