CN113255518B - Video abnormal event detection method and chip - Google Patents

Info

Publication number
CN113255518B
Authority
CN
China
Prior art keywords
detected
abnormal
time sequence
video
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110568256.2A
Other languages
Chinese (zh)
Other versions
CN113255518A (en)
Inventor
王嘉诚
张少仲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongcheng Hualong Computer Technology Co Ltd
Original Assignee
Shenwei Super Computing Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenwei Super Computing Beijing Technology Co Ltd
Priority to CN202110568256.2A
Publication of CN113255518A
Application granted
Publication of CN113255518B

Classifications

    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/44 Event detection
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06F 18/23 Clustering techniques
    • G06F 18/2411 Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06F 18/253 Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a video abnormal event detection method and a chip. The method comprises the following steps: acquiring a video to be detected; obtaining from the video at least two images to be detected arranged in time sequence; determining the spatial similarity between adjacent images to be detected to obtain the spatial variation characteristic of the video, wherein the spatial variation characteristic comprises the rule by which the spatial similarity varies over the time sequence; determining the time sequence similarity between adjacent images to be detected to obtain the time sequence variation characteristic of the video, wherein the time sequence variation characteristic comprises the rule by which the time sequence similarity varies over the time sequence; fusing the spatial variation characteristic and the time sequence variation characteristic to obtain a fused variation characteristic; judging, with a pre-trained detection model, whether the fused variation characteristic is abnormal; and if so, determining that an abnormal event exists in the video to be detected. The scheme can improve the efficiency of identifying video abnormal events.

Description

Video abnormal event detection method and chip
Technical Field
The invention relates to the technical field of image processing, in particular to a video abnormal event detection method and a chip.
Background
With the continuous development of Internet of Things technology, monitoring equipment has been widely deployed in public areas, providing an unobtrusive safety guarantee for people. However, conventional video monitoring relies mainly on manual observation of abnormal events in the scene, which not only incurs extremely high labor costs but is also easily affected by subjective factors; some abnormal events may even go undiscovered in time. The identification efficiency of abnormal events in existing monitoring video is therefore low.
In view of the above, it is desirable to provide a video abnormal event detection method and chip to solve the above-mentioned deficiencies.
Disclosure of Invention
In view of the defects in the prior art, the present invention provides a video abnormal event detection method and a chip, addressing the technical problem of how to improve the identification efficiency of video abnormal events.
In order to solve the above technical problem, in a first aspect, the present invention provides a method for detecting a video abnormal event, including:
acquiring a video to be detected;
obtaining at least two images to be detected which are arranged according to a time sequence by utilizing the video to be detected;
determining the spatial similarity between adjacent images to be detected to obtain the spatial variation characteristics of the video to be detected; wherein the spatial variation characteristics comprise a variation rule of the spatial similarity according to a time sequence;
determining the time sequence similarity between adjacent images to be detected to obtain the time sequence change characteristics of the video to be detected; the time sequence change characteristics comprise a change rule of the time sequence similarity according to a time sequence;
fusing the spatial variation characteristic and the time sequence variation characteristic to obtain a fused variation characteristic;
judging whether the fusion change characteristics are abnormal or not by using a pre-trained detection model;
and if so, determining that an abnormal event exists in the video to be detected.
Optionally, the determining the spatial similarity between adjacent images to be detected to obtain the spatial variation characteristic of the video to be detected includes:
performing spatial feature extraction on the at least two images to be detected to obtain spatial feature information corresponding to each image to be detected; wherein each image to be detected comprises at least two spatial points;
for each spatial point of each image to be detected, executing:
determining first spatial feature information of the spatial point, and determining second spatial feature information corresponding to the spatial point in a first image to be detected adjacent to the image to be detected;
determining a similarity between the first spatial feature information and the second spatial feature information; the similarity is used for representing the similarity of the spatial point in the image to be detected and the spatial feature information in the first image to be detected;
determining the spatial similarity between the image to be detected and the first image to be detected according to the determined similarity of each spatial point;
arranging the spatial similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate spatial variation characteristics of the video to be detected; wherein, the time sequence is the arrangement sequence of the at least two images to be detected.
Optionally, the determining the time sequence similarity between adjacent images to be detected to obtain the time sequence change characteristics of the video to be detected includes:
aiming at each image to be detected, executing the following steps:
carrying out differential processing on the image to be detected and a first image to be detected adjacent to the image to be detected to obtain a differential image; the differential image is used for representing the difference between each pixel point between the image to be detected and the first image to be detected;
screening pixel points in the differential image to determine a motion group;
determining a target differential image corresponding to the motion group as the time sequence similarity between the adjacent images to be detected;
arranging the time sequence similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate a time sequence change characteristic of the video to be detected; wherein, the time sequence is the arrangement sequence of the at least two images to be detected.
Optionally, the method for creating the detection model includes:
acquiring at least two groups of historical abnormal-free videos; each group of historical abnormal-free videos comprises at least two images which are arranged according to a time sequence;
acquiring at least two groups of abnormal videos and acquiring fusion change characteristics of the at least two groups of abnormal videos;
extracting the characteristics of the at least two groups of historical abnormal-free videos to obtain fusion change characteristics corresponding to each group of historical abnormal-free videos; wherein the fusion change features are used for representing the spatial and temporal regularity of the historical abnormal-free video set arranged in time sequence;
clustering the fusion change characteristics of each group of historical abnormal-free videos by using an unsupervised clustering algorithm to obtain at least two clustering clusters;
combining the fusion change characteristics of the at least two groups of historical abnormal-free videos and the fusion change characteristics of the at least two groups of abnormal videos to obtain a training set;
and training by using the training set according to the at least two clustering clusters and the multi-classification support vector machine to obtain the detection model.
Optionally, the obtaining at least two groups of abnormal videos and obtaining the fusion change characteristics of the at least two groups of abnormal videos includes:
aiming at each group of historical abnormal-free videos, randomly disturbing at least two images in the group of historical abnormal-free videos to obtain corresponding abnormal videos; the abnormal video comprises at least two images which are not arranged in time sequence;
extracting the characteristics of the at least two groups of abnormal videos to obtain the spatial variation characteristics and the time sequence variation characteristics of each group of abnormal videos;
fusing the spatial variation characteristics and the time sequence variation characteristics of each group of abnormal videos to obtain fusion variation characteristics corresponding to the group of abnormal videos; the fusion change characteristics are used for representing the space and time sequence rules of at least two images which are not arranged in time sequence.
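The random-disturbance step above can be sketched as follows; the helper name and fixed seed are illustrative:

```python
import random

def make_abnormal(frames, seed=0):
    """Create a pseudo-abnormal sample by shuffling the frames of a
    historical abnormal-free video so they are no longer in time order;
    the frame contents themselves are unchanged."""
    rng = random.Random(seed)
    shuffled = list(frames)
    while shuffled == list(frames) and len(frames) > 1:
        rng.shuffle(shuffled)  # retry until the order actually changes
    return shuffled
```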
Optionally, the fusion change feature of each group of abnormal videos includes a target spatial change feature and a target time sequence change feature;
the fusing the spatial variation characteristic and the time sequence variation characteristic to obtain a fused variation characteristic, including:
for each set of abnormal videos, performing:
carrying out nonlinear fusion on the target space variation characteristic and the target time sequence variation characteristic, determining a first weight value, and outputting the score of each image in the abnormal video group through an activation function;
carrying out weight distribution on each image in the abnormal videos by using a softmax function to obtain abnormal scores of the abnormal videos;
judging whether the abnormal scores corresponding to each group of abnormal videos are all larger than a preset threshold value;
if so, determining that the first weight value is a target weight value, and fusing the spatial variation characteristic and the time sequence variation characteristic by using the target weight value to obtain a fusion variation characteristic.
Optionally, the anomaly score of each group of abnormal videos is calculated as:

g_i = w_2 · φ(w_1 · z_i + b_1) + b_2,  a_i = exp(g_i) / Σ_{j=1}^{n} exp(g_j),  D = Σ_{i=1}^{n} a_i · g_i

wherein D is used for characterizing the anomaly score of the group of abnormal videos; n is used for characterizing the number of images included in the group; φ is used for characterizing the activation function; z is used for characterizing the fused change feature of the group, with z_i the feature of the i-th image; w_1 and b_1 are the parameters to be learned of the first attention network layer; w_2 and b_2 are the parameters to be learned of the second attention network layer; g_i is used for characterizing the score of the i-th image in the group; and a_i is used for characterizing the attention weight of the i-th image.
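A minimal numpy sketch of an attention-weighted anomaly score consistent with the symbol glossary above; tanh stands in for the activation φ, and the exact learned form in the patent may differ:

```python
import numpy as np

def anomaly_score(z, w1, b1, w2, b2, phi=np.tanh):
    """Two-layer attention over per-image fused features z of shape (n, d):
    g_i = w2 . phi(w1 z_i + b1) + b2 is the score of the i-th image,
    a_i = softmax(g)_i its attention weight, and D = sum_i a_i * g_i."""
    g = phi(z @ w1 + b1) @ w2 + b2   # per-image scores g_i, shape (n,)
    e = np.exp(g - g.max())          # numerically stable softmax
    a = e / e.sum()                  # attention weights, sum to 1
    D = float((a * g).sum())         # attention-weighted anomaly score
    return D, g, a
```

Since D is a convex combination of the per-image scores, it always lies between the smallest and largest g_i.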
Optionally, after determining that there is an abnormal event in the video to be detected, the method further includes:
analyzing the time sequence change characteristics to determine the time of the abnormal event;
acquiring a target image to be detected corresponding to the determined time;
determining the target space similarity between the target image to be detected and the adjacent image to be detected;
and determining the spatial information of the abnormal event according to the target spatial similarity.
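A minimal sketch of the time-localization step above, using deviation from the median as an illustrative criterion; the patent does not fix the analysis rule:

```python
import numpy as np

def locate_anomaly_time(timing_feature):
    """Return the index whose time sequence change value deviates most from
    the sequence median, a simple proxy for when the abnormal event occurs."""
    t = np.asarray(timing_feature, dtype=float)
    return int(np.argmax(np.abs(t - np.median(t))))
```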
In a second aspect, the present invention further provides a video abnormal event detection chip, including:
the acquisition module is used for acquiring a video to be detected and acquiring at least two images to be detected which are arranged according to a time sequence by using the video to be detected;
the first characteristic determining module is used for determining the spatial similarity between adjacent images to be detected to obtain the spatial variation characteristics of the video to be detected; wherein the spatial variation characteristics comprise a variation rule of the spatial similarity according to a time sequence;
the second characteristic determining module is used for determining the time sequence similarity between adjacent images to be detected to obtain the time sequence change characteristics of the video to be detected; the time sequence change characteristics comprise a change rule of the time sequence similarity according to a time sequence;
the fusion module is used for fusing the spatial variation characteristic obtained by the first characteristic determining module and the time sequence variation characteristic obtained by the second characteristic determining module to obtain a fusion variation characteristic;
and the detection module is used for judging whether the fusion change characteristics obtained by the fusion module are abnormal or not by utilizing a pre-trained detection model, and if so, determining that an abnormal event exists in the video to be detected.
In a third aspect, the present invention further provides a video abnormal event detection apparatus, including: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is configured to invoke the machine readable program to execute the video abnormal event detection method provided by the first aspect or any possible implementation manner of the first aspect.
The embodiment of the invention provides a video abnormal event detection method and a chip. In the method, at least two images to be detected arranged in time sequence are obtained from the acquired video to be detected; the spatial similarity and the time sequence similarity between adjacent images to be detected are determined so as to obtain the spatial variation characteristic and the time sequence variation characteristic of the video respectively; the two characteristics are fused; and when the fused variation characteristic is judged to be abnormal by a pre-trained detection model, an abnormal event is determined to exist in the video to be detected. By processing the video in this way and fusing its spatial and time sequence variation characteristics, the internal relation between time and space is exploited, which facilitates accurate detection and identification of segment-level abnormal events, enables more accurate detection and localization of abnormal event boundaries, removes the need for manual monitoring, and thus improves the efficiency of video abnormal event identification.
Drawings
Fig. 1 illustrates a video abnormal event detection method according to an embodiment of the present invention;
FIG. 2 illustrates another video abnormal event detection method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a video abnormal event detection chip according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a device where a video abnormal event detection apparatus according to an embodiment of the present invention is located.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As described in the background, conventional video monitoring in the prior art relies mainly on manual monitoring and discrimination, and cannot cope with the current explosive growth of data volume. Existing abnormal event detection methods based on weakly supervised learning consider only the features of each video segment in isolation, without relating it to other video segments, so their detection significance is poor and their identification efficiency is low.
In view of this, the relations among the segments of a video can be considered so as to better represent the dynamic characteristics of the video, and the spatio-temporal association relationships in the video can be fused to realize fast and effective detection and localization of abnormal events.
The foregoing is the concept provided by the present invention, and specific implementations of the concept provided by the present invention are described below.
As shown in fig. 1, a method for detecting a video abnormal event according to an embodiment of the present invention includes the following steps:
step 101: acquiring a video to be detected;
step 102: obtaining at least two images to be detected which are arranged according to a time sequence by using a video to be detected;
step 103: determining the spatial similarity between adjacent images to be detected to obtain the spatial variation characteristics of the video to be detected; the space change characteristics comprise a change rule of the space similarity according to a time sequence;
step 104: determining the time sequence similarity between adjacent images to be detected to obtain the time sequence change characteristics of the video to be detected; the time sequence change characteristics comprise a change rule of the time sequence similarity according to a time sequence;
step 105: fusing the spatial variation characteristic and the time sequence variation characteristic to obtain a fused variation characteristic;
step 106: judging whether the fusion change characteristics are abnormal or not by using a pre-trained detection model;
step 107: if yes, determining that an abnormal event exists in the video to be detected.
In the embodiment of the invention, at least two images to be detected arranged in time sequence are obtained from the acquired video to be detected; the spatial similarity and the time sequence similarity between adjacent images to be detected are determined so as to obtain the spatial variation characteristic and the time sequence variation characteristic of the video respectively; the two characteristics are fused; and when the fused variation characteristic is judged to be abnormal by a pre-trained detection model, an abnormal event is determined to exist in the video to be detected. By processing the video in this way and fusing its spatial and time sequence variation characteristics, the internal relation between time and space is exploited, which facilitates accurate detection and identification of segment-level abnormal events, enables more accurate detection and localization of abnormal event boundaries, removes the need for manual monitoring, and thus improves the efficiency of video abnormal event identification.
It should be noted that, in step 101, the captured video of a certain area may be acquired in real time by various video monitoring devices. In step 102, a video to be detected is subjected to framing processing to obtain at least two images to be detected arranged in a time sequence.
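The overall flow of steps 103 to 107 can be outlined as below; all concrete components are caller-supplied stand-ins, since the embodiment leaves their exact form open:

```python
def detect_abnormal(frames, spatial_sim, timing_sim, fuse, model):
    """Steps 103-107 in outline: per-pair similarities kept in time order,
    fusion, then a pre-trained model decides. The similarity measures, the
    fusion rule, and the model are passed in as callables."""
    pairs = list(zip(frames, frames[1:]))             # adjacent images to be detected
    spatial = [spatial_sim(a, b) for a, b in pairs]   # spatial variation characteristic
    timing = [timing_sim(a, b) for a, b in pairs]     # time sequence variation characteristic
    fused = fuse(spatial, timing)                     # fused variation characteristic
    return model(fused)                               # True => abnormal event present
```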
In this embodiment of the present invention, the detection model in step 106 may be obtained by training on a training set that includes the fusion change features of at least two groups of videos; for each group, the fusion change feature serves as the input and the abnormal event detection result of the group serves as the output.
Optionally, in the method for detecting a video abnormal event shown in fig. 1, the step 103 of determining spatial similarity between adjacent images to be detected to obtain a spatial variation characteristic of the video to be detected includes:
performing spatial feature extraction on at least two images to be detected to obtain spatial feature information corresponding to each image to be detected; the image to be detected comprises at least two space points;
for each spatial point of each image to be detected, executing:
determining first spatial feature information of the spatial point, and determining second spatial feature information corresponding to the spatial point in a first image to be detected adjacent to the image to be detected;
determining similarity between the first spatial feature information and the second spatial feature information; the similarity is used for representing the similarity of the spatial point in the image to be detected and the spatial feature information in the first image to be detected;
determining the spatial similarity between the image to be detected and the first image to be detected according to the determined similarity of each spatial point;
arranging the spatial similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate spatial variation characteristics of the video to be detected; wherein the time sequence is the arrangement sequence of at least two images to be detected.
It should be noted that a spatial point may be a spatial position coordinate in the image to be detected, and the first spatial feature information of all spatial points constitutes the spatial feature information of the image; the first image to be detected adjacent to a given image to be detected is the next image in the time sequence arrangement of the images to be detected.
In the embodiment of the invention, for each spatial point of each image to be detected, first spatial feature information of the spatial point is firstly determined, second spatial feature information of the spatial point in a first image to be detected adjacent to the detected image is then determined, and the similarity of the same spatial point in the adjacent images to be detected is determined by comparing the first spatial feature information with the second spatial feature information. Similarly, the spatial similarity between each spatial point in the image to be detected and the adjacent first image to be detected can be obtained, and finally the spatial similarity of each image to be detected is arranged according to the arrangement sequence of at least two images to be detected, so as to generate the spatial variation characteristic of the video to be detected.
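The per-point comparison described above can be sketched with cosine similarity, one plausible choice of similarity measure; feature maps are assumed here to be (H, W, C) arrays with one C-dimensional feature vector per spatial point:

```python
import numpy as np

def spatial_similarity(feat_a, feat_b, eps=1e-8):
    """Mean cosine similarity between the per-point feature vectors of two
    adjacent images; feat_* has shape (H, W, C)."""
    num = (feat_a * feat_b).sum(axis=-1)
    den = np.linalg.norm(feat_a, axis=-1) * np.linalg.norm(feat_b, axis=-1) + eps
    return float((num / den).mean())

def spatial_change_feature(feature_maps):
    """Similarity of each image with the next one, kept in time order,
    forming the spatial variation characteristic of the video."""
    return [spatial_similarity(a, b) for a, b in zip(feature_maps, feature_maps[1:])]
```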
Since video frames exhibit spatial similarity, the embodiment of the present invention compares the spatial feature information of adjacent images to be detected to obtain the rule by which spatial similarity changes in the video to be detected, that is, the change rule of its spatial feature information. Adding this spatial-dimension information to video abnormal event detection increases detection accuracy and improves the generalization ability of abnormal event detection in complex environments; meanwhile, the spatial information of a subsequent abnormal event can be determined based on the spatial feature information, enabling more accurate detection and localization of abnormal event boundaries.
Optionally, in the method for detecting a video abnormal event shown in fig. 1, the step 104 of determining a time sequence similarity between adjacent images to be detected to obtain a time sequence variation characteristic of the video to be detected includes:
aiming at each image to be detected, executing the following steps:
carrying out differential processing on the image to be detected and a first image to be detected adjacent to the image to be detected to obtain a differential image; the differential image is used for representing the difference between each pixel point between the image to be detected and the first image to be detected;
screening pixel points in the difference image to determine a motion group;
determining a target differential image corresponding to the motion group as the time sequence similarity between adjacent images to be detected;
arranging the time sequence similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate a time sequence change characteristic of the video to be detected; wherein the time sequence is the arrangement sequence of at least two images to be detected.
In the embodiment of the invention, for each image to be detected, the image and its adjacent first image to be detected are first subjected to differential processing to obtain a differential image. The pixel points in the differential image are then screened: the difference between corresponding pixel points of the current image and the background image is examined, and if the difference exceeds a certain threshold value, the pixel is judged to belong to a foreground moving object; the pixels so selected form the motion group. The differential image corresponding to the motion group is determined as the target differential image (i.e. the differences of pixel points in the motion group are retained, while the difference information of pixel points outside the motion group is set back to zero), and the target differential image is used as the time sequence similarity between the adjacent images to be detected. Since video abnormal event detection is mainly concerned with the motion group, the time sequence similarity retains only the information of the corresponding motion group, so that abnormal events in the video to be detected can be quickly acquired and detected.
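The differencing and screening just described can be sketched as follows; the threshold value is illustrative:

```python
import numpy as np

def motion_difference(frame, next_frame, threshold=25):
    """Differential image of two adjacent grayscale frames: pixels whose
    absolute change exceeds `threshold` form the motion group; difference
    information of all other (non-motion) pixels is set back to zero."""
    diff = np.abs(frame.astype(np.int16) - next_frame.astype(np.int16))
    motion_mask = diff > threshold
    target = np.where(motion_mask, diff, 0)   # target differential image
    return target.astype(np.uint8), motion_mask
```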
Besides similarity of spatial feature information, the occurrence of abnormal events in a video is also connected through time sequence relations, and the time sequence relation is an even more important video feature: an abnormal event usually lacks the smooth time sequence characteristics of a normal event. In the embodiment of the invention, the time sequence similarities of the images to be detected are finally arranged according to the arrangement order of the at least two images to be detected, generating the time sequence change characteristic of the video to be detected. Based on the time sequence change characteristic of the motion group in the video to be detected, abnormal events can thus be preliminarily determined without relying on manual monitoring.
Further, in order to improve the accurate detection and identification of abnormal events in the video, express them completely, and increase detection robustness, the spatial variation characteristic obtained in step 103 and the time sequence variation characteristic obtained in step 104 are fused in step 105.
Optionally, in a video abnormal event detection method shown in fig. 1, the method for creating a detection model includes:
acquiring at least two groups of historical abnormal-free videos; each group of historical abnormal-free videos comprises at least two images which are arranged according to a time sequence;
acquiring at least two groups of abnormal videos and acquiring fusion change characteristics of the at least two groups of abnormal videos;
extracting the characteristics of at least two groups of historical abnormal-free videos to obtain fusion change characteristics corresponding to each group of historical abnormal-free videos; the fusion change characteristics are used for representing the space and time sequence rules of the historical abnormal-free videos arranged according to the time sequence;
clustering the fusion change characteristics of each group of historical abnormal-free videos by using an unsupervised clustering algorithm to obtain at least two clustering clusters;
combining the fusion change characteristics of at least two groups of historical abnormal-free videos and the fusion change characteristics of at least two groups of abnormal videos to obtain a training set;
and training by using the training set according to the at least two clustering clusters and the multi-classification support vector machine to obtain a detection model.
In this embodiment of the present invention, the method for creating the detection model in step 106 includes: acquiring at least two groups of historical abnormal-free videos, each comprising at least two images arranged in time sequence; obtaining the spatial variation characteristic and the time sequence variation characteristic of each group according to the methods of steps 103 and 104; fusing the two characteristics of each group as in step 105; clustering the resulting fusion variation characteristics with an unsupervised clustering algorithm (such as the K-means clustering algorithm) to obtain at least two clustering clusters representing normal motion behaviors; acquiring at least two groups of abnormal videos; combining the fusion variation characteristics of the historical abnormal-free videos with those of the abnormal videos to obtain a training set; and training at least two multi-classification support vector machines on the training set according to the at least two clustering clusters, yielding a detection model comprising at least two multi-classification support vector machines. The number of multi-classification support vector machines equals the number of clustering clusters.
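The model-creation pipeline above can be sketched in miniature as follows. This is only an illustration under strong simplifications: fusion features are reduced to scalars, the K-means step is a tiny 1-D implementation, and a nearest-centroid distance rule stands in for the multi-class support vector machines so the sketch stays self-contained (a real implementation would use a proper SVM library):

```python
# Sketch: cluster normal-motion fusion features, then flag features that lie
# far from every normal cluster (stand-in for the per-cluster SVM decision).

def kmeans_1d(values, k, iters=20):
    """Tiny k-means on scalar features: returns sorted cluster centroids."""
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for v in values:
            i = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            buckets[i].append(v)
        centroids = [sum(b) / len(b) if b else c
                     for b, c in zip(buckets, centroids)]
    return sorted(centroids)

def detect(feature, centroids, radius):
    """Anomalous if the feature is far from every normal-motion centroid."""
    return all(abs(feature - c) > radius for c in centroids)
```

For instance, clustering normal features around 1.0 and 5.0 into two clusters lets `detect` flag a feature near 9.0 as belonging to no normal-motion cluster.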
In the embodiment of the present invention, step 106 uses the pre-trained detection model including at least two multi-class support vector machines to input the fusion change characteristics of the video to be detected obtained in step 105 into the detection model, so as to obtain the detection result of the video to be detected output by the detection model. When at least one multi-classification support vector machine in the detection model judges that the fusion change characteristics are abnormal, the abnormal event exists in the video to be detected. Therefore, the detection model can convert the abnormal detection problem into the classification problem, and when the abnormal video does not belong to any normal video, the abnormal event can be detected more quickly and effectively, so that the detection and identification efficiency of the video abnormal event is improved.
It should be noted that, in the embodiment of the present invention, the number of cluster clusters may be preset according to environments where different types of videos are located.
Optionally, in the method for detecting a video abnormal event shown in fig. 1, acquiring at least two groups of abnormal videos and obtaining a fusion change feature of the at least two groups of abnormal videos includes:
aiming at each group of historical abnormal-free videos, randomly disturbing at least two images in the group of historical abnormal-free videos to obtain corresponding abnormal videos; the abnormal video comprises at least two images which are not arranged in time sequence;
extracting the characteristics of at least two groups of abnormal videos to obtain the spatial variation characteristics and the time sequence variation characteristics of each group of historical abnormal-free videos;
fusing the spatial variation characteristics and the time sequence variation characteristics of each group of abnormal videos to obtain fusion variation characteristics corresponding to the group of abnormal videos; the fusion change characteristics are used for representing the space and time sequence rules of at least two images which are not arranged in time sequence.
Abnormal events usually lack a clear definition, which makes abnormal event detection difficult: abnormal event samples are hard to obtain, and manually collecting enough abnormal videos to train a detection model is too costly. However, abnormal events do not share the stable time sequence characteristic of normal events, so abnormal videos can be obtained by breaking that stable time sequence characteristic, reducing the cost of collecting abnormal data. For example, suppose motion group A walks normally for 6 min and is then suddenly knocked down by surrounding people, i.e., A's action changes abruptly. At that moment the regular time sequence relationship can no longer be captured and the motion's time sequence relationship becomes disordered, so an abnormal event is determined to be detected.
Specifically, the embodiment of the present invention provides a method for acquiring abnormal videos: first, for each group of historical abnormal-free videos, the at least two time-ordered images in the group are randomly shuffled, i.e., the timing information of the normal samples is disturbed, destroying the spatial and temporal similarity between adjacent images and thereby producing an abnormal video corresponding to the group. In this way, abnormal video samples corresponding to the clustering clusters of different normal motions can be obtained from the historical abnormal-free videos by random perturbation, providing balanced training samples for each multi-classification support vector machine and further reducing the cost of collecting abnormal videos.
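The random-perturbation step can be sketched as follows, assuming a video is simply a time-ordered list of frames (any frame representation works, since only the ordering is changed):

```python
import random

def make_abnormal_video(frames, seed=None):
    """Randomly permute the time-ordered frames of a normal video,
    breaking its smooth spatial/temporal adjacency to yield an
    abnormal training sample."""
    rng = random.Random(seed)
    shuffled = frames[:]
    # Re-shuffle until the order actually changes (guards short videos
    # against the identity permutation).
    while shuffled == frames and len(frames) > 1:
        rng.shuffle(shuffled)
    return shuffled
```

The same frames are kept, so the spatial content of each image is untouched; only the time sequence regularity is destroyed, which is exactly what the detection model should flag.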
In the embodiment of the present invention, the spatial variation characteristic and the temporal variation characteristic of each group of abnormal videos are obtained according to the methods described in step 103 and step 104, and the spatial variation characteristic and the temporal variation characteristic corresponding to each group of abnormal videos are fused in step 105 to obtain a fused variation characteristic corresponding to each group of abnormal videos.
Optionally, in the video abnormal event detection method shown in fig. 1, the fusion change feature of each group of abnormal videos includes a target spatial change feature and a target temporal change feature;
step 105, fusing the spatial variation characteristic and the time sequence variation characteristic to obtain a fusion variation characteristic, which comprises the following steps:
for each set of abnormal videos, performing:
carrying out nonlinear fusion on the target space change characteristic and the target time sequence change characteristic, determining a first weight value, and outputting the score of each image in the abnormal video group through an activation function;
carrying out weight distribution on each image in the abnormal videos by using a softmax function to obtain abnormal scores of the abnormal videos;
judging whether the abnormal scores corresponding to each group of abnormal videos are all larger than a preset threshold value;
if so, determining that the first weight value is a target weight value, and fusing the space variation characteristic and the time sequence variation characteristic by using the target weight value to obtain a fusion variation characteristic.
In order to better utilize the spatial similarity and the time sequence similarity, in the embodiment of the invention, a self-adaptive space-time fusion graph network is constructed to learn the space-time characteristics of the abnormal events, and the space change characteristics and the time sequence change characteristics are fused by adopting a self-adaptive weight fusion method to obtain the fusion change characteristics.
Specifically, the abnormal videos are used to determine the weight of each feature during fusion. First, for each group of abnormal videos, the target spatial variation characteristic and target time sequence variation characteristic included in its fusion variation characteristic are fused nonlinearly to determine a first weight value; the score of each image in the group is output through an activation function; weights are distributed over the images based on an attention mechanism; and the anomaly score of the group is finally obtained. Anomaly scores for the remaining groups are obtained in the same way. When the anomaly scores of all groups exceed a preset threshold, the first weight value is determined to be the target weight value, which comprises the respective weights of the spatial variation characteristic and the time sequence variation characteristic during fusion. In this way, weight values for the spatial variation characteristic and the time sequence variation characteristic suited to the current abnormal videos are determined, the spatial similarity and time sequence similarity are considered jointly, and the detection accuracy for abnormal events is further improved.
Optionally, in the video abnormal event detection method shown in fig. 1, the anomaly score of each group of abnormal videos is calculated as:

$$D=\sum_{i=1}^{n} a_i\,g_i,\qquad g_i=\varphi\big(w_2\,\varphi(w_1 z_i+b_1)+b_2\big)$$

where $D$ characterizes the anomaly score of the group of abnormal videos; $n$ characterizes the number of images included in the group; $\varphi$ characterizes the activation function; $z$ characterizes the fusion change feature of the group (with $z_i$ its component for the $i$th image); $w_1$, $b_1$ characterize the parameters to be learned of the first attention network layer; $w_2$, $b_2$ characterize the parameters to be learned of the second attention network layer; $g_i$ characterizes the score of the $i$th image in the group; and

$$a_i=\frac{e^{g_i}}{\sum_{j=1}^{n} e^{g_j}}$$

characterizes the attention weight of the $i$th image in the group of abnormal videos.
In the embodiment of the present invention, to determine the respective weight values of the spatial variation characteristic and the time sequence variation characteristic during fusion, the optimal target weight value is finally determined by computing the anomaly scores of the abnormal videos as described above. Specifically, when the first weight value is determined, the score of each image in the group of abnormal videos may be output through an activation function (e.g., the sigmoid function); to further highlight the abnormal region and reduce the over-smoothing caused by fusion, weights may be distributed over the images through two attention network layers and a softmax function, yielding the anomaly score of the group of abnormal videos.
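The score computation described above can be sketched as follows. This is a minimal scalar reading: `w1, b1, w2, b2` are scalars standing in for the learned parameters of the two attention network layers, sigmoid is used as the activation function (the text names it only as an example), and each `z_i` is the per-image fusion feature reduced to one number — all assumptions for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def anomaly_score(z, w1, b1, w2, b2):
    """Per-image scores through two attention layers, softmax attention
    weights, then the weighted sum as the group anomaly score D."""
    g = [sigmoid(w2 * sigmoid(w1 * zi + b1) + b2) for zi in z]  # image scores
    exp_g = [math.exp(gi) for gi in g]
    total = sum(exp_g)
    a = [e / total for e in exp_g]                  # softmax attention weights
    return sum(ai * gi for ai, gi in zip(a, g))     # anomaly score D
```

When all images score equally, the softmax weights are uniform and D equals the common per-image score; images with higher scores otherwise pull D upward, highlighting the abnormal region.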
Optionally, in the method for detecting video abnormal events shown in fig. 1, after determining that there is an abnormal event in the video to be detected, the method further includes:
analyzing the time sequence change characteristics to determine the time of the abnormal event;
acquiring a target image to be detected corresponding to the determined time;
determining the target space similarity between the target image to be detected and the adjacent image to be detected;
and determining the spatial information of the abnormal event according to the target spatial similarity.
In the embodiment of the present invention, after step 107 determines that an abnormal event exists in the video to be detected, the time sequence change characteristic of the video to be detected is analyzed to determine the time at which the abnormal event occurs; the target image to be detected corresponding to that time is acquired; the target spatial similarity between the target image to be detected and its adjacent image to be detected is determined; and the spatial information of the abnormal event is determined according to the target spatial similarity, thereby detecting and locating the boundary of the abnormal event.
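The time-localization step can be sketched as follows — a simplified reading in which the time sequence change characteristic is reduced to a scalar similarity curve over consecutive frame pairs, and the anomaly time is taken at the curve's sharpest jump (both simplifying assumptions, since the text does not fix how the characteristic is analyzed):

```python
def anomaly_time(timing_curve):
    """Index of the frame after the largest jump in the temporal-similarity
    sequence, i.e. where the smooth timing characteristic breaks."""
    jumps = [abs(b - a) for a, b in zip(timing_curve, timing_curve[1:])]
    return max(range(len(jumps)), key=jumps.__getitem__) + 1
```

The target image at that index would then be compared spatially with its neighbor to localize where in the frame the abnormal event occurs.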
In order to more clearly illustrate the technical solution and advantages of the present invention, as shown in fig. 2, the following describes in detail a video abnormal event detection method provided by an embodiment of the present invention, which specifically includes:
step 201: and constructing a detection model.
Specifically:
a1, acquiring at least two groups of historical abnormal-free videos; each group of historical abnormal-free videos comprises at least two images which are arranged according to a time sequence;
a2, acquiring at least two groups of abnormal videos, and obtaining fusion change characteristics of the at least two groups of abnormal videos, wherein the fusion change characteristics comprise:
aiming at each group of historical abnormal-free videos, randomly disturbing at least two images in the group of historical abnormal-free videos to obtain corresponding abnormal videos; the abnormal video comprises at least two images which are not arranged in time sequence;
extracting the characteristics of at least two groups of abnormal videos to obtain the spatial variation characteristics and the time sequence variation characteristics of each group of historical abnormal-free videos;
fusing the spatial variation characteristics and the time sequence variation characteristics of each group of abnormal videos to obtain fusion variation characteristics corresponding to the group of abnormal videos; the fusion change characteristics are used for representing the space and time sequence rules of at least two images which are not arranged according to the time sequence;
a3, extracting the characteristics of at least two groups of historical abnormal-free videos to obtain the fusion change characteristics corresponding to each group of historical abnormal-free videos; the fusion change characteristics are used for representing the space and time sequence rules of the historical abnormal-free videos arranged according to the time sequence;
a4, clustering fusion change characteristics of each group of historical abnormal-free videos by using an unsupervised clustering algorithm to obtain at least two clustering clusters;
a5, combining the fusion change characteristics of at least two groups of historical abnormal-free videos and the fusion change characteristics of at least two groups of abnormal videos to obtain a training set;
and A6, training by using a training set according to at least two clustering clusters and a multi-classification support vector machine to obtain a detection model.
More specifically, the fusion change features in the above steps are obtained according to the methods in step 203, step 204, and step 205.
Step 202: and obtaining at least two images to be detected which are arranged according to the time sequence by utilizing the video to be detected.
Specifically, a video to be detected is obtained, and the video to be detected is subjected to framing processing to obtain at least two images to be detected which are arranged according to a time sequence.
Step 203: and determining the spatial variation characteristics of the video to be detected.
Specifically, spatial feature extraction is carried out on at least two images to be detected, and spatial feature information corresponding to each image to be detected is obtained; the image to be detected comprises at least two space points;
for each spatial point of each image to be detected, executing:
determining first spatial feature information of the spatial point, and determining second spatial feature information corresponding to the spatial point in a first image to be detected adjacent to the image to be detected;
determining similarity between the first spatial feature information and the second spatial feature information; the similarity is used for representing the similarity of the spatial point in the image to be detected and the spatial feature information in the first image to be detected;
determining the spatial similarity between the image to be detected and the first image to be detected according to the determined similarity of each spatial point;
arranging the spatial similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate spatial variation characteristics of the video to be detected; wherein, the time sequence is the arrangement sequence of at least two images to be detected; the spatial variation characteristics comprise a variation rule of the spatial similarity according to time sequence.
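The per-spatial-point comparison in step 203 can be sketched with cosine similarity over per-point feature vectors — an assumed choice of similarity measure, since the description does not fix one:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 on zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def spatial_similarity(feats_t, feats_t1):
    """Mean per-spatial-point similarity between the feature maps of two
    adjacent frames: the frame-pair spatial similarity."""
    sims = [cosine(u, v) for u, v in zip(feats_t, feats_t1)]
    return sum(sims) / len(sims)
```

Arranging these frame-pair values in frame order then gives the spatial variation characteristic of the video.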
Step 204: and determining the time sequence change characteristics of the video to be detected.
Specifically, determining the time sequence similarity between adjacent images to be detected to obtain the time sequence change characteristics of the video to be detected, including:
aiming at each image to be detected, executing the following steps:
carrying out differential processing on the image to be detected and a first image to be detected adjacent to the image to be detected to obtain a differential image; the differential image is used for representing the difference between each pixel point between the image to be detected and the first image to be detected;
screening pixel points in the difference image to determine a motion group;
determining a target differential image corresponding to the motion group as the time sequence similarity between adjacent images to be detected;
arranging the time sequence similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate a time sequence change characteristic of the video to be detected; wherein, the time sequence is the arrangement sequence of at least two images to be detected; the time sequence change characteristics comprise a change rule of the time sequence similarity according to the time sequence.
Step 205: and fusing the space variation characteristic and the time sequence variation characteristic to obtain a fusion variation characteristic.
Specifically, for each group of abnormal videos, the following steps are performed:
carrying out nonlinear fusion on the target space change characteristic and the target time sequence change characteristic, determining a first weight value, and outputting the score of each image in the abnormal video group through an activation function;
carrying out weight distribution on each image in the abnormal videos by using a softmax function to obtain abnormal scores of the abnormal videos;
judging whether the abnormal scores corresponding to each group of abnormal videos are all larger than a preset threshold value;
if so, determining that the first weight value is a target weight value, and fusing the space variation characteristic and the time sequence variation characteristic by using the target weight value to obtain a fusion variation characteristic;
the anomaly score of each group of abnormal videos is calculated as:

$$D=\sum_{i=1}^{n} a_i\,g_i,\qquad g_i=\varphi\big(w_2\,\varphi(w_1 z_i+b_1)+b_2\big)$$

where $D$ characterizes the anomaly score of the group of abnormal videos; $n$ characterizes the number of images included in the group; $\varphi$ characterizes the activation function; $z$ characterizes the fusion change feature of the group (with $z_i$ its component for the $i$th image); $w_1$, $b_1$ characterize the parameters to be learned of the first attention network layer; $w_2$, $b_2$ characterize the parameters to be learned of the second attention network layer; $g_i$ characterizes the score of the $i$th image in the group; and

$$a_i=\frac{e^{g_i}}{\sum_{j=1}^{n} e^{g_j}}$$

characterizes the attention weight of the $i$th image in the group of abnormal videos.
Step 206: and judging whether the fusion change characteristics are abnormal or not by using a pre-trained detection model.
Specifically, the obtained fusion change features are input into the detection model obtained in the training in step 201, and the detection result of the video to be detected corresponding to the fusion change features output by the detection model is obtained.
Step 207: after determining that the video to be detected has the abnormal event, determining the spatial information of the abnormal event.
Specifically, after the detection result indicates that an abnormal event exists in the video to be detected, analyzing the time sequence change characteristics to determine the time of the abnormal event;
acquiring a target image to be detected corresponding to the determined time;
determining the target space similarity between the target image to be detected and the adjacent image to be detected;
and determining the spatial information of the abnormal event according to the target spatial similarity.
Fig. 3 shows a video abnormal event detection chip according to an embodiment of the present invention. Taking a software implementation as an example, the chip shown in fig. 3 is a chip in the logical sense, formed by the CPU of the device in which the chip is located reading corresponding computer program instructions from nonvolatile memory into memory and running them. The video abnormal event detection chip provided by this embodiment comprises:
the acquisition module 301 is configured to acquire a video to be detected, and obtain at least two images to be detected arranged according to a time sequence by using the video to be detected;
the first feature determining module 302 is configured to determine spatial similarity between adjacent images to be detected, so as to obtain a spatial variation feature of a video to be detected; the space change characteristics comprise a change rule of the space similarity according to a time sequence;
the second characteristic determining module 303 is configured to determine a time sequence similarity between adjacent images to be detected, so as to obtain a time sequence change characteristic of the video to be detected; the time sequence change characteristics comprise a change rule of the time sequence similarity according to a time sequence;
a fusion module 304, configured to fuse the spatial variation feature obtained by the first feature determination module and the time-sequence variation feature obtained by the second feature determination module to obtain a fusion variation feature;
the detection module 305 is configured to determine whether the fusion change feature obtained by the fusion module is abnormal by using a pre-trained detection model, and if so, determine that an abnormal event exists in the video to be detected.
Optionally, on the basis of a video abnormal event detection chip shown in fig. 3, the first feature determining module 302 is further configured to perform the following operations:
performing spatial feature extraction on at least two images to be detected to obtain spatial feature information corresponding to each image to be detected; the image to be detected comprises at least two space points;
for each spatial point of each image to be detected, executing:
determining first spatial feature information of the spatial point, and determining second spatial feature information corresponding to the spatial point in a first image to be detected adjacent to the image to be detected;
determining similarity between the first spatial feature information and the second spatial feature information; the similarity is used for representing the similarity of the spatial point in the image to be detected and the spatial feature information in the first image to be detected;
determining the spatial similarity between the image to be detected and the first image to be detected according to the determined similarity of each spatial point;
arranging the spatial similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate spatial variation characteristics of the video to be detected; wherein the time sequence is the arrangement sequence of at least two images to be detected.
Optionally, on the basis of a video abnormal event detection chip shown in fig. 3, the second characteristic determining module 303 is further configured to perform the following operations:
aiming at each image to be detected, executing the following steps:
carrying out differential processing on the image to be detected and a first image to be detected adjacent to the image to be detected to obtain a differential image; the differential image is used for representing the difference between each pixel point between the image to be detected and the first image to be detected;
screening pixel points in the difference image to determine a motion group;
determining a target differential image corresponding to the motion group as the time sequence similarity between adjacent images to be detected;
arranging the time sequence similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate a time sequence change characteristic of the video to be detected; wherein the time sequence is the arrangement sequence of at least two images to be detected.
Optionally, on the basis of a video abnormal event detection chip shown in fig. 3, the chip further includes: a creation module to perform the following operations:
acquiring at least two groups of historical abnormal-free videos; each group of historical abnormal-free videos comprises at least two images;
performing feature extraction on at least two groups of historical abnormal-free videos to obtain spatial variation features and time sequence variation features corresponding to each group of historical abnormal-free videos;
fusing the spatial variation characteristic and the time sequence variation characteristic of each group of historical abnormal-free videos to obtain a fused variation characteristic corresponding to the group of historical abnormal-free videos; the fusion change characteristics are used for representing the space and time sequence rules of the historical abnormal-free videos arranged according to the time sequence;
clustering the fusion change characteristics of each group of historical abnormal-free videos by using an unsupervised clustering algorithm to obtain at least two clustering clusters;
acquiring at least two groups of abnormal videos and acquiring fusion change characteristics of the at least two groups of abnormal videos;
combining the fusion change characteristics of at least two groups of historical abnormal-free videos and the fusion change characteristics of at least two groups of abnormal videos to obtain a training set;
and training by using the training set according to the at least two clustering clusters and the multi-classification support vector machine to obtain a detection model.
Optionally, on the basis of the video abnormal event detection chip shown in fig. 3, the creating module further includes: an anomaly video construction sub-module for performing the following operations:
aiming at each group of historical abnormal-free videos, randomly disturbing at least two images in the group of historical abnormal-free videos to obtain corresponding abnormal videos; the abnormal video comprises at least two images which are not arranged in time sequence;
extracting the characteristics of at least two groups of abnormal videos to obtain the spatial variation characteristics and the time sequence variation characteristics of each group of historical abnormal-free videos;
fusing the spatial variation characteristics and the time sequence variation characteristics of each group of abnormal videos to obtain fusion variation characteristics corresponding to the group of abnormal videos; the fusion change characteristics are used for representing the space and time sequence rules of at least two images which are not arranged in time sequence.
Optionally, on the basis of the video abnormal event detection chip shown in fig. 3, the fusion change feature of each group of abnormal videos includes a target spatial change feature and a target temporal change feature;
the fusion module 304 is further configured to perform the following operations:
for each set of abnormal videos, performing:
carrying out nonlinear fusion on the target space change characteristic and the target time sequence change characteristic, determining a first weight value, and outputting the score of each image in the abnormal video group through an activation function;
carrying out weight distribution on each image in the abnormal videos by using a softmax function to obtain abnormal scores of the abnormal videos;
judging whether the abnormal scores corresponding to each group of abnormal videos are all larger than a preset threshold value;
if so, determining that the first weight value is a target weight value, and fusing the space variation characteristic and the time sequence variation characteristic by using the target weight value to obtain a fusion variation characteristic.
Optionally, on the basis of the video abnormal event detection chip shown in fig. 3, the fusion module 304 is further configured to perform the following operations:
the anomaly score of each group of abnormal videos is calculated as:

$$D=\sum_{i=1}^{n} a_i\,g_i,\qquad g_i=\varphi\big(w_2\,\varphi(w_1 z_i+b_1)+b_2\big)$$

where $D$ characterizes the anomaly score of the group of abnormal videos; $n$ characterizes the number of images included in the group; $\varphi$ characterizes the activation function; $z$ characterizes the fusion change feature of the group (with $z_i$ its component for the $i$th image); $w_1$, $b_1$ characterize the parameters to be learned of the first attention network layer; $w_2$, $b_2$ characterize the parameters to be learned of the second attention network layer; $g_i$ characterizes the score of the $i$th image in the group; and

$$a_i=\frac{e^{g_i}}{\sum_{j=1}^{n} e^{g_j}}$$

characterizes the attention weight of the $i$th image in the group of abnormal videos.
Optionally, on the basis of a video abnormal event detection chip shown in fig. 3, the chip further includes: a positioning module to perform the following operations:
analyzing the time sequence change characteristics to determine the time of the abnormal event;
acquiring a target image to be detected corresponding to the determined time;
determining the target space similarity between the target image to be detected and the adjacent image to be detected;
and determining the spatial information of the abnormal event according to the target spatial similarity.
It is to be understood that the illustrated structure of the embodiment of the present invention does not constitute a specific limitation to a video abnormal event detection chip. In other embodiments of the present invention, a video anomaly detection chip may include more or fewer components than shown, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Since the contents of information interaction, execution process, and the like between the modules in the chip are based on the same concept as the method embodiment of the present invention, specific contents may refer to the description in the method embodiment of the present invention, and are not described herein again.
The embodiment of the invention also provides a video abnormal event detection device, which comprises: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is configured to invoke the machine readable program to perform the video abnormal event detection method according to any one of the embodiments of the present invention.
In the embodiment of the present invention, fig. 4 shows a hardware structure diagram of the device in which the video abnormal event detection apparatus is located. In addition to the processor, memory, network interface, and nonvolatile memory shown in fig. 4, the device in which the apparatus is located may generally include other hardware, such as a forwarding chip responsible for processing packets.
An embodiment of the present invention further provides a computer-readable medium storing computer instructions which, when executed by a processor, cause the processor to execute the video abnormal event detection method according to any embodiment of the present invention.
Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communication network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, the program code read out from the storage medium may be written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion module connected to the computer, and a CPU or the like mounted on the expansion board or the expansion module may then perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A video abnormal event detection method is characterized by comprising the following steps:
acquiring a video to be detected;
obtaining at least two images to be detected which are arranged according to a time sequence by utilizing the video to be detected;
determining the spatial similarity between adjacent images to be detected to obtain the spatial variation characteristics of the video to be detected; wherein the spatial variation characteristics comprise a variation rule of the spatial similarity according to a time sequence;
determining the time sequence similarity between adjacent images to be detected to obtain the time sequence change characteristics of the video to be detected; the time sequence change characteristics comprise a change rule of the time sequence similarity according to a time sequence;
fusing the spatial variation characteristic and the time sequence variation characteristic to obtain a fused variation characteristic;
judging whether the fusion change characteristics are abnormal or not by using a pre-trained detection model;
if so, determining that an abnormal event exists in the video to be detected;
the method for creating the detection model comprises the following steps:
acquiring at least two groups of historical abnormal-free videos; each group of historical abnormal-free videos comprises at least two images which are arranged according to a time sequence;
acquiring at least two groups of abnormal videos and acquiring fusion change characteristics of the at least two groups of abnormal videos;
extracting the characteristics of the at least two groups of historical abnormal-free videos to obtain fusion change characteristics corresponding to each group of historical abnormal-free videos; wherein the fusion change features are used for representing the spatial and temporal regularity of the historical abnormal-free video set arranged in time sequence;
clustering the fusion change characteristics of each group of historical abnormal-free videos by using an unsupervised clustering algorithm to obtain at least two clustering clusters;
combining the fusion change characteristics of the at least two groups of historical abnormal-free videos and the fusion change characteristics of the at least two groups of abnormal videos to obtain a training set;
and training by using the training set according to the at least two clustering clusters and the multi-classification support vector machine to obtain the detection model.
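The training flow of claim 1, clustering the fusion change features of the no-anomaly videos and then training a multi-class model over the clusters plus an abnormal class, can be sketched as follows. The minimal k-means, the labeling scheme (cluster ids 0..k-1 for normal samples, one extra class k for abnormal samples), and the omission of the actual support-vector training are all assumptions for illustration.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Minimal k-means; stands in for the claim's unsupervised clustering."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):                      # update non-empty clusters only
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels, centers

def build_training_set(normal_feats, abnormal_feats, k=2):
    """Label normal fusion features by cluster id and abnormal ones with the
    extra class k; a multi-class SVM would then be trained on (X, y)."""
    labels, _ = kmeans(normal_feats, k)
    X = np.vstack([normal_feats, abnormal_feats])
    y = np.concatenate([labels, np.full(len(abnormal_feats), k)])
    return X, y
```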
2. The method according to claim 1, wherein the determining the spatial similarity between the adjacent images to be detected to obtain the spatial variation characteristic of the video to be detected comprises:
performing spatial feature extraction on the at least two images to be detected to obtain spatial feature information corresponding to each image to be detected; the image to be detected comprises at least two space points;
for each spatial point of each image to be detected, executing:
determining first spatial feature information of the spatial point, and determining second spatial feature information corresponding to the spatial point in a first image to be detected adjacent to the image to be detected;
determining a similarity between the first spatial feature information and the second spatial feature information; the similarity is used for representing the similarity of the spatial point in the image to be detected and the spatial feature information in the first image to be detected;
determining the spatial similarity between the image to be detected and the first image to be detected according to the determined similarity of each spatial point;
arranging the spatial similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate spatial variation characteristics of the video to be detected; wherein, the time sequence is the arrangement sequence of the at least two images to be detected.
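The per-spatial-point comparison of claim 2 can be sketched as below. Using cosine similarity as the per-point measure, a `(T, H, W, C)` feature-map layout, and a plain mean as the per-pair aggregation are assumptions; the claim fixes none of these.

```python
import numpy as np

def spatial_variation(feature_maps, eps=1e-8):
    """Sketch: for each pair of adjacent frames, compare the spatial feature
    vectors at every spatial point, average into one spatial similarity per
    pair, and arrange the values in time order."""
    sims = []
    for a, b in zip(feature_maps[:-1], feature_maps[1:]):
        num = (a * b).sum(axis=-1)                                  # per-point dot product
        den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
        sims.append(float((num / den).mean()))                      # spatial similarity of the pair
    return np.array(sims)                                           # spatial variation characteristic
```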
3. The method according to claim 1, wherein the determining the time sequence similarity between the adjacent images to be detected to obtain the time sequence variation characteristics of the video to be detected comprises:
aiming at each image to be detected, executing the following steps:
carrying out differential processing on the image to be detected and a first image to be detected adjacent to the image to be detected to obtain a differential image; the differential image is used for representing the difference between each pixel point between the image to be detected and the first image to be detected;
screening pixel points in the differential image to determine a motion group;
determining a target differential image corresponding to the motion group as the time sequence similarity between the adjacent images to be detected;
arranging the time sequence similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate a time sequence change characteristic of the video to be detected; wherein, the time sequence is the arrangement sequence of the at least two images to be detected.
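The differencing-and-screening steps of claim 3 can be sketched as follows; taking the mean difference over the screened motion group as the per-pair value, and the threshold itself, are assumptions made only for illustration.

```python
import numpy as np

def timing_variation(frames, motion_thresh=25):
    """Sketch: difference each frame with its neighbor, screen pixels above a
    threshold as the motion group, aggregate that group into one value per
    pair, and arrange the values in time order."""
    sims = []
    for a, b in zip(frames[:-1], frames[1:]):
        diff = np.abs(a.astype(int) - b.astype(int))    # differential image
        motion = diff[diff > motion_thresh]             # screened motion group
        sims.append(float(motion.mean()) if motion.size else 0.0)
    return np.array(sims)                               # time-sequence variation characteristic
```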
4. The method according to claim 1, wherein the obtaining at least two groups of abnormal videos and obtaining the fusion change characteristics of the at least two groups of abnormal videos comprises:
aiming at each group of historical abnormal-free videos, randomly shuffling the order of at least two images in the group of historical abnormal-free videos to obtain a corresponding abnormal video; wherein the abnormal video comprises at least two images which are not arranged in time sequence;
extracting the characteristics of the at least two groups of abnormal videos to obtain the spatial variation characteristics and the time sequence variation characteristics of each group of abnormal videos;
fusing the spatial variation characteristics and the time sequence variation characteristics of each group of abnormal videos to obtain fusion variation characteristics corresponding to the group of abnormal videos; the fusion change characteristics are used for representing the space and time sequence rules of at least two images which are not arranged in time sequence.
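The construction of pseudo-abnormal samples in claim 4 amounts to destroying the time order of a normal video, which can be sketched in a few lines; the re-draw loop guarding against an identity permutation is an assumption added for illustration.

```python
import numpy as np

def make_abnormal_video(video, seed=0):
    """Sketch of claim 4: build a pseudo-abnormal sample by randomly shuffling
    the time order of a normal video's frames, so the result is no longer
    arranged in time sequence."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(video))
    while (order == np.arange(len(video))).all():   # re-draw if the shuffle was a no-op
        order = rng.permutation(len(video))
    return video[order], order
```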
5. The method according to claim 1, wherein the fusion variation features of each abnormal video set include a target spatial variation feature and a target temporal variation feature;
the fusing the spatial variation characteristic and the time sequence variation characteristic to obtain a fused variation characteristic, including:
for each set of abnormal videos, performing:
carrying out nonlinear fusion on the target space variation characteristic and the target time sequence variation characteristic, determining a first weight value, and outputting the score of each image in the abnormal video group through an activation function;
carrying out weight distribution on each image in the abnormal videos by using a softmax function to obtain abnormal scores of the abnormal videos;
judging whether the abnormal scores corresponding to each group of abnormal videos are all larger than a preset threshold value;
if so, determining that the first weight value is a target weight value, and fusing the spatial variation characteristic and the time sequence variation characteristic by using the target weight value to obtain a fusion variation characteristic.
6. The method according to claim 5, wherein the anomaly score of each set of abnormal videos is calculated by the following formula:

D = Σ_{i=1}^{n} a_i · g_i, with g_i = φ(w_2(w_1 z + b_1) + b_2) and a_i = exp(g_i) / Σ_{j=1}^{n} exp(g_j)

wherein: D is used for characterizing the anomaly score of the set of abnormal videos; n is used for characterizing the number of images included in the set of abnormal videos; φ is used for characterizing the activation function; z is used for characterizing the fusion change feature of the set of abnormal videos; w_1 and b_1 are used for characterizing the parameters to be learned of the first attention network layer; w_2 and b_2 are used for characterizing the parameters to be learned of the second attention network layer; g_i is used for characterizing the score of the ith image in the set of abnormal videos; and a_i is used for characterizing the attention weight value of the ith image in the set of abnormal videos.
7. The method according to any one of claims 1 to 6, further comprising, after the determining that there is an abnormal event in the video to be detected:
analyzing the time sequence change characteristics to determine the time of the abnormal event;
acquiring a target image to be detected corresponding to the determined time;
determining the target space similarity between the target image to be detected and the adjacent image to be detected;
and determining the spatial information of the abnormal event according to the target spatial similarity.
8. A video abnormal event detection apparatus, comprising:
the acquisition module is used for acquiring a video to be detected and acquiring at least two images to be detected which are arranged according to a time sequence by using the video to be detected;
the first characteristic determining module is used for determining the spatial similarity between adjacent images to be detected to obtain the spatial variation characteristics of the video to be detected; wherein the spatial variation characteristics comprise a variation rule of the spatial similarity according to a time sequence;
the second characteristic determining module is used for determining the time sequence similarity between adjacent images to be detected to obtain the time sequence change characteristics of the video to be detected; the time sequence change characteristics comprise a change rule of the time sequence similarity according to a time sequence;
the fusion module is used for fusing the spatial variation characteristic obtained by the first characteristic determining module and the time sequence variation characteristic obtained by the second characteristic determining module to obtain a fusion variation characteristic;
the detection module is used for judging whether the fusion change characteristics obtained by the fusion module are abnormal or not by utilizing a pre-trained detection model, and if so, determining that an abnormal event exists in the video to be detected;
a creation module to perform the following operations:
acquiring at least two groups of historical abnormal-free videos; each group of historical abnormal-free videos comprises at least two images;
performing feature extraction on at least two groups of historical abnormal-free videos to obtain spatial variation features and time sequence variation features corresponding to each group of historical abnormal-free videos;
fusing the spatial variation characteristic and the time sequence variation characteristic of each group of historical abnormal-free videos to obtain a fused variation characteristic corresponding to the group of historical abnormal-free videos; the fusion change characteristics are used for representing the space and time sequence rules of the historical abnormal-free videos arranged according to the time sequence;
clustering the fusion change characteristics of each group of historical abnormal-free videos by using an unsupervised clustering algorithm to obtain at least two clustering clusters;
acquiring at least two groups of abnormal videos and acquiring fusion change characteristics of the at least two groups of abnormal videos;
combining the fusion change characteristics of at least two groups of historical abnormal-free videos and the fusion change characteristics of at least two groups of abnormal videos to obtain a training set;
and training by using the training set according to the at least two clustering clusters and the multi-classification support vector machine to obtain a detection model.
9. A video abnormal event detection apparatus, comprising: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor, configured to invoke the machine readable program to perform the method of any of claims 1 to 7.
CN202110568256.2A 2021-05-25 2021-05-25 Video abnormal event detection method and chip Active CN113255518B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110568256.2A CN113255518B (en) 2021-05-25 2021-05-25 Video abnormal event detection method and chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110568256.2A CN113255518B (en) 2021-05-25 2021-05-25 Video abnormal event detection method and chip

Publications (2)

Publication Number Publication Date
CN113255518A CN113255518A (en) 2021-08-13
CN113255518B true CN113255518B (en) 2021-09-24

Family

ID=77184262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110568256.2A Active CN113255518B (en) 2021-05-25 2021-05-25 Video abnormal event detection method and chip

Country Status (1)

Country Link
CN (1) CN113255518B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113742510A (en) * 2021-08-26 2021-12-03 浙江大华技术股份有限公司 Determination method and device for cluster center of gathering files, computer equipment and storage medium
CN113435432B (en) * 2021-08-27 2021-11-30 腾讯科技(深圳)有限公司 Video anomaly detection model training method, video anomaly detection method and device
CN113705490B (en) * 2021-08-31 2023-09-12 重庆大学 Anomaly detection method based on reconstruction and prediction

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8660368B2 (en) * 2011-03-16 2014-02-25 International Business Machines Corporation Anomalous pattern discovery
CN106709447A (en) * 2016-12-21 2017-05-24 华南理工大学 Abnormal behavior detection method in video based on target positioning and characteristic fusion
CN108133172B (en) * 2017-11-16 2022-04-05 北京华道兴科技有限公司 Method for classifying moving objects in video and method and device for analyzing traffic flow
CN109902612B (en) * 2019-02-22 2021-01-08 北京工业大学 Monitoring video abnormity detection method based on unsupervised learning
CN112668366B (en) * 2019-10-15 2024-04-26 华为云计算技术有限公司 Image recognition method, device, computer readable storage medium and chip
CN111723694A (en) * 2020-06-05 2020-09-29 广东海洋大学 Abnormal driving behavior identification method based on CNN-LSTM space-time feature fusion

Also Published As

Publication number Publication date
CN113255518A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN113255518B (en) Video abnormal event detection method and chip
US8660368B2 (en) Anomalous pattern discovery
JP6694829B2 (en) Rule-based video importance analysis
US8548198B2 (en) Identifying anomalous object types during classification
US8619135B2 (en) Detection of abnormal behaviour in video objects
BR102016007265A2 (en) MULTIMODAL AND REAL-TIME METHOD FOR SENSITIVE CONTENT FILTERING
KR101731461B1 (en) Apparatus and method for behavior detection of object
JP6299299B2 (en) Event detection apparatus and event detection method
CN103312770B (en) Method for auditing resources of cloud platform
US20120275649A1 (en) Foreground object tracking
JP2016072964A (en) System and method for subject re-identification
JP5388829B2 (en) Intruder detection device
JP4940220B2 (en) Abnormal operation detection device and program
KR101720781B1 (en) Apparatus and method for prediction of abnormal behavior of object
KR20170082025A (en) Apparatus and Method for Identifying Video with Copyright using Recognizing Face based on Machine Learning
Turchini et al. Convex polytope ensembles for spatio-temporal anomaly detection
CN110060278A (en) The detection method and device of moving target based on background subtraction
CN112597928A (en) Event detection method and related device
Duque et al. The OBSERVER: An intelligent and automated video surveillance system
Shuoyan et al. Abnormal behavior detection based on the motion-changed rules
Vashistha et al. A comparative analysis of different violence detection algorithms from videos
Wang et al. Anomaly detection in crowd scene using historical information
CN114005060A (en) Image data determining method and device
KR20130067758A (en) Apparatus and method for detecting human by using svm learning
CN115187884A (en) High-altitude parabolic identification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221209

Address after: 807-3, floor 8, block F, No. 9, Shangdi Third Street, Haidian District, Beijing 100080

Patentee after: Zhongcheng Hualong Computer Technology Co.,Ltd.

Address before: No.114, 14th floor, block B, building 1, No.38, Zhongguancun Street, Haidian District, Beijing 100082

Patentee before: Shenwei Super Computing (Beijing) Technology Co.,Ltd.