CN113255518B - Video abnormal event detection method and chip - Google Patents

Info

Publication number
CN113255518B
Authority
CN
China
Prior art keywords
detected
abnormal
time sequence
video
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110568256.2A
Other languages
Chinese (zh)
Other versions
CN113255518A (en)
Inventor
王嘉诚
张少仲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongcheng Hualong Computer Technology Co Ltd
Original Assignee
Shenwei Super Computing Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenwei Super Computing Beijing Technology Co Ltd
Priority to CN202110568256.2A
Publication of CN113255518A
Application granted
Publication of CN113255518B

Classifications

    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/44 Event detection
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06F 18/23 Clustering techniques
    • G06F 18/2411 Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06F 18/253 Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a video abnormal event detection method and a chip. The method comprises the following steps: acquiring a video to be detected; obtaining from the video at least two images to be detected arranged in time sequence; determining the spatial similarity between adjacent images to be detected to obtain the spatial variation characteristic of the video, wherein the spatial variation characteristic comprises the rule by which the spatial similarity varies over the time sequence; determining the time sequence similarity between adjacent images to be detected to obtain the time sequence variation characteristic of the video, wherein the time sequence variation characteristic comprises the rule by which the time sequence similarity varies over the time sequence; fusing the spatial variation characteristic and the time sequence variation characteristic to obtain a fused variation characteristic; judging, with a pre-trained detection model, whether the fused variation characteristic is abnormal; and if so, determining that an abnormal event exists in the video to be detected. The scheme can improve the efficiency of identifying video abnormal events.

Description

Video abnormal event detection method and chip
Technical Field
The invention relates to the technical field of image processing, in particular to a video abnormal event detection method and a chip.
Background
With the continuous development of Internet of Things technology, monitoring equipment has been widely deployed in public areas, providing an unobtrusive safety guarantee for people. However, conventional video monitoring relies mainly on manual observation of abnormal events in the scene, which not only incurs extremely high labor costs but is also easily affected by subjective factors; some abnormal events may even go undiscovered in time. The identification efficiency of abnormal events in existing monitoring video is therefore low.
In view of the above, it is desirable to provide a video abnormal event detection method and chip to solve the above-mentioned deficiencies.
Disclosure of Invention
In view of the defects in the prior art, the present invention provides a video abnormal event detection method and a chip, addressing the technical problem of how to improve the identification efficiency of video abnormal events.
In order to solve the above technical problem, in a first aspect, the present invention provides a method for detecting a video abnormal event, including:
acquiring a video to be detected;
obtaining at least two images to be detected which are arranged according to a time sequence by utilizing the video to be detected;
determining the spatial similarity between adjacent images to be detected to obtain the spatial variation characteristics of the video to be detected; wherein the spatial variation characteristics comprise a variation rule of the spatial similarity according to a time sequence;
determining the time sequence similarity between adjacent images to be detected to obtain the time sequence change characteristics of the video to be detected; the time sequence change characteristics comprise a change rule of the time sequence similarity according to a time sequence;
fusing the spatial variation characteristic and the time sequence variation characteristic to obtain a fused variation characteristic;
judging whether the fusion change characteristics are abnormal or not by using a pre-trained detection model;
and if so, determining that an abnormal event exists in the video to be detected.
Optionally, the determining the spatial similarity between adjacent images to be detected to obtain the spatial variation characteristic of the video to be detected includes:
performing spatial feature extraction on the at least two images to be detected to obtain spatial feature information corresponding to each image to be detected; wherein each image to be detected comprises at least two spatial points;
for each spatial point of each image to be detected, executing:
determining first spatial feature information of the spatial point, and determining second spatial feature information corresponding to the spatial point in a first image to be detected adjacent to the image to be detected;
determining a similarity between the first spatial feature information and the second spatial feature information; the similarity is used for representing the similarity of the spatial point in the image to be detected and the spatial feature information in the first image to be detected;
determining the spatial similarity between the image to be detected and the first image to be detected according to the determined similarity of each spatial point;
arranging the spatial similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate spatial variation characteristics of the video to be detected; wherein, the time sequence is the arrangement sequence of the at least two images to be detected.
Optionally, the determining the time sequence similarity between adjacent images to be detected to obtain the time sequence change characteristics of the video to be detected includes:
aiming at each image to be detected, executing the following steps:
carrying out differential processing on the image to be detected and a first image to be detected adjacent to the image to be detected to obtain a differential image; the differential image is used for representing the difference between each pixel point between the image to be detected and the first image to be detected;
screening pixel points in the differential image to determine a motion group;
determining a target differential image corresponding to the motion group as the time sequence similarity between the adjacent images to be detected;
arranging the time sequence similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate a time sequence change characteristic of the video to be detected; wherein, the time sequence is the arrangement sequence of the at least two images to be detected.
Optionally, the method for creating the detection model includes:
acquiring at least two groups of historical abnormal-free videos; each group of historical abnormal-free videos comprises at least two images which are arranged according to a time sequence;
acquiring at least two groups of abnormal videos and acquiring fusion change characteristics of the at least two groups of abnormal videos;
extracting the characteristics of the at least two groups of historical abnormal-free videos to obtain fusion change characteristics corresponding to each group of historical abnormal-free videos; wherein the fusion change features are used for representing the spatial and temporal regularity of the historical abnormal-free video set arranged in time sequence;
clustering the fusion change characteristics of each group of historical abnormal-free videos by using an unsupervised clustering algorithm to obtain at least two clustering clusters;
combining the fusion change characteristics of the at least two groups of historical abnormal-free videos and the fusion change characteristics of the at least two groups of abnormal videos to obtain a training set;
and training by using the training set according to the at least two clustering clusters and the multi-classification support vector machine to obtain the detection model.
Optionally, the obtaining at least two groups of abnormal videos and obtaining the fusion change characteristics of the at least two groups of abnormal videos includes:
aiming at each group of historical abnormal-free videos, randomly disturbing at least two images in the group of historical abnormal-free videos to obtain corresponding abnormal videos; the abnormal video comprises at least two images which are not arranged in time sequence;
extracting the characteristics of the at least two groups of abnormal videos to obtain the spatial variation characteristics and the time sequence variation characteristics of each group of abnormal videos;
fusing the spatial variation characteristics and the time sequence variation characteristics of each group of abnormal videos to obtain fusion variation characteristics corresponding to the group of abnormal videos; the fusion change characteristics are used for representing the space and time sequence rules of at least two images which are not arranged in time sequence.
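The random-disturbance step above can be sketched as follows; the helper name and fixed seed are illustrative:

```python
import random

def make_abnormal(frames, seed=0):
    """Create a pseudo-abnormal sample by shuffling the frames of a
    historical abnormal-free video so they are no longer in time order;
    the frame contents themselves are unchanged."""
    rng = random.Random(seed)
    shuffled = list(frames)
    while shuffled == list(frames) and len(frames) > 1:
        rng.shuffle(shuffled)  # retry until the order actually changes
    return shuffled
```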
Optionally, the fusion change feature of each group of abnormal videos includes a target spatial change feature and a target time sequence change feature;
the fusing the spatial variation characteristic and the time sequence variation characteristic to obtain a fused variation characteristic, including:
for each set of abnormal videos, performing:
carrying out nonlinear fusion on the target space variation characteristic and the target time sequence variation characteristic, determining a first weight value, and outputting the score of each image in the abnormal video group through an activation function;
carrying out weight distribution on each image in the abnormal videos by using a softmax function to obtain abnormal scores of the abnormal videos;
judging whether the abnormal scores corresponding to each group of abnormal videos are all larger than a preset threshold value;
if so, determining that the first weight value is a target weight value, and fusing the spatial variation characteristic and the time sequence variation characteristic by using the target weight value to obtain a fusion variation characteristic.
Optionally, the anomaly score of each group of abnormal videos is calculated as:

g_i = w_2 · φ(w_1 · z_i + b_1) + b_2,  a_i = exp(g_i) / Σ_{j=1}^{n} exp(g_j),  D = Σ_{i=1}^{n} a_i · g_i

wherein D is used for characterizing the anomaly score of the group of abnormal videos; n is used for characterizing the number of images included in the group; φ is used for characterizing the activation function; z is used for characterizing the fused change feature of the group, with z_i the feature of the i-th image; w_1 and b_1 are the parameters to be learned of the first attention network layer; w_2 and b_2 are the parameters to be learned of the second attention network layer; g_i is used for characterizing the score of the i-th image in the group; and a_i is used for characterizing the attention weight of the i-th image.
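A minimal numpy sketch of an attention-weighted anomaly score consistent with the symbol glossary above; tanh stands in for the activation φ, and the exact learned form in the patent may differ:

```python
import numpy as np

def anomaly_score(z, w1, b1, w2, b2, phi=np.tanh):
    """Two-layer attention over per-image fused features z of shape (n, d):
    g_i = w2 . phi(w1 z_i + b1) + b2 is the score of the i-th image,
    a_i = softmax(g)_i its attention weight, and D = sum_i a_i * g_i."""
    g = phi(z @ w1 + b1) @ w2 + b2   # per-image scores g_i, shape (n,)
    e = np.exp(g - g.max())          # numerically stable softmax
    a = e / e.sum()                  # attention weights, sum to 1
    D = float((a * g).sum())         # attention-weighted anomaly score
    return D, g, a
```

Since D is a convex combination of the per-image scores, it always lies between the smallest and largest g_i.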
Optionally, after determining that there is an abnormal event in the video to be detected, the method further includes:
analyzing the time sequence change characteristics to determine the time of the abnormal event;
acquiring a target image to be detected corresponding to the determined time;
determining the target space similarity between the target image to be detected and the adjacent image to be detected;
and determining the spatial information of the abnormal event according to the target spatial similarity.
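A minimal sketch of the time-localization step above, using deviation from the median as an illustrative criterion; the patent does not fix the analysis rule:

```python
import numpy as np

def locate_anomaly_time(timing_feature):
    """Return the index whose time sequence change value deviates most from
    the sequence median, a simple proxy for when the abnormal event occurs."""
    t = np.asarray(timing_feature, dtype=float)
    return int(np.argmax(np.abs(t - np.median(t))))
```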
In a second aspect, the present invention further provides a video abnormal event detection chip, including:
the acquisition module is used for acquiring a video to be detected and acquiring at least two images to be detected which are arranged according to a time sequence by using the video to be detected;
the first characteristic determining module is used for determining the spatial similarity between adjacent images to be detected to obtain the spatial variation characteristics of the video to be detected; wherein the spatial variation characteristics comprise a variation rule of the spatial similarity according to a time sequence;
the second characteristic determining module is used for determining the time sequence similarity between adjacent images to be detected to obtain the time sequence change characteristics of the video to be detected; the time sequence change characteristics comprise a change rule of the time sequence similarity according to a time sequence;
the fusion module is used for fusing the spatial variation characteristic obtained by the first characteristic determining module and the time sequence variation characteristic obtained by the second characteristic determining module to obtain a fusion variation characteristic;
and the detection module is used for judging whether the fusion change characteristics obtained by the fusion module are abnormal or not by utilizing a pre-trained detection model, and if so, determining that an abnormal event exists in the video to be detected.
In a third aspect, the present invention further provides a video abnormal event detection apparatus, including: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is configured to invoke the machine readable program to execute the video abnormal event detection method provided by the first aspect or any possible implementation manner of the first aspect.
The embodiment of the invention provides a video abnormal event detection method and a chip. In the method, at least two images to be detected arranged in time sequence are obtained from the acquired video to be detected; the spatial similarity and the time sequence similarity between adjacent images to be detected are determined so as to obtain the spatial variation characteristic and the time sequence variation characteristic of the video respectively; the two characteristics are fused; and when the fused variation characteristic is judged to be abnormal by a pre-trained detection model, an abnormal event is determined to exist in the video to be detected. By processing the video in this way and fusing its spatial and time sequence variation characteristics, the internal relation between time and space is exploited, which facilitates accurate detection and identification of segment-level abnormal events, enables more accurate detection and localization of abnormal event boundaries, removes the need for manual monitoring, and thus improves the efficiency of video abnormal event identification.
Drawings
Fig. 1 illustrates a video abnormal event detection method according to an embodiment of the present invention;
FIG. 2 illustrates another video abnormal event detection method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a video abnormal event detection chip according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a device where a video abnormal event detection apparatus according to an embodiment of the present invention is located.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As described in the background, conventional video monitoring in the prior art relies mainly on manual monitoring and discrimination, and cannot cope with the current explosive growth of data volume. Existing abnormal event detection methods based on weakly supervised learning consider only the features of each video segment in isolation, without relating it to other video segments, so their detection significance is poor and their identification efficiency is low.
In view of this, the relations among the segments of a video can be considered so as to better represent the dynamic characteristics of the video, and the spatio-temporal association relationships in the video can be fused to realize fast and effective detection and localization of abnormal events.
The foregoing is the concept provided by the present invention, and specific implementations of the concept provided by the present invention are described below.
As shown in fig. 1, a method for detecting a video abnormal event according to an embodiment of the present invention includes the following steps:
step 101: acquiring a video to be detected;
step 102: obtaining at least two images to be detected which are arranged according to a time sequence by using a video to be detected;
step 103: determining the spatial similarity between adjacent images to be detected to obtain the spatial variation characteristics of the video to be detected; the space change characteristics comprise a change rule of the space similarity according to a time sequence;
step 104: determining the time sequence similarity between adjacent images to be detected to obtain the time sequence change characteristics of the video to be detected; the time sequence change characteristics comprise a change rule of the time sequence similarity according to a time sequence;
step 105: fusing the spatial variation characteristic and the time sequence variation characteristic to obtain a fused variation characteristic;
step 106: judging whether the fusion change characteristics are abnormal or not by using a pre-trained detection model;
step 107: if yes, determining that an abnormal event exists in the video to be detected.
In the embodiment of the invention, at least two images to be detected arranged in time sequence are obtained from the acquired video to be detected; the spatial similarity and the time sequence similarity between adjacent images to be detected are determined so as to obtain the spatial variation characteristic and the time sequence variation characteristic of the video respectively; the two characteristics are fused; and when the fused variation characteristic is judged to be abnormal by a pre-trained detection model, an abnormal event is determined to exist in the video to be detected. By processing the video in this way and fusing its spatial and time sequence variation characteristics, the internal relation between time and space is exploited, which facilitates accurate detection and identification of segment-level abnormal events, enables more accurate detection and localization of abnormal event boundaries, removes the need for manual monitoring, and thus improves the efficiency of video abnormal event identification.
It should be noted that, in step 101, the captured video of a certain area may be acquired in real time by various video monitoring devices. In step 102, a video to be detected is subjected to framing processing to obtain at least two images to be detected arranged in a time sequence.
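The overall flow of steps 103 to 107 can be outlined as below; all concrete components are caller-supplied stand-ins, since the embodiment leaves their exact form open:

```python
def detect_abnormal(frames, spatial_sim, timing_sim, fuse, model):
    """Steps 103-107 in outline: per-pair similarities kept in time order,
    fusion, then a pre-trained model decides. The similarity measures, the
    fusion rule, and the model are passed in as callables."""
    pairs = list(zip(frames, frames[1:]))             # adjacent images to be detected
    spatial = [spatial_sim(a, b) for a, b in pairs]   # spatial variation characteristic
    timing = [timing_sim(a, b) for a, b in pairs]     # time sequence variation characteristic
    fused = fuse(spatial, timing)                     # fused variation characteristic
    return model(fused)                               # True => abnormal event present
```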
In this embodiment of the present invention, the detection model in step 106 may be obtained by training on a training set that includes the fusion change features of at least two groups of videos; for each group, the fusion change feature serves as the input and the abnormal event detection result of the group serves as the output.
Optionally, in the method for detecting a video abnormal event shown in fig. 1, the step 103 of determining spatial similarity between adjacent images to be detected to obtain a spatial variation characteristic of the video to be detected includes:
performing spatial feature extraction on at least two images to be detected to obtain spatial feature information corresponding to each image to be detected; the image to be detected comprises at least two space points;
for each spatial point of each image to be detected, executing:
determining first spatial feature information of the spatial point, and determining second spatial feature information corresponding to the spatial point in a first image to be detected adjacent to the image to be detected;
determining similarity between the first spatial feature information and the second spatial feature information; the similarity is used for representing the similarity of the spatial point in the image to be detected and the spatial feature information in the first image to be detected;
determining the spatial similarity between the image to be detected and the first image to be detected according to the determined similarity of each spatial point;
arranging the spatial similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate spatial variation characteristics of the video to be detected; wherein the time sequence is the arrangement sequence of at least two images to be detected.
It should be noted that a spatial point may be a spatial position coordinate in the image to be detected, and the first spatial feature information of all spatial points constitutes the spatial feature information of the image; the first image to be detected adjacent to a given image to be detected is the next image in the time sequence arrangement of the images to be detected.
In the embodiment of the invention, for each spatial point of each image to be detected, first spatial feature information of the spatial point is firstly determined, second spatial feature information of the spatial point in a first image to be detected adjacent to the detected image is then determined, and the similarity of the same spatial point in the adjacent images to be detected is determined by comparing the first spatial feature information with the second spatial feature information. Similarly, the spatial similarity between each spatial point in the image to be detected and the adjacent first image to be detected can be obtained, and finally the spatial similarity of each image to be detected is arranged according to the arrangement sequence of at least two images to be detected, so as to generate the spatial variation characteristic of the video to be detected.
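The per-point comparison described above can be sketched with cosine similarity, one plausible choice of similarity measure; feature maps are assumed here to be (H, W, C) arrays with one C-dimensional feature vector per spatial point:

```python
import numpy as np

def spatial_similarity(feat_a, feat_b, eps=1e-8):
    """Mean cosine similarity between the per-point feature vectors of two
    adjacent images; feat_* has shape (H, W, C)."""
    num = (feat_a * feat_b).sum(axis=-1)
    den = np.linalg.norm(feat_a, axis=-1) * np.linalg.norm(feat_b, axis=-1) + eps
    return float((num / den).mean())

def spatial_change_feature(feature_maps):
    """Similarity of each image with the next one, kept in time order,
    forming the spatial variation characteristic of the video."""
    return [spatial_similarity(a, b) for a, b in zip(feature_maps, feature_maps[1:])]
```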
Since video frames exhibit spatial similarity, the embodiment of the present invention compares the spatial feature information of adjacent images to be detected to obtain the rule by which spatial similarity changes in the video to be detected, that is, the change rule of its spatial feature information. Adding this spatial-dimension information to video abnormal event detection increases detection accuracy and improves the generalization ability of abnormal event detection in complex environments; meanwhile, the spatial information of a subsequent abnormal event can be determined based on the spatial feature information, enabling more accurate detection and localization of abnormal event boundaries.
Optionally, in the method for detecting a video abnormal event shown in fig. 1, the step 104 of determining a time sequence similarity between adjacent images to be detected to obtain a time sequence variation characteristic of the video to be detected includes:
aiming at each image to be detected, executing the following steps:
carrying out differential processing on the image to be detected and a first image to be detected adjacent to the image to be detected to obtain a differential image; the differential image is used for representing the difference between each pixel point between the image to be detected and the first image to be detected;
screening pixel points in the difference image to determine a motion group;
determining a target differential image corresponding to the motion group as the time sequence similarity between adjacent images to be detected;
arranging the time sequence similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate a time sequence change characteristic of the video to be detected; wherein the time sequence is the arrangement sequence of at least two images to be detected.
In the embodiment of the invention, for each image to be detected, the image and its adjacent first image to be detected are first subjected to differential processing to obtain a differential image. The pixel points in the differential image are then screened: the difference between corresponding pixel points of the current image and the background image is examined, and if the difference exceeds a certain threshold value, the pixel is judged to belong to a foreground moving object; the pixels so selected form the motion group. The differential image corresponding to the motion group is determined as the target differential image (i.e. the differences of pixel points in the motion group are retained, while the difference information of pixel points outside the motion group is set back to zero), and the target differential image is used as the time sequence similarity between the adjacent images to be detected. Since video abnormal event detection is mainly concerned with the motion group, the time sequence similarity retains only the information of the corresponding motion group, so that abnormal events in the video to be detected can be quickly acquired and detected.
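The differencing and screening just described can be sketched as follows; the threshold value is illustrative:

```python
import numpy as np

def motion_difference(frame, next_frame, threshold=25):
    """Differential image of two adjacent grayscale frames: pixels whose
    absolute change exceeds `threshold` form the motion group; difference
    information of all other (non-motion) pixels is set back to zero."""
    diff = np.abs(frame.astype(np.int16) - next_frame.astype(np.int16))
    motion_mask = diff > threshold
    target = np.where(motion_mask, diff, 0)   # target differential image
    return target.astype(np.uint8), motion_mask
```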
Besides similarity of spatial feature information, the occurrence of abnormal events in a video is also connected through time sequence relations, and the time sequence relation is an even more important video feature: an abnormal event usually lacks the smooth time sequence characteristics of a normal event. In the embodiment of the invention, the time sequence similarities of the images to be detected are finally arranged according to the arrangement order of the at least two images to be detected, generating the time sequence change characteristic of the video to be detected. Based on the time sequence change characteristic of the motion group in the video to be detected, abnormal events can thus be preliminarily determined without relying on manual monitoring.
Further, in order to improve the accurate detection and identification of abnormal events in the video, express them completely, and increase detection robustness, the spatial variation characteristic obtained in step 103 and the time sequence variation characteristic obtained in step 104 are fused in step 105.
Optionally, in a video abnormal event detection method shown in fig. 1, the method for creating a detection model includes:
acquiring at least two groups of historical abnormal-free videos; each group of historical abnormal-free videos comprises at least two images which are arranged according to a time sequence;
acquiring at least two groups of abnormal videos and acquiring fusion change characteristics of the at least two groups of abnormal videos;
extracting the characteristics of at least two groups of historical abnormal-free videos to obtain fusion change characteristics corresponding to each group of historical abnormal-free videos; the fusion change characteristics are used for representing the space and time sequence rules of the historical abnormal-free videos arranged according to the time sequence;
clustering the fusion change characteristics of each group of historical abnormal-free videos by using an unsupervised clustering algorithm to obtain at least two clustering clusters;
combining the fusion change characteristics of at least two groups of historical abnormal-free videos and the fusion change characteristics of at least two groups of abnormal videos to obtain a training set;
and training by using the training set according to the at least two clustering clusters and the multi-classification support vector machine to obtain a detection model.
In this embodiment of the present invention, the method for creating the detection model in step 106 includes: acquiring at least two groups of historical abnormal-free videos, each comprising at least two images arranged in time sequence; obtaining the spatial variation characteristic and the time sequence variation characteristic of each group according to the methods of steps 103 and 104; fusing the two characteristics of each group as in step 105; clustering the resulting fusion variation characteristics with an unsupervised clustering algorithm (such as the K-means clustering algorithm) to obtain at least two clustering clusters representing normal motion behaviors; acquiring at least two groups of abnormal videos; combining the fusion variation characteristics of the historical abnormal-free videos with those of the abnormal videos to obtain a training set; and training at least two multi-classification support vector machines on the training set according to the at least two clustering clusters, yielding a detection model comprising at least two multi-classification support vector machines. The number of multi-classification support vector machines equals the number of clustering clusters.
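The model-creation pipeline above can be sketched in miniature as follows. This is only an illustration under strong simplifications: fusion features are reduced to scalars, the K-means step is a tiny 1-D implementation, and a nearest-centroid distance rule stands in for the multi-class support vector machines so the sketch stays self-contained (a real implementation would use a proper SVM library):

```python
# Sketch: cluster normal-motion fusion features, then flag features that lie
# far from every normal cluster (stand-in for the per-cluster SVM decision).

def kmeans_1d(values, k, iters=20):
    """Tiny k-means on scalar features: returns sorted cluster centroids."""
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for v in values:
            i = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            buckets[i].append(v)
        centroids = [sum(b) / len(b) if b else c
                     for b, c in zip(buckets, centroids)]
    return sorted(centroids)

def detect(feature, centroids, radius):
    """Anomalous if the feature is far from every normal-motion centroid."""
    return all(abs(feature - c) > radius for c in centroids)
```

For instance, clustering normal features around 1.0 and 5.0 into two clusters lets `detect` flag a feature near 9.0 as belonging to no normal-motion cluster.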
In the embodiment of the present invention, step 106 uses the pre-trained detection model including at least two multi-class support vector machines to input the fusion change characteristics of the video to be detected obtained in step 105 into the detection model, so as to obtain the detection result of the video to be detected output by the detection model. When at least one multi-classification support vector machine in the detection model judges that the fusion change characteristics are abnormal, the abnormal event exists in the video to be detected. Therefore, the detection model can convert the abnormal detection problem into the classification problem, and when the abnormal video does not belong to any normal video, the abnormal event can be detected more quickly and effectively, so that the detection and identification efficiency of the video abnormal event is improved.
It should be noted that, in the embodiment of the present invention, the number of cluster clusters may be preset according to environments where different types of videos are located.
Optionally, in the method for detecting a video abnormal event shown in fig. 1, acquiring at least two groups of abnormal videos and obtaining a fusion change feature of the at least two groups of abnormal videos includes:
aiming at each group of historical abnormal-free videos, randomly disturbing at least two images in the group of historical abnormal-free videos to obtain corresponding abnormal videos; the abnormal video comprises at least two images which are not arranged in time sequence;
extracting the characteristics of at least two groups of abnormal videos to obtain the spatial variation characteristics and the time sequence variation characteristics of each group of historical abnormal-free videos;
fusing the spatial variation characteristics and the time sequence variation characteristics of each group of abnormal videos to obtain fusion variation characteristics corresponding to the group of abnormal videos; the fusion change characteristics are used for representing the space and time sequence rules of at least two images which are not arranged in time sequence.
Abnormal events usually lack a clear definition, which makes abnormal event detection difficult: abnormal event samples are hard to obtain, and manually collecting enough abnormal videos to train a detection model is too costly. However, abnormal events do not share the stable time sequence characteristic of normal events, so abnormal videos can be obtained by breaking that stable time sequence characteristic, reducing the cost of collecting abnormal data. For example, suppose motion group A walks normally for 6 min and is then suddenly knocked down by surrounding people, i.e., A's action changes abruptly. At that moment the regular time sequence relationship can no longer be captured and the motion's time sequence relationship becomes disordered, so an abnormal event is determined to be detected.
Specifically, the embodiment of the present invention provides a method for acquiring abnormal videos: first, for each group of historical abnormal-free videos, the at least two time-ordered images in the group are randomly shuffled, i.e., the timing information of the normal samples is disturbed, destroying the spatial and temporal similarity between adjacent images and thereby producing an abnormal video corresponding to the group. In this way, abnormal video samples corresponding to the clustering clusters of different normal motions can be obtained from the historical abnormal-free videos by random perturbation, providing balanced training samples for each multi-classification support vector machine and further reducing the cost of collecting abnormal videos.
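The random-perturbation step can be sketched as follows, assuming a video is simply a time-ordered list of frames (any frame representation works, since only the ordering is changed):

```python
import random

def make_abnormal_video(frames, seed=None):
    """Randomly permute the time-ordered frames of a normal video,
    breaking its smooth spatial/temporal adjacency to yield an
    abnormal training sample."""
    rng = random.Random(seed)
    shuffled = frames[:]
    # Re-shuffle until the order actually changes (guards short videos
    # against the identity permutation).
    while shuffled == frames and len(frames) > 1:
        rng.shuffle(shuffled)
    return shuffled
```

The same frames are kept, so the spatial content of each image is untouched; only the time sequence regularity is destroyed, which is exactly what the detection model should flag.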
In the embodiment of the present invention, the spatial variation characteristic and the temporal variation characteristic of each group of abnormal videos are obtained according to the methods described in step 103 and step 104, and the spatial variation characteristic and the temporal variation characteristic corresponding to each group of abnormal videos are fused in step 105 to obtain a fused variation characteristic corresponding to each group of abnormal videos.
Optionally, in the video abnormal event detection method shown in fig. 1, the fusion change feature of each group of abnormal videos includes a target spatial change feature and a target temporal change feature;
step 105, fusing the spatial variation characteristic and the time sequence variation characteristic to obtain a fusion variation characteristic, which comprises the following steps:
for each set of abnormal videos, performing:
carrying out nonlinear fusion on the target space change characteristic and the target time sequence change characteristic, determining a first weight value, and outputting the score of each image in the abnormal video group through an activation function;
carrying out weight distribution on each image in the abnormal videos by using a softmax function to obtain abnormal scores of the abnormal videos;
judging whether the abnormal scores corresponding to each group of abnormal videos are all larger than a preset threshold value;
if so, determining that the first weight value is a target weight value, and fusing the space variation characteristic and the time sequence variation characteristic by using the target weight value to obtain a fusion variation characteristic.
In order to better utilize the spatial similarity and the time sequence similarity, in the embodiment of the invention, a self-adaptive space-time fusion graph network is constructed to learn the space-time characteristics of the abnormal events, and the space change characteristics and the time sequence change characteristics are fused by adopting a self-adaptive weight fusion method to obtain the fusion change characteristics.
Specifically, the abnormal videos are used to determine the weight of each feature during fusion. First, for each group of abnormal videos, the target spatial variation characteristic and target time sequence variation characteristic included in its fusion variation characteristic are fused nonlinearly to determine a first weight value; the score of each image in the group is output through an activation function; weights are distributed over the images based on an attention mechanism; and the anomaly score of the group is finally obtained. Anomaly scores for the remaining groups are obtained in the same way. When the anomaly scores of all groups exceed a preset threshold, the first weight value is determined to be the target weight value, which comprises the respective weights of the spatial variation characteristic and the time sequence variation characteristic during fusion. In this way, weight values for the spatial variation characteristic and the time sequence variation characteristic suited to the current abnormal videos are determined, the spatial similarity and time sequence similarity are considered jointly, and the detection accuracy for abnormal events is further improved.
Optionally, in the video abnormal event detection method shown in fig. 1, the anomaly score of each group of abnormal videos is calculated as:

$$D=\sum_{i=1}^{n} a_i\,g_i,\qquad g_i=\varphi\big(w_2\,\varphi(w_1 z_i+b_1)+b_2\big)$$

where $D$ characterizes the anomaly score of the group of abnormal videos; $n$ characterizes the number of images included in the group; $\varphi$ characterizes the activation function; $z$ characterizes the fusion change feature of the group (with $z_i$ its component for the $i$th image); $w_1$, $b_1$ characterize the parameters to be learned of the first attention network layer; $w_2$, $b_2$ characterize the parameters to be learned of the second attention network layer; $g_i$ characterizes the score of the $i$th image in the group; and

$$a_i=\frac{e^{g_i}}{\sum_{j=1}^{n} e^{g_j}}$$

characterizes the attention weight of the $i$th image in the group of abnormal videos.
In the embodiment of the present invention, to determine the respective weight values of the spatial variation characteristic and the time sequence variation characteristic during fusion, the optimal target weight value is finally determined by computing the anomaly scores of the abnormal videos as described above. Specifically, when the first weight value is determined, the score of each image in the group of abnormal videos may be output through an activation function (e.g., the sigmoid function); to further highlight the abnormal region and reduce the over-smoothing caused by fusion, weights may be distributed over the images through two attention network layers and a softmax function, yielding the anomaly score of the group of abnormal videos.
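The score computation described above can be sketched as follows. This is a minimal scalar reading: `w1, b1, w2, b2` are scalars standing in for the learned parameters of the two attention network layers, sigmoid is used as the activation function (the text names it only as an example), and each `z_i` is the per-image fusion feature reduced to one number — all assumptions for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def anomaly_score(z, w1, b1, w2, b2):
    """Per-image scores through two attention layers, softmax attention
    weights, then the weighted sum as the group anomaly score D."""
    g = [sigmoid(w2 * sigmoid(w1 * zi + b1) + b2) for zi in z]  # image scores
    exp_g = [math.exp(gi) for gi in g]
    total = sum(exp_g)
    a = [e / total for e in exp_g]                  # softmax attention weights
    return sum(ai * gi for ai, gi in zip(a, g))     # anomaly score D
```

When all images score equally, the softmax weights are uniform and D equals the common per-image score; images with higher scores otherwise pull D upward, highlighting the abnormal region.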
Optionally, in the method for detecting video abnormal events shown in fig. 1, after determining that there is an abnormal event in the video to be detected, the method further includes:
analyzing the time sequence change characteristics to determine the time of the abnormal event;
acquiring a target image to be detected corresponding to the determined time;
determining the target space similarity between the target image to be detected and the adjacent image to be detected;
and determining the spatial information of the abnormal event according to the target spatial similarity.
In the embodiment of the present invention, after step 107 determines that an abnormal event exists in the video to be detected, the time sequence change characteristic of the video to be detected is analyzed to determine the time at which the abnormal event occurs; the target image to be detected corresponding to that time is acquired; the target spatial similarity between the target image to be detected and its adjacent image to be detected is determined; and the spatial information of the abnormal event is determined according to the target spatial similarity, thereby detecting and locating the boundary of the abnormal event.
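The time-localization step can be sketched as follows — a simplified reading in which the time sequence change characteristic is reduced to a scalar similarity curve over consecutive frame pairs, and the anomaly time is taken at the curve's sharpest jump (both simplifying assumptions, since the text does not fix how the characteristic is analyzed):

```python
def anomaly_time(timing_curve):
    """Index of the frame after the largest jump in the temporal-similarity
    sequence, i.e. where the smooth timing characteristic breaks."""
    jumps = [abs(b - a) for a, b in zip(timing_curve, timing_curve[1:])]
    return max(range(len(jumps)), key=jumps.__getitem__) + 1
```

The target image at that index would then be compared spatially with its neighbor to localize where in the frame the abnormal event occurs.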
In order to more clearly illustrate the technical solution and advantages of the present invention, as shown in fig. 2, the following describes in detail a video abnormal event detection method provided by an embodiment of the present invention, which specifically includes:
step 201: and constructing a detection model.
Specifically:
a1, acquiring at least two groups of historical abnormal-free videos; each group of historical abnormal-free videos comprises at least two images which are arranged according to a time sequence;
a2, acquiring at least two groups of abnormal videos, and obtaining fusion change characteristics of the at least two groups of abnormal videos, wherein the fusion change characteristics comprise:
aiming at each group of historical abnormal-free videos, randomly disturbing at least two images in the group of historical abnormal-free videos to obtain corresponding abnormal videos; the abnormal video comprises at least two images which are not arranged in time sequence;
extracting the characteristics of at least two groups of abnormal videos to obtain the spatial variation characteristics and the time sequence variation characteristics of each group of historical abnormal-free videos;
fusing the spatial variation characteristics and the time sequence variation characteristics of each group of abnormal videos to obtain fusion variation characteristics corresponding to the group of abnormal videos; the fusion change characteristics are used for representing the space and time sequence rules of at least two images which are not arranged according to the time sequence;
a3, extracting the characteristics of at least two groups of historical abnormal-free videos to obtain the fusion change characteristics corresponding to each group of historical abnormal-free videos; the fusion change characteristics are used for representing the space and time sequence rules of the historical abnormal-free videos arranged according to the time sequence;
a4, clustering fusion change characteristics of each group of historical abnormal-free videos by using an unsupervised clustering algorithm to obtain at least two clustering clusters;
a5, combining the fusion change characteristics of at least two groups of historical abnormal-free videos and the fusion change characteristics of at least two groups of abnormal videos to obtain a training set;
and A6, training by using a training set according to at least two clustering clusters and a multi-classification support vector machine to obtain a detection model.
More specifically, the fusion change features in the above steps are obtained according to the methods in step 203, step 204, and step 205.
Step 202: and obtaining at least two images to be detected which are arranged according to the time sequence by utilizing the video to be detected.
Specifically, a video to be detected is obtained, and the video to be detected is subjected to framing processing to obtain at least two images to be detected which are arranged according to a time sequence.
Step 203: and determining the spatial variation characteristics of the video to be detected.
Specifically, spatial feature extraction is carried out on at least two images to be detected, and spatial feature information corresponding to each image to be detected is obtained; the image to be detected comprises at least two space points;
for each spatial point of each image to be detected, executing:
determining first spatial feature information of the spatial point, and determining second spatial feature information corresponding to the spatial point in a first image to be detected adjacent to the image to be detected;
determining similarity between the first spatial feature information and the second spatial feature information; the similarity is used for representing the similarity of the spatial point in the image to be detected and the spatial feature information in the first image to be detected;
determining the spatial similarity between the image to be detected and the first image to be detected according to the determined similarity of each spatial point;
arranging the spatial similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate spatial variation characteristics of the video to be detected; wherein, the time sequence is the arrangement sequence of at least two images to be detected; the spatial variation characteristics comprise a variation rule of the spatial similarity according to time sequence.
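The per-spatial-point comparison in step 203 can be sketched with cosine similarity over per-point feature vectors — an assumed choice of similarity measure, since the description does not fix one:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 on zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def spatial_similarity(feats_t, feats_t1):
    """Mean per-spatial-point similarity between the feature maps of two
    adjacent frames: the frame-pair spatial similarity."""
    sims = [cosine(u, v) for u, v in zip(feats_t, feats_t1)]
    return sum(sims) / len(sims)
```

Arranging these frame-pair values in frame order then gives the spatial variation characteristic of the video.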
Step 204: and determining the time sequence change characteristics of the video to be detected.
Specifically, determining the time sequence similarity between adjacent images to be detected to obtain the time sequence change characteristics of the video to be detected, including:
aiming at each image to be detected, executing the following steps:
carrying out differential processing on the image to be detected and a first image to be detected adjacent to the image to be detected to obtain a differential image; the differential image is used for representing the difference between each pixel point between the image to be detected and the first image to be detected;
screening pixel points in the difference image to determine a motion group;
determining a target differential image corresponding to the motion group as the time sequence similarity between adjacent images to be detected;
arranging the time sequence similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate a time sequence change characteristic of the video to be detected; wherein, the time sequence is the arrangement sequence of at least two images to be detected; the time sequence change characteristics comprise a change rule of the time sequence similarity according to the time sequence.
Step 205: and fusing the space variation characteristic and the time sequence variation characteristic to obtain a fusion variation characteristic.
Specifically, for each group of abnormal videos, the following steps are performed:
carrying out nonlinear fusion on the target space change characteristic and the target time sequence change characteristic, determining a first weight value, and outputting the score of each image in the abnormal video group through an activation function;
carrying out weight distribution on each image in the abnormal videos by using a softmax function to obtain abnormal scores of the abnormal videos;
judging whether the abnormal scores corresponding to each group of abnormal videos are all larger than a preset threshold value;
if so, determining that the first weight value is a target weight value, and fusing the space variation characteristic and the time sequence variation characteristic by using the target weight value to obtain a fusion variation characteristic;
the anomaly score of each group of abnormal videos is calculated as:

$$D=\sum_{i=1}^{n} a_i\,g_i,\qquad g_i=\varphi\big(w_2\,\varphi(w_1 z_i+b_1)+b_2\big)$$

where $D$ characterizes the anomaly score of the group of abnormal videos; $n$ characterizes the number of images included in the group; $\varphi$ characterizes the activation function; $z$ characterizes the fusion change feature of the group (with $z_i$ its component for the $i$th image); $w_1$, $b_1$ characterize the parameters to be learned of the first attention network layer; $w_2$, $b_2$ characterize the parameters to be learned of the second attention network layer; $g_i$ characterizes the score of the $i$th image in the group; and

$$a_i=\frac{e^{g_i}}{\sum_{j=1}^{n} e^{g_j}}$$

characterizes the attention weight of the $i$th image in the group of abnormal videos.
Step 206: and judging whether the fusion change characteristics are abnormal or not by using a pre-trained detection model.
Specifically, the obtained fusion change features are input into the detection model obtained in the training in step 201, and the detection result of the video to be detected corresponding to the fusion change features output by the detection model is obtained.
Step 207: after determining that the video to be detected has the abnormal event, determining the spatial information of the abnormal event.
Specifically, after the detection result indicates that an abnormal event exists in the video to be detected, analyzing the time sequence change characteristics to determine the time of the abnormal event;
acquiring a target image to be detected corresponding to the determined time;
determining the target space similarity between the target image to be detected and the adjacent image to be detected;
and determining the spatial information of the abnormal event according to the target spatial similarity.
Fig. 3 shows a video abnormal event detection chip according to an embodiment of the present invention. Taking a software implementation as an example, the chip shown in fig. 3 is a chip in the logical sense, formed by the CPU of the device in which the chip is located reading corresponding computer program instructions from nonvolatile memory into memory and running them. The video abnormal event detection chip provided by this embodiment comprises:
the acquisition module 301 is configured to acquire a video to be detected, and obtain at least two images to be detected arranged according to a time sequence by using the video to be detected;
the first feature determining module 302 is configured to determine spatial similarity between adjacent images to be detected, so as to obtain a spatial variation feature of a video to be detected; the space change characteristics comprise a change rule of the space similarity according to a time sequence;
the second characteristic determining module 303 is configured to determine a time sequence similarity between adjacent images to be detected, so as to obtain a time sequence change characteristic of the video to be detected; the time sequence change characteristics comprise a change rule of the time sequence similarity according to a time sequence;
a fusion module 304, configured to fuse the spatial variation feature obtained by the first feature determination module and the time-sequence variation feature obtained by the second feature determination module to obtain a fusion variation feature;
the detection module 305 is configured to determine whether the fusion change feature obtained by the fusion module is abnormal by using a pre-trained detection model, and if so, determine that an abnormal event exists in the video to be detected.
Optionally, on the basis of a video abnormal event detection chip shown in fig. 3, the first feature determining module 302 is further configured to perform the following operations:
performing spatial feature extraction on at least two images to be detected to obtain spatial feature information corresponding to each image to be detected; the image to be detected comprises at least two space points;
for each spatial point of each image to be detected, executing:
determining first spatial feature information of the spatial point, and determining second spatial feature information corresponding to the spatial point in a first image to be detected adjacent to the image to be detected;
determining similarity between the first spatial feature information and the second spatial feature information; the similarity is used for representing the similarity of the spatial point in the image to be detected and the spatial feature information in the first image to be detected;
determining the spatial similarity between the image to be detected and the first image to be detected according to the determined similarity of each spatial point;
arranging the spatial similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate spatial variation characteristics of the video to be detected; wherein the time sequence is the arrangement sequence of at least two images to be detected.
Optionally, on the basis of a video abnormal event detection chip shown in fig. 3, the second characteristic determining module 303 is further configured to perform the following operations:
aiming at each image to be detected, executing the following steps:
carrying out differential processing on the image to be detected and a first image to be detected adjacent to the image to be detected to obtain a differential image; the differential image is used for representing the difference between each pixel point between the image to be detected and the first image to be detected;
screening pixel points in the difference image to determine a motion group;
determining a target differential image corresponding to the motion group as the time sequence similarity between adjacent images to be detected;
arranging the time sequence similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate a time sequence change characteristic of the video to be detected; wherein the time sequence is the arrangement sequence of at least two images to be detected.
Optionally, on the basis of a video abnormal event detection chip shown in fig. 3, the chip further includes: a creation module to perform the following operations:
acquiring at least two groups of historical abnormal-free videos; each group of historical abnormal-free videos comprises at least two images;
performing feature extraction on at least two groups of historical abnormal-free videos to obtain spatial variation features and time sequence variation features corresponding to each group of historical abnormal-free videos;
fusing the spatial variation characteristic and the time sequence variation characteristic of each group of historical abnormal-free videos to obtain a fused variation characteristic corresponding to the group of historical abnormal-free videos; the fusion change characteristics are used for representing the space and time sequence rules of the historical abnormal-free videos arranged according to the time sequence;
clustering the fusion change characteristics of each group of historical abnormal-free videos by using an unsupervised clustering algorithm to obtain at least two clustering clusters;
acquiring at least two groups of abnormal videos and acquiring fusion change characteristics of the at least two groups of abnormal videos;
combining the fusion change characteristics of at least two groups of historical abnormal-free videos and the fusion change characteristics of at least two groups of abnormal videos to obtain a training set;
and training by using the training set according to the at least two clustering clusters and the multi-classification support vector machine to obtain a detection model.
Optionally, on the basis of the video abnormal event detection chip shown in fig. 3, the creating module further includes: an anomaly video construction sub-module for performing the following operations:
aiming at each group of historical abnormal-free videos, randomly disturbing at least two images in the group of historical abnormal-free videos to obtain corresponding abnormal videos; the abnormal video comprises at least two images which are not arranged in time sequence;
extracting the characteristics of at least two groups of abnormal videos to obtain the spatial variation characteristics and the time sequence variation characteristics of each group of historical abnormal-free videos;
fusing the spatial variation characteristics and the time sequence variation characteristics of each group of abnormal videos to obtain fusion variation characteristics corresponding to the group of abnormal videos; the fusion change characteristics are used for representing the space and time sequence rules of at least two images which are not arranged in time sequence.
Optionally, on the basis of the video abnormal event detection chip shown in fig. 3, the fusion change feature of each group of abnormal videos includes a target spatial change feature and a target temporal change feature;
the fusion module 304 is further configured to perform the following operations:
for each set of abnormal videos, performing:
carrying out nonlinear fusion on the target space change characteristic and the target time sequence change characteristic, determining a first weight value, and outputting the score of each image in the abnormal video group through an activation function;
carrying out weight distribution on each image in the abnormal videos by using a softmax function to obtain abnormal scores of the abnormal videos;
judging whether the abnormal scores corresponding to each group of abnormal videos are all larger than a preset threshold value;
if so, determining that the first weight value is a target weight value, and fusing the space variation characteristic and the time sequence variation characteristic by using the target weight value to obtain a fusion variation characteristic.
Optionally, on the basis of the video abnormal event detection chip shown in fig. 3, the fusion module 304 is further configured to perform the following operations:
the anomaly score of each group of abnormal videos is calculated as:

$$D=\sum_{i=1}^{n} a_i\,g_i,\qquad g_i=\varphi\big(w_2\,\varphi(w_1 z_i+b_1)+b_2\big)$$

where $D$ characterizes the anomaly score of the group of abnormal videos; $n$ characterizes the number of images included in the group; $\varphi$ characterizes the activation function; $z$ characterizes the fusion change feature of the group (with $z_i$ its component for the $i$th image); $w_1$, $b_1$ characterize the parameters to be learned of the first attention network layer; $w_2$, $b_2$ characterize the parameters to be learned of the second attention network layer; $g_i$ characterizes the score of the $i$th image in the group; and

$$a_i=\frac{e^{g_i}}{\sum_{j=1}^{n} e^{g_j}}$$

characterizes the attention weight of the $i$th image in the group of abnormal videos.
Optionally, on the basis of a video abnormal event detection chip shown in fig. 3, the chip further includes: a positioning module to perform the following operations:
analyzing the time sequence change characteristics to determine the time of the abnormal event;
acquiring a target image to be detected corresponding to the determined time;
determining the target space similarity between the target image to be detected and the adjacent image to be detected;
and determining the spatial information of the abnormal event according to the target spatial similarity.
It is to be understood that the illustrated structure of the embodiment of the present invention does not constitute a specific limitation to a video abnormal event detection chip. In other embodiments of the present invention, a video anomaly detection chip may include more or fewer components than shown, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Since the contents of information interaction, execution process, and the like between the modules in the chip are based on the same concept as the method embodiment of the present invention, specific contents may refer to the description in the method embodiment of the present invention, and are not described herein again.
The embodiment of the invention also provides a video abnormal event detection device, which comprises: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is configured to invoke the machine readable program to perform the video abnormal event detection method according to any one of the embodiments of the present invention.
In the embodiment of the present invention, fig. 4 shows a hardware structure diagram of the device in which the video abnormal event detection apparatus is located. In addition to the processor, memory, network interface, and nonvolatile memory shown in fig. 4, the device in which the apparatus is located may generally include other hardware, such as a forwarding chip responsible for processing packets.
An embodiment of the present invention further provides a computer-readable medium storing computer instructions which, when executed by a processor, cause the processor to execute the video abnormal event detection method according to any embodiment of the present invention.
Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communication network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, the program code read out from the storage medium may be written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion module connected to the computer, and a CPU or the like mounted on the expansion board or the expansion module may then perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A video abnormal event detection method is characterized by comprising the following steps:
acquiring a video to be detected;
obtaining at least two images to be detected which are arranged according to a time sequence by utilizing the video to be detected;
determining the spatial similarity between adjacent images to be detected to obtain the spatial variation characteristics of the video to be detected; wherein the spatial variation characteristics comprise a variation rule of the spatial similarity according to a time sequence;
determining the time sequence similarity between adjacent images to be detected to obtain the time sequence change characteristics of the video to be detected; the time sequence change characteristics comprise a change rule of the time sequence similarity according to a time sequence;
fusing the spatial variation characteristic and the time sequence variation characteristic to obtain a fused variation characteristic;
judging whether the fusion change characteristics are abnormal or not by using a pre-trained detection model;
if so, determining that an abnormal event exists in the video to be detected;
the method for creating the detection model comprises the following steps:
acquiring at least two groups of historical abnormal-free videos; each group of historical abnormal-free videos comprises at least two images which are arranged according to a time sequence;
acquiring at least two groups of abnormal videos and acquiring fusion change characteristics of the at least two groups of abnormal videos;
extracting the characteristics of the at least two groups of historical abnormal-free videos to obtain fusion change characteristics corresponding to each group of historical abnormal-free videos; wherein the fusion change features are used for representing the spatial and temporal regularity of the historical abnormal-free video set arranged in time sequence;
clustering the fusion change characteristics of each group of historical abnormal-free videos by using an unsupervised clustering algorithm to obtain at least two clustering clusters;
combining the fusion change characteristics of the at least two groups of historical abnormal-free videos and the fusion change characteristics of the at least two groups of abnormal videos to obtain a training set;
and training by using the training set according to the at least two clustering clusters and the multi-classification support vector machine to obtain the detection model.
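The training flow of claim 1, clustering the fusion change features of the no-anomaly videos and then training a multi-class model over the clusters plus an abnormal class, can be sketched as follows. The minimal k-means, the labeling scheme (cluster ids 0..k-1 for normal samples, one extra class k for abnormal samples), and the omission of the actual support-vector training are all assumptions for illustration.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Minimal k-means; stands in for the claim's unsupervised clustering."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):                      # update non-empty clusters only
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels, centers

def build_training_set(normal_feats, abnormal_feats, k=2):
    """Label normal fusion features by cluster id and abnormal ones with the
    extra class k; a multi-class SVM would then be trained on (X, y)."""
    labels, _ = kmeans(normal_feats, k)
    X = np.vstack([normal_feats, abnormal_feats])
    y = np.concatenate([labels, np.full(len(abnormal_feats), k)])
    return X, y
```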
2. The method according to claim 1, wherein the determining the spatial similarity between the adjacent images to be detected to obtain the spatial variation characteristic of the video to be detected comprises:
performing spatial feature extraction on the at least two images to be detected to obtain spatial feature information corresponding to each image to be detected; the image to be detected comprises at least two space points;
for each spatial point of each image to be detected, executing:
determining first spatial feature information of the spatial point, and determining second spatial feature information corresponding to the spatial point in a first image to be detected adjacent to the image to be detected;
determining a similarity between the first spatial feature information and the second spatial feature information; the similarity is used for representing the similarity of the spatial point in the image to be detected and the spatial feature information in the first image to be detected;
determining the spatial similarity between the image to be detected and the first image to be detected according to the determined similarity of each spatial point;
arranging the spatial similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate spatial variation characteristics of the video to be detected; wherein, the time sequence is the arrangement sequence of the at least two images to be detected.
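The per-spatial-point comparison of claim 2 can be sketched as below. Using cosine similarity as the per-point measure, a `(T, H, W, C)` feature-map layout, and a plain mean as the per-pair aggregation are assumptions; the claim fixes none of these.

```python
import numpy as np

def spatial_variation(feature_maps, eps=1e-8):
    """Sketch: for each pair of adjacent frames, compare the spatial feature
    vectors at every spatial point, average into one spatial similarity per
    pair, and arrange the values in time order."""
    sims = []
    for a, b in zip(feature_maps[:-1], feature_maps[1:]):
        num = (a * b).sum(axis=-1)                                  # per-point dot product
        den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
        sims.append(float((num / den).mean()))                      # spatial similarity of the pair
    return np.array(sims)                                           # spatial variation characteristic
```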
3. The method according to claim 1, wherein the determining the time sequence similarity between the adjacent images to be detected to obtain the time sequence variation characteristics of the video to be detected comprises:
aiming at each image to be detected, executing the following steps:
carrying out differential processing on the image to be detected and a first image to be detected adjacent to the image to be detected to obtain a differential image; the differential image is used for representing the difference between each pixel point between the image to be detected and the first image to be detected;
screening pixel points in the differential image to determine a motion group;
determining a target differential image corresponding to the motion group as the time sequence similarity between the adjacent images to be detected;
arranging the time sequence similarity between each image to be detected and a first image to be detected adjacent to the image to be detected according to a time sequence to generate a time sequence change characteristic of the video to be detected; wherein, the time sequence is the arrangement sequence of the at least two images to be detected.
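The differencing-and-screening steps of claim 3 can be sketched as follows; taking the mean difference over the screened motion group as the per-pair value, and the threshold itself, are assumptions made only for illustration.

```python
import numpy as np

def timing_variation(frames, motion_thresh=25):
    """Sketch: difference each frame with its neighbor, screen pixels above a
    threshold as the motion group, aggregate that group into one value per
    pair, and arrange the values in time order."""
    sims = []
    for a, b in zip(frames[:-1], frames[1:]):
        diff = np.abs(a.astype(int) - b.astype(int))    # differential image
        motion = diff[diff > motion_thresh]             # screened motion group
        sims.append(float(motion.mean()) if motion.size else 0.0)
    return np.array(sims)                               # time-sequence variation characteristic
```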
4. The method according to claim 1, wherein the obtaining at least two groups of abnormal videos and obtaining the fusion change characteristics of the at least two groups of abnormal videos comprises:
aiming at each group of historical abnormal-free videos, randomly shuffling the order of at least two images in the group of historical abnormal-free videos to obtain a corresponding abnormal video; wherein the abnormal video comprises at least two images which are not arranged in time sequence;
extracting the characteristics of the at least two groups of abnormal videos to obtain the spatial variation characteristics and the time sequence variation characteristics of each group of abnormal videos;
fusing the spatial variation characteristics and the time sequence variation characteristics of each group of abnormal videos to obtain fusion variation characteristics corresponding to the group of abnormal videos; the fusion change characteristics are used for representing the space and time sequence rules of at least two images which are not arranged in time sequence.
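The construction of pseudo-abnormal samples in claim 4 amounts to destroying the time order of a normal video, which can be sketched in a few lines; the re-draw loop guarding against an identity permutation is an assumption added for illustration.

```python
import numpy as np

def make_abnormal_video(video, seed=0):
    """Sketch of claim 4: build a pseudo-abnormal sample by randomly shuffling
    the time order of a normal video's frames, so the result is no longer
    arranged in time sequence."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(video))
    while (order == np.arange(len(video))).all():   # re-draw if the shuffle was a no-op
        order = rng.permutation(len(video))
    return video[order], order
```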
5. The method according to claim 1, wherein the fusion variation features of each abnormal video set include a target spatial variation feature and a target temporal variation feature;
the fusing the spatial variation characteristic and the time sequence variation characteristic to obtain a fused variation characteristic, including:
for each set of abnormal videos, performing:
carrying out nonlinear fusion on the target space variation characteristic and the target time sequence variation characteristic, determining a first weight value, and outputting the score of each image in the abnormal video group through an activation function;
carrying out weight distribution on each image in the abnormal videos by using a softmax function to obtain abnormal scores of the abnormal videos;
judging whether the abnormal scores corresponding to each group of abnormal videos are all larger than a preset threshold value;
if so, determining that the first weight value is a target weight value, and fusing the spatial variation characteristic and the time sequence variation characteristic by using the target weight value to obtain a fusion variation characteristic.
6. The method according to claim 5, wherein the anomaly score of each set of abnormal videos is calculated by the following formula:

D = Σ_{i=1}^{n} a_i · g_i, with g_i = φ(w_2(w_1 z + b_1) + b_2) and a_i = exp(g_i) / Σ_{j=1}^{n} exp(g_j)

wherein: D is used for characterizing the anomaly score of the set of abnormal videos; n is used for characterizing the number of images included in the set of abnormal videos; φ is used for characterizing the activation function; z is used for characterizing the fusion change feature of the set of abnormal videos; w_1 and b_1 are used for characterizing the parameters to be learned of the first attention network layer; w_2 and b_2 are used for characterizing the parameters to be learned of the second attention network layer; g_i is used for characterizing the score of the ith image in the set of abnormal videos; and a_i is used for characterizing the attention weight value of the ith image in the set of abnormal videos.
7. The method according to any one of claims 1 to 6, further comprising, after the determining that there is an abnormal event in the video to be detected:
analyzing the time sequence change characteristics to determine the time of the abnormal event;
acquiring a target image to be detected corresponding to the determined time;
determining the target space similarity between the target image to be detected and the adjacent image to be detected;
and determining the spatial information of the abnormal event according to the target spatial similarity.
8. A video abnormal event detection apparatus, comprising:
the acquisition module is used for acquiring a video to be detected and acquiring at least two images to be detected which are arranged according to a time sequence by using the video to be detected;
the first characteristic determining module is used for determining the spatial similarity between adjacent images to be detected to obtain the spatial variation characteristics of the video to be detected; wherein the spatial variation characteristics comprise a variation rule of the spatial similarity according to a time sequence;
the second characteristic determining module is used for determining the time sequence similarity between adjacent images to be detected to obtain the time sequence change characteristics of the video to be detected; the time sequence change characteristics comprise a change rule of the time sequence similarity according to a time sequence;
the fusion module is used for fusing the spatial variation characteristic obtained by the first characteristic determining module and the time sequence variation characteristic obtained by the second characteristic determining module to obtain a fusion variation characteristic;
the detection module is used for judging whether the fusion change characteristics obtained by the fusion module are abnormal or not by utilizing a pre-trained detection model, and if so, determining that an abnormal event exists in the video to be detected;
a creation module to perform the following operations:
acquiring at least two groups of historical abnormal-free videos; each group of historical abnormal-free videos comprises at least two images;
performing feature extraction on at least two groups of historical abnormal-free videos to obtain spatial variation features and time sequence variation features corresponding to each group of historical abnormal-free videos;
fusing the spatial variation characteristic and the time sequence variation characteristic of each group of historical abnormal-free videos to obtain a fused variation characteristic corresponding to the group of historical abnormal-free videos; the fusion change characteristics are used for representing the space and time sequence rules of the historical abnormal-free videos arranged according to the time sequence;
clustering the fusion change characteristics of each group of historical abnormal-free videos by using an unsupervised clustering algorithm to obtain at least two clustering clusters;
acquiring at least two groups of abnormal videos and acquiring fusion change characteristics of the at least two groups of abnormal videos;
combining the fusion change characteristics of at least two groups of historical abnormal-free videos and the fusion change characteristics of at least two groups of abnormal videos to obtain a training set;
and training by using the training set according to the at least two clustering clusters and the multi-classification support vector machine to obtain a detection model.
9. A video abnormal event detection apparatus, comprising: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor, configured to invoke the machine readable program to perform the method of any of claims 1 to 7.
CN202110568256.2A 2021-05-25 2021-05-25 Video abnormal event detection method and chip Active CN113255518B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110568256.2A CN113255518B (en) 2021-05-25 2021-05-25 Video abnormal event detection method and chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110568256.2A CN113255518B (en) 2021-05-25 2021-05-25 Video abnormal event detection method and chip

Publications (2)

Publication Number Publication Date
CN113255518A CN113255518A (en) 2021-08-13
CN113255518B true CN113255518B (en) 2021-09-24

Family

ID=77184262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110568256.2A Active CN113255518B (en) 2021-05-25 2021-05-25 Video abnormal event detection method and chip

Country Status (1)

Country Link
CN (1) CN113255518B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113742510A (en) * 2021-08-26 2021-12-03 浙江大华技术股份有限公司 Determination method and device for cluster center of gathering files, computer equipment and storage medium
CN113435432B (en) * 2021-08-27 2021-11-30 腾讯科技(深圳)有限公司 Video anomaly detection model training method, video anomaly detection method and device
CN113705490B (en) * 2021-08-31 2023-09-12 重庆大学 Anomaly detection method based on reconstruction and prediction

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8660368B2 (en) * 2011-03-16 2014-02-25 International Business Machines Corporation Anomalous pattern discovery
CN106709447A (en) * 2016-12-21 2017-05-24 华南理工大学 Abnormal behavior detection method in video based on target positioning and characteristic fusion
CN108133172B (en) * 2017-11-16 2022-04-05 北京华道兴科技有限公司 Method for classifying moving objects in video and method and device for analyzing traffic flow
CN109902612B (en) * 2019-02-22 2021-01-08 北京工业大学 Monitoring video abnormity detection method based on unsupervised learning
CN112668366B (en) * 2019-10-15 2024-04-26 华为云计算技术有限公司 Image recognition method, device, computer readable storage medium and chip
CN111723694A (en) * 2020-06-05 2020-09-29 广东海洋大学 Abnormal driving behavior identification method based on CNN-LSTM space-time feature fusion

Also Published As

Publication number Publication date
CN113255518A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN113255518B (en) Video abnormal event detection method and chip
US8660368B2 (en) Anomalous pattern discovery
JP6694829B2 (en) Rule-based video importance analysis
US8548198B2 (en) Identifying anomalous object types during classification
US8619135B2 (en) Detection of abnormal behaviour in video objects
BR102016007265A2 (en) MULTIMODAL AND REAL-TIME METHOD FOR SENSITIVE CONTENT FILTERING
KR101731461B1 (en) Apparatus and method for behavior detection of object
JP6299299B2 (en) Event detection apparatus and event detection method
CN103312770B (en) Method for auditing resources of cloud platform
US20120275649A1 (en) Foreground object tracking
JP2016072964A (en) System and method for subject re-identification
JP5388829B2 (en) Intruder detection device
JP4940220B2 (en) Abnormal operation detection device and program
KR101720781B1 (en) Apparatus and method for prediction of abnormal behavior of object
KR20170082025A (en) Apparatus and Method for Identifying Video with Copyright using Recognizing Face based on Machine Learning
Turchini et al. Convex polytope ensembles for spatio-temporal anomaly detection
CN110060278A (en) The detection method and device of moving target based on background subtraction
CN112597928A (en) Event detection method and related device
Duque et al. The OBSERVER: An intelligent and automated video surveillance system
Shuoyan et al. Abnormal behavior detection based on the motion-changed rules
Vashistha et al. A comparative analysis of different violence detection algorithms from videos
Wang et al. Anomaly detection in crowd scene using historical information
CN114005060A (en) Image data determining method and device
KR20130067758A (en) Apparatus and method for detecting human by using svm learning
CN115187884A (en) High-altitude parabolic identification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221209

Address after: 807-3, floor 8, block F, No. 9, Shangdi Third Street, Haidian District, Beijing 100080

Patentee after: Zhongcheng Hualong Computer Technology Co.,Ltd.

Address before: No.114, 14th floor, block B, building 1, No.38, Zhongguancun Street, Haidian District, Beijing 100082

Patentee before: Shenwei Super Computing (Beijing) Technology Co.,Ltd.