WO2021174771A1 - Video anomaly detection method based on human-machine cooperation - Google Patents

Video anomaly detection method based on human-machine cooperation

Info

Publication number
WO2021174771A1
WO2021174771A1 (PCT/CN2020/110579; CN2020110579W)
Authority
WO
WIPO (PCT)
Prior art keywords
video
abnormal
video frame
normal
frame
Prior art date
Application number
PCT/CN2020/110579
Other languages
English (en)
French (fr)
Inventor
於志文
杨帆
李青洋
郭斌
Original Assignee
西北工业大学 (Northwestern Polytechnical University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 西北工业大学 (Northwestern Polytechnical University)
Publication of WO2021174771A1
Priority to US17/727,728 (published as US11983919B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73Querying
    • G06F16/732Query formulation
    • G06F16/7328Query by example, e.g. a complete video frame or video sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778Active pattern-learning, e.g. online learning of image or video features
    • G06V10/7784Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors
    • G06V10/7788Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors the supervisor being a human, e.g. interactive learning with a human teacher
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2178Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/72Data preparation, e.g. statistical preprocessing of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/44Event detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/47Detecting features for summarising video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • The invention belongs to the technical field of video anomaly detection, and in particular relates to a video anomaly detection method based on human-machine cooperation.
  • There are mainly two types of anomaly detection. One is based on early, traditional hand-crafted feature-extraction descriptors, used to detect anomalies in specific scenes according to specific target requirements; its detection performance is closely tied to the quality of the hand-crafted features. The other consists of deep learning methods that emerged after 2012, in which neural network models learn richer representations of video frames, including hidden features that people cannot estimate, thereby greatly improving the accuracy and speed of anomaly detection.
  • Although detection accuracy is improving, training the detection model requires a large number of samples, and the test results of the various models contain a considerable number of false alarms. Improving accuracy requires continually adjusting the training model, which is time-consuming and labor-intensive; in tasks with strict real-time requirements, the demand cannot be well met.
  • Existing video anomaly detection methods approach the problem through data distribution, model parameters, sample selection, and other aspects; for objects that a person can easily identify, the designed model must be iterated and optimized repeatedly before the detection (recognition) effect improves.
  • The present invention therefore proposes a video anomaly detection method based on human-machine cooperation.
  • Step 1: For the video sequence to be detected, analyze its video parameters: the length of the video, the scene of the video, and the start and end range of the abnormal video, and define what counts as abnormal video; split the video into frames and divide it into video sequences of a certain length;
  • Step 2: Divide the video sequences segmented in step 1 into a training set and a test set, where the training set contains no abnormal video sequences and the test set contains both normal and abnormal video sequences;
  • Step 3: Train the autoencoder model on the training-set data, adjusting the model parameters within a certain time window. The video frames and optical-flow data fed to the network are divided into blocks and passed through the convolution and pooling operations of the encoder and the deconvolution and pooling operations of the decoder. The Euclidean loss with L2 regularization shown in equation (1) is used as the objective function over the cuboid formed by multiple video frames in the time dimension; it is the Euclidean distance between the N reconstructed video blocks f_rec(X_i) of the video sequence and the input video blocks X_i, where γ is the adjustment factor between the two summed terms and W are the weights learned by the autoencoder neural network. The objective function is optimized to obtain the trained model:

    $$L(W)=\frac{1}{2N}\sum_{i=1}^{N}\lVert X_i-f_{rec}(X_i)\rVert_2^2+\gamma\lVert W\rVert_2^2\qquad(1)$$
  • Step 4: Calculate the total error of each pixel value I at position (x, y) in frame t,

    $$e(t)=\sum_{(x,y)}e(x,y,t)$$

    where the reconstruction error of each pixel at position (x, y) is given by formula (2):

    $$e(x,y,t)=\lVert I(x,y,t)-f_W(I(x,y,t))\rVert_2\qquad(2)$$

    Here I(x, y, t) is the value of pixel I at position (x, y) in frame t, and f_W(I(x, y, t)) is the reconstructed pixel value. The anomaly score of each frame is

    $$s(t)=\frac{e(t)-\min_t e(t)}{\max_t e(t)-\min_t e(t)}\qquad(3)$$

    where min_t e(t) and max_t e(t) are the total error values of the video frames with the smallest and largest scores in the video. According to the overall detection results and the ratio of normal to abnormal, a threshold is set: frames below the threshold are normal video frames and frames above it are abnormal video frames. Feedback is initiated on the detection results with a certain probability, so that a person can judge whether each flagged frame is genuinely normal or genuinely abnormal; correctly detected normal video frames are output directly, and misdetected video frames are labeled by the person;
  • Step 5: Collect the misdetected video frames from step 4 and store them in a buffer. Once the collected frames reach a certain number, feed them to the autoencoder model and adjust the model parameters moderately, so that detection accuracy for similar video frames improves in subsequent tests.
  • The ratio of the training set to the test set in step 2 is 4:6.
  • The blocks in step 3 come in three sizes: 15×15, 18×18, or 20×20 pixels.
  • The certain probability in step 4 is 0.1.
  • The present invention adds human feedback to conventional video anomaly detection: video frames that trigger feedback are confirmed by an expert, which matters especially for the judgment of frames above the set threshold. Even when the abnormal target object is heavily occluded in the video, the expert can confirm it. Using human cognitive strengths, the results of algorithmic detection are corrected and labeled, so that false alarms (originally normal but judged abnormal by the algorithm) and missed detections (originally abnormal but not detected) can both be corrected. The final experimental results show improved detection accuracy without updating the detection model, which has practical application value.
  • The invention provides a video anomaly detection method fused with human feedback. The method combines, to a certain degree, the natural cognition of anomalies by people with domain expertise and the processing results of the machine learning model. A threshold is set on the test results and feedback requests are sent at a certain rate: correct detections are confirmed and the result is output directly, while misdetections are marked and returned to the input part of the model, which then processes the marked data. Compared with previous abnormal-video detection algorithms this provides a novel approach, combining the cognitive-analysis advantages of humans with the fast-processing advantages of neural networks, and improving detection accuracy.
  • Figure 1 is a flow chart of the video anomaly detection method based on human-machine cooperation of the present invention
  • Figure 2 shows the detection results for videos with and without anomalies
  • The present invention proposes a video anomaly detection method based on human-machine cooperation.
  • Video frames and traditional image optical-flow descriptors are used as input data, encoded by an autoencoder neural network into a hidden-layer representation, which is then decoded and reconstructed as output.
  • The autoencoder network is trained on normal samples; if the input is a normal sample, the final reconstruction maintains high similarity with the input sample. Conversely, if the input is an abnormal sample, the final reconstruction error deviates substantially from the input sample.
  • Based on the reconstruction error, an appropriate threshold is set on the test results: values below the threshold are regarded as normal, and values above it as abnormal.
  • A person judges the video frames that triggered feedback. Correct detections are output directly; misdetections are marked, 1 for normal and 0 for abnormal, and the misdetected samples are returned to the model input. By collecting a certain number of misdetected video frames and feeding them to the neural network, the model is updated, so that in subsequent tests some similar anomalies can be detected as real anomalies. At the same time, for abnormal videos, detection can be targeted according to the start and end range of the anomaly, speeding up detection, which has strong practical significance in application scenarios such as public safety and public-security management.
  • Step 1: Analyze the video parameters of the video sequence to be detected, to prepare for processing and gain a basic understanding of the video so that it can be handled in a more targeted way. The observation record includes the length of the video, the scene of the video, and the start and end range of the abnormal video, and determines what counts as an anomaly (in our experimental data set: cars, skateboarders, cyclists, wheelchairs, people running, and people throwing things), giving a clearer picture of the video to be detected. Some preprocessing is performed: the video is split into frames and divided into video sequences of a certain length (for example, 200 frames per sequence).
  • Step 2: From the video sequences segmented in step 1, divide the data into a training set and a test set, usually at a ratio of 4:6, where the training set contains no abnormal video sequences and the test set contains both normal and abnormal video sequences.
  • Step 3: Train the autoencoder model on the training-set data, adjusting the model parameters within a certain time window (N = 10 or N = 20 frames). The blocks of video frames and optical-flow data pass through the encoder and decoder, and equation (1) gives the Euclidean distance between the reconstructed video blocks f_rec(X_i) and the input video blocks X_i, where γ is the adjustment factor between the two summed terms and W are the weights learned by the autoencoder neural network. Optimize the objective function to obtain the trained model.
  • Step 4: Once the model is trained, we calculate the total error of each pixel value I at position (x, y) in frame t. The reconstruction error of each pixel at position (x, y) is given by formula (2); from it the anomaly score of each frame is calculated and used as the basis for judging whether the frame is abnormal.
  • I(x, y, t) is the value of pixel I at position (x, y) in frame t, and f_W(I(x, y, t)) is the reconstructed pixel value.
  • An anomaly score is obtained for each frame, expressed as formula (3).
  • min_t e(t) and max_t e(t) are the total error values of the video frames with the smallest and largest scores in the video sequence.
  • A threshold is set: frames below the threshold are normal video frames and frames above it are abnormal video frames. Feedback is initiated on the detection results with a certain probability (0.1), and a person (an expert) judges whether each flagged frame is genuinely normal or genuinely abnormal; correctly detected normal frames are output directly, and misdetected frames are labeled by the person. Video sequences composed of regular events receive better regularity (normal) scores because they lie closer, in feature space, to the normal training data of the training set; conversely, abnormal sequences receive low normal scores, which can therefore be used to localize anomalies.
  • Step 5: Collect the misdetected video frames from step 4 and store them in a buffer. Once the collected frames reach a certain number, feed them to the autoencoder model and adjust the model parameters moderately, so that detection accuracy for similar video frames improves in subsequent tests. A minimal sketch of this feedback routing follows this list.
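The following is a minimal Python sketch of the steps 4 and 5 feedback routing described above. `ask_expert` is a hypothetical callback standing in for the human judgment, and the buffer capacity of 64 is an assumed value; the patent itself fixes only the feedback probability (0.1) and the normal = 1 / abnormal = 0 labels.

```python
import random

class FeedbackBuffer:
    """Step 5: collect expert-corrected frames until there are enough to fine-tune on."""

    def __init__(self, capacity=64):              # capacity is an assumed value
        self.capacity = capacity
        self.items = []                           # (frame_index, label), normal=1, abnormal=0

    def add(self, frame_idx, label):
        self.items.append((frame_idx, label))
        return len(self.items) >= self.capacity   # True -> time to update the model


def route_detections(scores, threshold, ask_expert, feedback_prob=0.1):
    """Step 4: threshold per-frame anomaly scores and send a random fraction
    of the decisions to a human expert for confirmation."""
    buffer = FeedbackBuffer()
    labels = []
    for t, s in enumerate(scores):
        pred_abnormal = s > threshold
        if random.random() < feedback_prob:       # initiate feedback with probability 0.1
            truly_abnormal = ask_expert(t)        # expert judges the flagged frame
            if truly_abnormal != pred_abnormal:   # false alarm or missed detection
                buffer.add(t, 0 if truly_abnormal else 1)
                pred_abnormal = truly_abnormal
        labels.append(pred_abnormal)
    return labels, buffer.items                   # corrected frames feed the model update
```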

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Library & Information Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

A video anomaly detection method based on human-machine cooperation. Video frames and traditional image optical-flow descriptors are used as input data, encoded by an autoencoder neural network into a hidden-layer representation, which is then decoded and reconstructed as output. The autoencoder network is trained on normal samples. In the test phase, if the input is a normal sample, the final reconstruction maintains high similarity with the input sample; conversely, if the input is an abnormal sample, the final reconstruction error deviates substantially from the input. Based on the reconstruction error, an appropriate threshold is set on the test results: values below the threshold are considered normal, values above it abnormal. Feedback is then requested with a certain probability, and a person judges the video frames that triggered feedback: correct detections are output directly, while misdetections are marked, 1 for normal and 0 for abnormal, and the misdetected samples are returned to the model input.

Description

Video anomaly detection method based on human-machine cooperation
Technical Field
The invention belongs to the technical field of video anomaly detection, and in particular relates to a video anomaly detection method based on human-machine cooperation.
Background Art
With the rapid development of information technology and the Internet of Things, more and more surveillance devices are deployed in towns and along roads (for example, residential buildings, shopping malls, office buildings, streets, and highways). The large-scale deployment of surveillance equipment provides an invisible safeguard for public property and personal safety; at the same time, it produces massive amounts of surveillance video data, and quickly and efficiently finding the video that meets a specific need within this huge body of data is a requirement faced by many applications. Video anomaly detection is an important branch of computer vision and plays a significant role in both theoretical research and practical applications. There are currently two main types of anomaly detection. One is based on early, traditional hand-crafted feature-extraction descriptors, used to detect anomalies in specific scenes according to specific target requirements; its detection performance is closely tied to the quality of the hand-crafted features. The other consists of deep learning methods that emerged after 2012, in which neural network models learn richer representations of video frames, including hidden features that people cannot estimate, thereby greatly improving the accuracy and speed of anomaly detection.
Although detection accuracy keeps improving in current video anomaly detection methods, training a detection model requires a large number of samples, and the test results of the various models contain a considerable number of false alarms. Improving detection accuracy requires continually adjusting the training model, which is time-consuming and labor-intensive and cannot satisfy tasks with strict real-time requirements. In addition, existing video anomaly detection methods approach the problem through data distribution, model parameters, sample selection, and so on; for objects that a person can recognize effortlessly, the designed model must be iterated and optimized repeatedly before the detection (recognition) performance improves.
Summary of the Invention
Technical problem to be solved
To avoid the deficiencies of the prior art and improve detection accuracy, the present invention proposes a video anomaly detection method based on human-machine cooperation.
Technical solution
A video anomaly detection method based on human-machine cooperation, characterized by the following steps:
Step 1: For the video sequence to be detected, analyze its video parameters: the length of the video, the scene of the video, and the start and end range of the abnormal video, and define what counts as abnormal video; split the video into frames and divide it into video sequences of a certain length;
Step 2: Divide the video sequences segmented in step 1 into a training set and a test set, where the training set contains no abnormal video sequences and the test set contains both normal and abnormal video sequences;
Step 3: Train the autoencoder model on the training-set data, adjusting the model parameters within a certain time window. The video frames and optical-flow data fed to the network are divided into blocks and passed through the convolution and pooling operations of the encoder and the deconvolution and pooling operations of the decoder. The Euclidean loss with L2 regularization shown in equation (1) is used as the objective function over the cuboid formed by multiple video frames in the time dimension; it is the Euclidean distance between the N reconstructed video blocks f_rec(X_i) of the video sequence and the input video blocks X_i, where γ is the adjustment factor between the two summed terms and W are the weights learned by the autoencoder neural network. The objective function is optimized to obtain the trained model:

$$L(W)=\frac{1}{2N}\sum_{i=1}^{N}\lVert X_i-f_{rec}(X_i)\rVert_2^2+\gamma\lVert W\rVert_2^2\qquad(1)$$
Step 4: Calculate the total error of each pixel value I at position (x, y) in frame t,

$$e(t)=\sum_{(x,y)}e(x,y,t)$$

where the reconstruction error of each pixel at position (x, y) is given by formula (2):

$$e(x,y,t)=\lVert I(x,y,t)-f_W(I(x,y,t))\rVert_2\qquad(2)$$

Here I(x, y, t) is the value of pixel I at position (x, y) in frame t, and f_W(I(x, y, t)) is the reconstructed pixel value.
Calculate the anomaly score of each frame, used as the basis for judging whether it is abnormal:

$$s(t)=\frac{e(t)-\min_t e(t)}{\max_t e(t)-\min_t e(t)}\qquad(3)$$

where min_t e(t) and max_t e(t) are the total error values of the video frames with the smallest and largest scores in the video. According to the overall detection results and the ratio of normal to abnormal, a threshold is set: frames below the threshold are normal video frames and frames above it are abnormal video frames. Feedback is initiated on the detection results with a certain probability, letting a person judge whether each flagged frame is genuinely normal or genuinely abnormal; correctly detected normal video frames are output directly, and misdetected video frames are labeled by the person;
Step 5: Collect the misdetected video frames from step 4 in a buffer. Once the collected frames reach a certain number, feed them to the autoencoder model and adjust the model parameters moderately, so that detection accuracy for similar video frames improves in subsequent tests.
The ratio of the training set to the test set in step 2 is 4:6.
The blocks in step 3 come in three sizes: 15×15, 18×18, or 20×20 pixels.
The certain probability in step 4 is 0.1.
Beneficial effects
The present invention adds human feedback to conventional video anomaly detection. Video frames that trigger feedback are confirmed by an expert, which matters especially for the judgment of frames above the set threshold: even when the abnormal target object is heavily occluded in the video, the expert can confirm it. Using human cognitive strengths, the results of algorithmic detection are corrected and labeled, so that false alarms (originally normal but judged abnormal by the algorithm) and missed detections (originally abnormal but not detected) can both be corrected. The final experimental results show improved detection accuracy without updating the detection model, which has practical application value.
In an era in which vast amounts of image and video data are produced every day, fusing human cognition, analysis, and reasoning, labeling a certain amount of abnormal video, and combining this with machine learning algorithms can achieve efficient and fast detection. The present invention provides a video anomaly detection method that incorporates human feedback. The method fuses, to a certain degree, the natural cognition of anomalies by people with domain expertise and the processing results of a machine learning model. A threshold is set on the test results and feedback requests are issued at a certain rate: correct detections are confirmed by a person and the result is output directly; misdetections are labeled and returned to the input part of the model, which then processes the labeled data. Within this processing model, a novel approach to previous abnormal-video detection algorithms is provided, one that combines the cognitive-analysis advantages of humans with the fast-processing advantages of neural networks and improves detection accuracy.
Brief Description of the Drawings
Figure 1 is a flow chart of the video anomaly detection method based on human-machine cooperation of the present invention
Figure 2 shows the detection results for videos with and without anomalies
Detailed Description of the Embodiments
The present invention is further described below with reference to the embodiments and the accompanying drawings:
The present invention proposes a video anomaly detection method based on human-machine cooperation. Video frames and traditional image optical-flow descriptors are used as input data, encoded by an autoencoder neural network into a hidden-layer representation, which is then decoded and reconstructed as output. The autoencoder network is trained on normal samples. In the test phase, if the input is a normal sample, the final reconstruction maintains high similarity with the input sample; conversely, if the input is an abnormal sample, the final reconstruction error deviates substantially from the input. Based on the reconstruction error, an appropriate threshold is set on the test results: values below the threshold are considered normal and values above it abnormal. Feedback is then requested with a certain probability, and a person judges the video frames that triggered feedback: correct detections are output directly, while misdetections are marked, 1 for normal and 0 for abnormal, and the misdetected samples are returned to the model input. By collecting a certain number of misdetected video frames and feeding them to the neural network, the model is updated, so that in subsequent tests some similar anomalies can be detected as real anomalies. At the same time, for abnormal videos, detection can be targeted according to the start and end range of the anomaly, speeding up detection, which has strong practical significance in application scenarios such as public safety and public-security management.
As shown in Figure 1, the method comprises the following steps:
Step 1: For the video sequence to be detected, analyze its video parameters to prepare for processing, gaining a basic understanding of the video so that it can be handled in a more targeted way. The observation record includes the length of the video, the scene of the video, and the start and end range of the abnormal video, and determines what counts as an anomaly (in our experimental data set: cars, skateboarders, cyclists, wheelchairs, people running, and people throwing things), giving a clearer picture of the video to be detected. Some preprocessing is performed: the video is split into frames and divided into video sequences of a certain length (for example, 200 frames per sequence).
Step 2: From the video sequences segmented in step 1, divide the data into a training set and a test set, usually at a ratio of 4:6, where the training set contains no abnormal video sequences and the test set contains both normal and abnormal video sequences.
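As a rough illustration of steps 1 and 2, the sketch below cuts a video into fixed-length grayscale sequences with OpenCV and applies the 4:6 split. The function names, and the assumption that normal-only and mixed sequences arrive as two separate lists, are illustrative rather than from the patent.

```python
import cv2  # OpenCV, assumed available for frame extraction

def split_into_sequences(video_path, seq_len=200):
    """Step 1: split a video into grayscale frame sequences of fixed length.
    A trailing partial sequence shorter than seq_len is dropped."""
    cap = cv2.VideoCapture(video_path)
    frames, sequences = [], []
    ok, frame = cap.read()
    while ok:
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        if len(frames) == seq_len:
            sequences.append(frames)
            frames = []
        ok, frame = cap.read()
    cap.release()
    return sequences

def split_train_test(normal_seqs, mixed_seqs):
    """Step 2: 4:6 train/test split; the training set holds only normal sequences."""
    n_total = len(normal_seqs) + len(mixed_seqs)
    n_train = min(int(round(0.4 * n_total)), len(normal_seqs))
    train = normal_seqs[:n_train]                 # no abnormal sequences in training
    test = normal_seqs[n_train:] + mixed_seqs     # test set mixes normal and abnormal
    return train, test
```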
Step 3: Train the autoencoder model on the training-set data, adjusting the model parameters within a certain time window (N = 10 or N = 20 frames). The video frames and optical-flow data fed to the network are divided into blocks of three sizes, 15×15, 18×18, and 20×20 pixels, and passed through the convolution and pooling operations of the encoder and the deconvolution and pooling operations of the decoder. We use the Euclidean loss with L2 regularization as the objective function over the cuboid formed by multiple video frames in the time dimension, as shown in equation (1); it is the Euclidean distance between the N reconstructed video blocks f_rec(X_i) of the video sequence and the input video blocks X_i, where γ is the adjustment factor between the two summed terms and W are the weights learned by the autoencoder neural network. The objective function is optimized to obtain the trained model:

$$L(W)=\frac{1}{2N}\sum_{i=1}^{N}\lVert X_i-f_{rec}(X_i)\rVert_2^2+\gamma\lVert W\rVert_2^2\qquad(1)$$
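A sketch of step 3 in PyTorch, under stated assumptions: the cuboid of frames and optical flow is collapsed into the input channels of a plain 2-D convolutional autoencoder, spatial sizes are written for 20×20 blocks, and the layer widths and the γ default are illustrative values rather than numbers from the patent.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Encoder (convolution + pooling) and decoder (deconvolution), per step 3."""

    def __init__(self, in_ch=2):  # assumed: one frame channel plus one optical-flow channel
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                         # 20x20 -> 10x10
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, in_ch, kernel_size=2, stride=2),  # 10x10 -> 20x20
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def objective(model, blocks, gamma=1e-4):
    """Equation (1): mean squared reconstruction distance over the N input
    blocks plus a gamma-weighted L2 penalty on the learned parameters W."""
    recon = model(blocks)                                     # f_rec(X_i)
    rec_term = ((blocks - recon) ** 2).sum(dim=(1, 2, 3)).mean() / 2
    l2_term = sum((w ** 2).sum() for w in model.parameters())
    return rec_term + gamma * l2_term
```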
Step 4: Once the model is trained, we calculate the total error of each pixel value I at position (x, y) in frame t,

$$e(t)=\sum_{(x,y)}e(x,y,t)$$

where the reconstruction error of each pixel at position (x, y) is given by formula (2); from it, the anomaly score of each frame is calculated and used as the basis for judging whether the frame is abnormal:

$$e(x,y,t)=\lVert I(x,y,t)-f_W(I(x,y,t))\rVert_2\qquad(2)$$

Here I(x, y, t) is the value of pixel I at position (x, y) in frame t, and f_W(I(x, y, t)) is the reconstructed pixel value. An anomaly score is obtained for each frame, expressed as formula (3):

$$s(t)=\frac{e(t)-\min_t e(t)}{\max_t e(t)-\min_t e(t)}\qquad(3)$$
Here min_t e(t) and max_t e(t) are the total error values of the video frames with the smallest and largest scores in the video sequence. According to the overall detection results and the ratio of normal to abnormal, a threshold is set: frames below the threshold are normal video frames and frames above it are abnormal video frames. Feedback is initiated on the detection results with a certain probability (0.1), and a person (an expert) judges whether each flagged frame is genuinely normal or genuinely abnormal; correctly detected normal frames are output directly, and misdetected frames are labeled by the person. Video sequences composed of regular events receive better regularity (normal) scores because they lie closer, in feature space, to the normal training data of the training set; conversely, abnormal sequences receive low normal scores, which can therefore be used to localize anomalies.
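A numpy sketch of this scoring, assuming `frames` and `recon` are float arrays of shape (T, H, W) holding the inputs and their reconstructions. The score follows the min-max normalization of equation (3), so the threshold is chosen on a 0-to-1 scale; the 0.5 in the usage note is only a placeholder.

```python
import numpy as np

def anomaly_scores(frames, recon):
    """Equations (2) and (3): per-pixel reconstruction error summed over each
    frame, then min-max normalized into a per-frame anomaly score."""
    # e(x, y, t) = ||I(x, y, t) - f_W(I(x, y, t))||_2 for each pixel
    pixel_err = np.abs(frames - recon)
    # e(t): total error of frame t, summed over all pixel positions
    e = pixel_err.reshape(len(frames), -1).sum(axis=1)
    # s(t): normalized against the lowest- and highest-error frames
    # (tiny epsilon avoids division by zero on a constant-error sequence)
    s = (e - e.min()) / (e.max() - e.min() + 1e-12)
    return e, s

# usage: frames whose score exceeds the threshold are flagged as abnormal
# e, s = anomaly_scores(frames, recon)
# flagged = s > 0.5   # threshold chosen from the normal/abnormal ratio
```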
Step 5: Collect the misdetected video frames from step 4 in a buffer. Once the collected frames reach a certain number, feed them to the autoencoder model and adjust the model parameters moderately, so that detection accuracy for similar video frames improves in subsequent tests.
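Step 5's periodic update might look like the sketch below, reusing the `objective` function from the step 3 sketch; the optimizer choice, learning rate, and epoch count are assumptions, chosen small so the parameters are adjusted moderately rather than retrained.

```python
import torch

def fine_tune(model, objective, corrected_blocks, lr=1e-5, epochs=3):
    """Moderately adjust model parameters on the buffered, expert-corrected
    video blocks once the buffer is full (step 5)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # small lr: adjust, don't retrain
    model.train()
    for _ in range(epochs):
        opt.zero_grad()
        loss = objective(model, corrected_blocks)      # same equation (1) objective
        loss.backward()
        opt.step()
    return model
```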

Claims (4)

  1. A video anomaly detection method based on human-machine cooperation, characterized by the following steps:
    Step 1: For the video sequence to be detected, analyze its video parameters: the length of the video, the scene of the video, and the start and end range of the abnormal video, and define what counts as abnormal video; split the video into frames and divide it into video sequences of a certain length;
    Step 2: Divide the video sequences segmented in step 1 into a training set and a test set, where the training set contains no abnormal video sequences and the test set contains both normal and abnormal video sequences;
    Step 3: Train the autoencoder model on the training-set data, adjusting the model parameters within a certain time window; divide the video frames and optical-flow data fed to the network into blocks and pass them through the convolution and pooling operations of the encoder and the deconvolution and pooling operations of the decoder; use the Euclidean loss with L2 regularization shown in equation (1) as the objective function over the cuboid formed by multiple video frames in the time dimension, namely the Euclidean distance between the N reconstructed video blocks f_rec(X_i) of the video sequence and the input video blocks X_i, where γ is the adjustment factor between the two summed terms and W are the weights learned by the autoencoder neural network; optimize the objective function to obtain the trained model;
    $$L(W)=\frac{1}{2N}\sum_{i=1}^{N}\lVert X_i-f_{rec}(X_i)\rVert_2^2+\gamma\lVert W\rVert_2^2\qquad(1)$$
    Step 4: Calculate the total error of each pixel value I at position (x, y) in frame t,
    $$e(t)=\sum_{(x,y)}e(x,y,t)$$
    where the reconstruction error of each pixel at position (x, y) is given by formula (2):
    $$e(x,y,t)=\lVert I(x,y,t)-f_W(I(x,y,t))\rVert_2\qquad(2)$$
    in which I(x, y, t) is the value of pixel I at position (x, y) in frame t and f_W(I(x, y, t)) is the reconstructed pixel value;
    calculate the anomaly score of each frame, used as the basis for judging whether it is abnormal:
    $$s(t)=\frac{e(t)-\min_t e(t)}{\max_t e(t)-\min_t e(t)}\qquad(3)$$
    where min_t e(t) and max_t e(t) are the total error values of the video frames with the smallest and largest scores in the video; according to the overall detection results and the ratio of normal to abnormal, set a threshold: frames below the threshold are normal video frames and frames above it are abnormal video frames; initiate feedback on the detection results with a certain probability, letting a person judge whether each flagged frame is genuinely normal or genuinely abnormal; output correctly detected normal video frames directly, and have the person label misdetected video frames;
    Step 5: Collect the misdetected video frames from step 4 in a buffer; once the collected frames reach a certain number, feed them to the autoencoder model and adjust the model parameters moderately, so that detection accuracy for similar video frames improves in subsequent tests.
  2. The video anomaly detection method based on human-machine cooperation according to claim 1, characterized in that the ratio of the training set to the test set in step 2 is 4:6.
  3. The video anomaly detection method based on human-machine cooperation according to claim 1, characterized in that the blocks in step 3 come in three sizes: 15×15, 18×18, or 20×20 pixels.
  4. The video anomaly detection method based on human-machine cooperation according to claim 1, characterized in that the certain probability in step 4 is 0.1.
PCT/CN2020/110579 2020-03-05 2020-08-21 Video anomaly detection method based on human-machine cooperation WO2021174771A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/727,728 US11983919B2 (en) 2020-03-05 2022-04-23 Video anomaly detection method based on human-machine cooperation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010148420.XA CN111400547B (zh) 2020-03-05 2020-03-05 Video anomaly detection method based on human-machine cooperation
CN202010148420.X 2020-03-05

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/727,728 Continuation US11983919B2 (en) 2020-03-05 2022-04-23 Video anomaly detection method based on human-machine cooperation

Publications (1)

Publication Number Publication Date
WO2021174771A1 (zh) 2021-09-10

Family

ID=71428571

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/110579 WO2021174771A1 (zh) 2020-03-05 2020-08-21 Video anomaly detection method based on human-machine cooperation

Country Status (2)

Country Link
CN (1) CN111400547B (zh)
WO (1) WO2021174771A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067251A (zh) * 2021-11-18 2022-02-18 西安交通大学 Unsupervised anomaly detection method for predicted frames in surveillance video
CN114092478A (zh) * 2022-01-21 2022-02-25 合肥中科类脑智能技术有限公司 Anomaly detection method
CN114743153A (zh) * 2022-06-10 2022-07-12 北京航空航天大学杭州创新研究院 Video-understanding-based contactless dish-taking model construction, dish-taking method, and apparatus
CN114842371A (zh) * 2022-03-30 2022-08-02 西北工业大学 Unsupervised video anomaly detection method
CN115484456A (zh) * 2022-09-15 2022-12-16 重庆邮电大学 Video anomaly prediction method and apparatus based on semantic clustering
CN117474925A (zh) * 2023-12-28 2024-01-30 山东润通齿轮集团有限公司 Machine-vision-based gear pitting detection method and system

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400547B (zh) 2020-03-05 2023-03-24 西北工业大学 Video anomaly detection method based on human-machine cooperation
CN113033424B (zh) * 2021-03-29 2021-09-28 广东众聚人工智能科技有限公司 Multi-branch-based video anomaly detection method and system
CN113240022A (zh) * 2021-05-19 2021-08-10 燕山大学 Wind turbine gearbox fault detection method using a multi-scale one-class convolutional network
CN113473124B (zh) * 2021-05-28 2024-02-06 北京达佳互联信息技术有限公司 Information acquisition method and apparatus, electronic device, and storage medium
CN115082870A (zh) * 2022-07-18 2022-09-20 松立控股集团股份有限公司 Parking lot abnormal event detection method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559420A (zh) * 2013-11-20 2014-02-05 苏州大学 Method and apparatus for constructing an anomaly detection training set
CN108509827A (zh) * 2017-02-27 2018-09-07 阿里巴巴集团控股有限公司 Method for identifying abnormal content in a video stream, and video stream processing system and method
CN109615019A (zh) * 2018-12-25 2019-04-12 吉林大学 Abnormal behavior detection method based on a spatio-temporal autoencoder
CN110177108A (zh) * 2019-06-02 2019-08-27 四川虹微技术有限公司 Abnormal behavior detection method and apparatus, and verification system
US20190392230A1 (en) * 2016-06-13 2019-12-26 Xevo Inc. Method and system for providing behavior of vehicle operator using virtuous cycle
CN111400547A (zh) * 2020-03-05 2020-07-10 西北工业大学 Video anomaly detection method based on human-machine cooperation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830882B (zh) * 2018-05-25 2022-05-17 中国科学技术大学 Real-time video abnormal behavior detection method
US10970823B2 (en) * 2018-07-06 2021-04-06 Mitsubishi Electric Research Laboratories, Inc. System and method for detecting motion anomalies in video
CN109359519B (zh) * 2018-09-04 2021-12-07 杭州电子科技大学 Video abnormal behavior detection method based on deep learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559420A (zh) * 2013-11-20 2014-02-05 苏州大学 Method and apparatus for constructing an anomaly detection training set
US20190392230A1 (en) * 2016-06-13 2019-12-26 Xevo Inc. Method and system for providing behavior of vehicle operator using virtuous cycle
CN108509827A (zh) * 2017-02-27 2018-09-07 阿里巴巴集团控股有限公司 Method for identifying abnormal content in a video stream, and video stream processing system and method
CN109615019A (zh) * 2018-12-25 2019-04-12 吉林大学 Abnormal behavior detection method based on a spatio-temporal autoencoder
CN110177108A (zh) * 2019-06-02 2019-08-27 四川虹微技术有限公司 Abnormal behavior detection method and apparatus, and verification system
CN111400547A (zh) * 2020-03-05 2020-07-10 西北工业大学 Video anomaly detection method based on human-machine cooperation

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067251A (zh) * 2021-11-18 2022-02-18 西安交通大学 Unsupervised anomaly detection method for predicted frames in surveillance video
CN114067251B (zh) * 2021-11-18 2023-09-15 西安交通大学 Unsupervised anomaly detection method for predicted frames in surveillance video
CN114092478A (zh) * 2022-01-21 2022-02-25 合肥中科类脑智能技术有限公司 Anomaly detection method
CN114842371A (zh) * 2022-03-30 2022-08-02 西北工业大学 Unsupervised video anomaly detection method
CN114842371B (zh) * 2022-03-30 2024-02-27 西北工业大学 Unsupervised video anomaly detection method
CN114743153A (zh) * 2022-06-10 2022-07-12 北京航空航天大学杭州创新研究院 Video-understanding-based contactless dish-taking model construction, dish-taking method, and apparatus
CN115484456A (zh) * 2022-09-15 2022-12-16 重庆邮电大学 Video anomaly prediction method and apparatus based on semantic clustering
CN115484456B (zh) * 2022-09-15 2024-05-07 重庆邮电大学 Video anomaly prediction method and apparatus based on semantic clustering
CN117474925A (zh) * 2023-12-28 2024-01-30 山东润通齿轮集团有限公司 Machine-vision-based gear pitting detection method and system
CN117474925B (zh) * 2023-12-28 2024-03-15 山东润通齿轮集团有限公司 Machine-vision-based gear pitting detection method and system

Also Published As

Publication number Publication date
CN111400547B (zh) 2023-03-24
US20220245945A1 (en) 2022-08-04
CN111400547A (zh) 2020-07-10

Similar Documents

Publication Publication Date Title
WO2021174771A1 (zh) 一种人机协作的视频异常检测方法
CN109829443B (zh) 基于图像增强与3d卷积神经网络的视频行为识别方法
CN107330920B (zh) 一种基于深度学习的监控视频多目标追踪方法
Yang et al. Spatio-temporal action detection with cascade proposal and location anticipation
CN110717411A (zh) 一种基于深层特征融合的行人重识别方法
TWI441096B (zh) 適用複雜場景的移動偵測方法
WO2018058854A1 (zh) 一种视频的背景去除方法
CN110853074A (zh) 一种利用光流增强目标的视频目标检测网络系统
CN114333070A (zh) 一种基于深度学习的考生异常行为检测方法
CN112561951B (zh) 一种基于帧差绝对误差和sad的运动和亮度检测方法
CN113313037A (zh) 一种基于自注意力机制的生成对抗网络视频异常检测方法
CN113536972A (zh) 一种基于目标域伪标签的自监督跨域人群计数方法
CN112288778B (zh) 一种基于多帧回归深度网络的红外小目标检测方法
CN111310592A (zh) 一种基于场景分析和深度学习的检测方法
Do Attention in crowd counting using the transformer and density map to improve counting result
CN109446938B (zh) 一种基于多序列双投影的黑烟车检测方法
CN113707175A (zh) 基于特征分解分类器与自适应后处理的声学事件检测系统
CN112949451A (zh) 通过模态感知特征学习的跨模态目标跟踪方法及系统
CN112766179A (zh) 一种基于运动特征混合深度网络的火灾烟雾检测方法
CN114120076B (zh) 基于步态运动估计的跨视角视频步态识别方法
CN115830541A (zh) 一种基于双流时空自编码器的视频异常事件检测方法
CN114758285A (zh) 基于锚自由和长时注意力感知的视频交互动作检测方法
CN114821772A (zh) 一种基于时空关联学习的弱监督时序动作检测方法
CN115457620A (zh) 用户表情识别方法、装置、计算机设备及存储介质
CN114694090A (zh) 一种基于改进PBAS算法与YOLOv5的校园异常行为检测方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20923530

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20923530

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.03.2023)
