WO2022116322A1 - Anomaly detection model generation method and apparatus, and abnormal event detection method and apparatus - Google Patents

Anomaly detection model generation method and apparatus, and abnormal event detection method and apparatus

Info

Publication number
WO2022116322A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
frame
training
anomaly detection
feature information
Prior art date
Application number
PCT/CN2020/139499
Other languages
English (en)
French (fr)
Inventor
吴俊
陈晓蝶
马永康
曾铮
江文涛
Original Assignee
罗普特科技集团股份有限公司
罗普特(厦门)系统集成有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 罗普特科技集团股份有限公司 and 罗普特(厦门)系统集成有限公司
Publication of WO2022116322A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2433Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features

Definitions

  • the embodiments of the present application relate to the field of computer technologies, and in particular, to a method and device for generating an abnormality detection model, and a method and device for detecting abnormal events.
  • Anomaly detection is a common application of machine learning algorithms. A system learns normal features from large amounts of unlabeled data so that it can diagnose abnormal data; this process is called anomaly detection. In essence, anomaly detection finds objects that differ from the majority of objects, that is, it finds outliers. Anomaly detection is defined differently in different fields; in video, anomaly detection refers to identifying events that do not match expected behavior and distinguishing normal events from abnormal events.
  • the purpose of the embodiments of the present application is to propose an improved method and apparatus for generating an anomaly detection model to solve the technical problems mentioned in the above background technology section.
  • an embodiment of the present application provides a method for generating an anomaly detection model, the method including: acquiring a plurality of sample image frame sequences, wherein each sample image frame sequence includes a first image and a second image, and the second image is the next frame following the first image; training a prediction frame generator included in an initial model based on the first image and the second image, wherein the prediction frame generator includes a multi-level feature extraction network and a generation network, the feature extraction network is used to extract feature information of different depths from the first image and fuse the feature information, and the generation network is used to generate the predicted frame using the fused feature information; training a frame discriminator included in the initial model based on the predicted frame and the second image; and in response to the end of training, determining the trained initial model as the anomaly detection model.
  • in some embodiments, training the prediction frame generator included in the initial model includes: optimizing the parameters of the feature extraction network based on a preset first loss function, wherein the first loss function includes at least one of the following: L2 distance loss, gradient constraint loss, and optical flow loss; and optimizing the parameters of the generation network based on a preset second loss function, wherein the second loss function includes a least squares loss.
  • in some embodiments, training the frame discriminator included in the initial model includes: superimposing a preset number of image frames preceding the second image, together with the predicted frame, into a multi-channel image; extracting feature information of the multi-channel image; performing optical flow estimation on the feature information of the multi-channel image to determine the optical flow loss between the predicted frame and the second image; and optimizing the parameters of the frame discriminator based on the optical flow loss.
  • the number of first images is at least two.
  • the method further includes: acquiring multiple anomaly detection models obtained through multiple rounds of training; and determining the detection performance of the multiple anomaly detection models, and determining the anomaly detection model with the best detection performance as the model used for abnormal event detection.
  • an embodiment of the present application provides a method for detecting an abnormal event.
  • the method includes: acquiring a sequence of image frames collected by an image acquisition device, wherein the sequence of image frames includes a first image and a second image, and the second image is the next frame following the first image; inputting the first image into a prediction frame generator included in a pre-trained anomaly detection model to obtain a predicted frame, wherein the anomaly detection model is pre-trained based on the method described in the first aspect above; inputting the predicted frame and the second image into a pre-trained frame discriminator to obtain a numerical value representing the degree of similarity between the predicted frame and the second image; and in response to determining that the numerical value is less than or equal to a preset threshold, outputting information representing that an abnormal event occurs at the time point corresponding to the second image.
  • an embodiment of the present application provides a device for generating an abnormality detection model
  • the device includes: a first acquisition module, configured to acquire a plurality of sample image frame sequences, wherein each sample image frame sequence includes a first image and a second image, and the second image is the next frame following the first image; a first training module, configured to train the prediction frame generator included in the initial model based on the first image and the second image, wherein the prediction frame generator includes a multi-level feature extraction network and a generation network, the feature extraction network is used to extract feature information of different depths from the first image and fuse the feature information, and the generation network is used to generate the predicted frame using the fused feature information; a second training module, configured to train the frame discriminator included in the initial model based on the predicted frame and the second image; and a first determination module, configured to determine the trained initial model as the anomaly detection model in response to the end of training.
  • an embodiment of the present application provides an abnormal event detection apparatus, the apparatus including: a third acquisition module, configured to acquire a sequence of image frames collected by an image acquisition device, wherein the sequence of image frames includes a first image and a second image, and the second image is the next frame following the first image; a prediction module, configured to input the first image into a prediction frame generator included in a pre-trained anomaly detection model to obtain a predicted frame, wherein the anomaly detection model is pre-trained based on the method described in the first aspect above; a discrimination module, configured to input the predicted frame and the second image into a pre-trained frame discriminator to obtain a numerical value representing the degree of similarity between the predicted frame and the second image; and an output module, configured to output, in response to determining that the numerical value is less than or equal to a preset threshold, information representing that an abnormal event occurs at the time point corresponding to the second image.
  • embodiments of the present application provide an electronic device, including: one or more processors; and a storage device for storing one or more programs, which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect or the second aspect.
  • an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the method described in any implementation manner of the first aspect or the second aspect .
  • in the method and device for generating an anomaly detection model and the method and device for detecting abnormal events provided by the embodiments of the present application, the prediction frame generator included in the initial model is trained based on the first image and the second image included in the acquired sample image frame sequences, the prediction frame generator generates a predicted frame, the frame discriminator included in the initial model is trained based on the predicted frame and the second image, and finally the trained initial model is determined as the anomaly detection model; because the frame generator fuses feature information of multiple different depths, it can generate predicted frames that are closer to reality, thereby improving the accuracy of anomaly detection.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of an anomaly detection model generation method according to the present application.
  • FIG. 3 is a schematic structural diagram of an initial model according to the anomaly detection model generation method of the present application.
  • FIG. 4 is a flowchart of another embodiment of the method for generating an anomaly detection model according to the present application.
  • FIG. 5 is a flowchart of an embodiment of an abnormal event detection method according to the present application.
  • FIG. 6 is a schematic structural diagram of an embodiment of an anomaly detection model generating apparatus according to the present application.
  • FIG. 7 is a schematic structural diagram of an embodiment of an abnormal event detection apparatus according to the present application.
  • FIG. 8 is a schematic structural diagram of a computer system suitable for implementing the electronic device according to the embodiment of the present application.
  • FIG. 1 shows an exemplary system architecture 100 to which an anomaly detection model generation method according to an embodiment of the present application may be applied.
  • the system architecture 100 may include a terminal device 101 , a network 102 and a server 103 .
  • the network 102 is a medium used to provide a communication link between the terminal device 101 and the server 103 .
  • the network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • the user can use the terminal device 101 to interact with the server 103 through the network 102 to receive or send messages and the like.
  • Various communication client applications such as monitoring applications, image processing applications, video processing applications, etc., may be installed on the terminal device 101 .
  • the terminal device 101 may be any of various electronic devices, including but not limited to mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), and in-vehicle terminals (for example, car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the server 103 may be a server that provides various services, such as an image processing server that processes the sequence of image frames uploaded by the terminal device 101 .
  • the image processing server can perform model training, anomaly detection and other processing on the received image frame sequence, and obtain processing results (such as anomaly detection model, anomaly detection information, etc.).
  • the anomaly detection model generation method or the abnormal event monitoring method provided in the embodiments of the present application may be executed by the terminal device 101 or the server 103, and accordingly, the anomaly detection model generation apparatus or the abnormal event monitoring apparatus may be provided in the terminal device 101 or the server 103.
  • it should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs. It should be noted that, in the case where the samples for training the model or the images used for anomaly detection do not need to be obtained remotely, the above-mentioned system architecture may not include a network, and may include only a server or a terminal device.
  • FIG. 2 shows a flow 200 of an embodiment of the method for generating an anomaly detection model according to the present application.
  • the method includes the following steps:
  • Step 201: acquire multiple sample image frame sequences.
  • the execution body of the method for generating an anomaly detection model may acquire multiple sample image frame sequences locally or remotely.
  • each sequence of sample image frames may be image frames included in video clips cut from different videos.
  • the above-mentioned multiple sample image sequences can come from preset datasets, such as UCSD-Ped2 or CUHK datasets.
  • each sample image frame sequence includes a first image and a second image, and the second image is the next frame of the first image.
  • the number of the first images can be set arbitrarily, for example, three.
  • the number of the first images is at least two. As an example, for a sample image frame sequence whose last frame is F_t, the first image may include F_{t-1}, F_{t-2}, ..., F_{t-n}, and the second image is F_t. Setting the number of first images to at least two makes full use of the features of historical frames to predict the future frame, improving the accuracy of the generated predicted frame.
  • the image frames included in the sample image sequence may be color images of a fixed size obtained by scaling the original image, for example 256×256×3, where 3 is the number of color channels.
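  • As a concrete illustration of this preprocessing and sampling step, the following is a minimal sketch assuming OpenCV is available; the function name, the clip path, and the choice of four first images per sample are illustrative assumptions rather than details given in the application:

```python
import cv2
import numpy as np

def load_sample_sequences(video_path, n_first=4, size=(256, 256)):
    """Read a video clip, scale every frame to a fixed 256x256x3 color image,
    and slice it into (first images, second image) training samples."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, size)                  # fixed spatial size
        frames.append(frame.astype(np.float32) / 255.0)  # 3 color channels in [0, 1]
    cap.release()

    samples = []
    # each sample: n_first consecutive frames plus the next frame as the second image
    for t in range(n_first, len(frames)):
        first_images = np.stack(frames[t - n_first:t])   # (n_first, 256, 256, 3)
        second_image = frames[t]                          # (256, 256, 3)
        samples.append((first_images, second_image))
    return samples

# example usage with a hypothetical clip path
sample_sequences = load_sample_sequences("clip_0001.mp4", n_first=4)
```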
  • Step 202: train the prediction frame generator included in the initial model based on the first image and the second image.
  • the above-mentioned execution body may train the prediction frame generator included in the initial model based on the first image and the second image.
  • the prediction frame generator includes a multi-level feature extraction network and a generation network, the feature extraction network is used to extract the feature information of different depths of the first image and fuse the feature information, and the generation network is used to generate the prediction frame by using the fused feature information.
  • the fused feature information can be a feature map.
  • a feature extraction network can include 20 convolutional layers (using 1x1 convolution and 3x3 convolution), 4 max pooling layers and 1 activation layer.
  • Multi-layer convolution is used to extract feature information of different depths from the first image (that is, the normal behavior image), and this feature information is fused. The fused feature information is input into the generation network, which performs three convolution operations on it and applies a Tanh activation function to obtain a 256×256×3 image, which is the predicted frame image.
  • As shown in FIG. 3, p_1, p_2, p_3, and p_4 are the first images, and p_{t+1} is the second image. 301 is the initial network: the first images are input into the initial network and pass through multiple convolutions to obtain the fused feature information, which then passes through three Conv(3,3) convolution operations and a Tanh activation function to output the predicted frame p̂_{t+1}.
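  • The exact layer arrangement is not spelled out here, so the following PyTorch-style sketch is only an assumed illustration of the idea: a multi-level feature extraction network built from 1x1 and 3x3 convolutions with max pooling, whose feature maps of different depths are fused and then passed through three Conv(3,3) layers and a Tanh activation to produce a 256×256×3 predicted frame. The channel widths, the ReLU activations between layers, and stacking the first images along the channel dimension are all assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictedFrameGenerator(nn.Module):
    """Illustrative generator: multi-level feature extraction with fusion,
    followed by three 3x3 convolutions and Tanh (layer counts are assumptions)."""

    def __init__(self, n_first=4):
        super().__init__()
        in_ch = 3 * n_first                      # assumption: first images stacked on channels
        self.level1 = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(32, 32, 1), nn.ReLU())
        self.level2 = nn.Sequential(nn.MaxPool2d(2),
                                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(64, 64, 1), nn.ReLU())
        self.level3 = nn.Sequential(nn.MaxPool2d(2),
                                    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(128, 128, 1), nn.ReLU())
        # generation network: three Conv(3,3) layers and Tanh on the fused features
        self.generate = nn.Sequential(
            nn.Conv2d(32 + 64 + 128, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

    def forward(self, x):                        # x: (B, 3*n_first, 256, 256)
        f1 = self.level1(x)                      # shallow features, 256x256
        f2 = self.level2(f1)                     # mid-depth features, 128x128
        f3 = self.level3(f2)                     # deep features, 64x64
        # fuse features of different depths at the input resolution
        f2_up = F.interpolate(f2, size=f1.shape[-2:], mode="bilinear", align_corners=False)
        f3_up = F.interpolate(f3, size=f1.shape[-2:], mode="bilinear", align_corners=False)
        fused = torch.cat([f1, f2_up, f3_up], dim=1)
        return self.generate(fused)              # (B, 3, 256, 256) predicted frame
```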
  • Generally, during training, the predicted frame can be compared with the corresponding second image, a loss value representing the gap between the predicted frame and the second image can be determined using a preset loss function, and the parameters of the feature extraction network and the generation network are iteratively optimized so that the predicted frame approaches the second image; when the training end condition is met (for example, the loss value converges, the training duration reaches a preset duration, or the number of training iterations reaches a preset number), training ends. As shown in FIG. 3, 302 represents the three loss functions; by comparing p_{t+1} and p̂_{t+1}, the parameters of the initial model are optimized to minimize the loss values of the three loss functions.
  • step 202 may include the following steps:
  • Step 1: optimize the parameters of the feature extraction network based on the preset first loss function.
  • the first loss function includes at least one of the following: L2 distance loss, gradient constraint loss, and optical flow loss.
  • Step 2: optimize the parameters of the generation network based on a preset second loss function, wherein the second loss function includes a least squares loss.
  • the above loss functions can be added to obtain the sum of the loss values, and the network parameters can be optimized by using the sum of the loss values.
  • the above-mentioned optical flow loss solves the problem of motion detection of objects under complex lighting conditions, and can learn the potential laws of normal behavior characteristics to the greatest extent.
  • by using the above loss functions, this implementation optimizes the network parameters from multiple aspects, which helps improve the accuracy of the predicted frames generated by the trained frame generator.
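  • The sketch below illustrates what the L2 distance, gradient constraint, and least-squares terms could look like and how they might be summed; the loss weights, the discriminator-score interface, and treating the optical flow term as an externally supplied value are assumptions, since this section does not specify them:

```python
import torch
import torch.nn.functional as F

def l2_distance_loss(pred, target):
    """L2 (intensity) distance between the predicted frame and the actual frame."""
    return F.mse_loss(pred, target)

def gradient_constraint_loss(pred, target):
    """Constrain image gradients of the predicted frame to match the actual frame."""
    def grads(img):
        dx = img[:, :, :, 1:] - img[:, :, :, :-1]
        dy = img[:, :, 1:, :] - img[:, :, :-1, :]
        return dx, dy
    pdx, pdy = grads(pred)
    tdx, tdy = grads(target)
    return (pdx - tdx).abs().mean() + (pdy - tdy).abs().mean()

def least_squares_gan_loss(disc_score_on_pred, real_label=1.0):
    """Least-squares (LSGAN-style) loss pushing the generator's output
    to be scored as real by the frame discriminator."""
    return torch.mean((disc_score_on_pred - real_label) ** 2)

def generator_total_loss(pred, target, disc_score_on_pred, flow_loss=0.0,
                         w_l2=1.0, w_grad=1.0, w_flow=1.0, w_adv=0.05):
    """Weighted sum of the individual losses (the weights are illustrative assumptions)."""
    return (w_l2 * l2_distance_loss(pred, target)
            + w_grad * gradient_constraint_loss(pred, target)
            + w_flow * flow_loss
            + w_adv * least_squares_gan_loss(disc_score_on_pred))
```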
  • Step 203: train the frame discriminator included in the initial model based on the predicted frame and the second image.
  • the above-mentioned executive body may train the frame discriminator included in the initial model based on the predicted frame and the second image.
  • the frame discriminator is used to discriminate whether the two input images are the same.
  • Frame discriminators are usually trained based on convolutional neural networks. During training, the predicted frame and the actual frame (that is, the second image) are used as input, annotation information for distinguishing the predicted frame from the actual frame is used as the expected output, and the frame discriminator is trained using a machine learning method. The training goal is to maximize the discriminative accuracy of the frame discriminator.
  • the predicted frame generator and frame discriminator are trained alternately. For example, the parameters of the frame discriminator are fixed first, and the parameters of the predicted frame generator are optimized until the frame discriminator cannot correctly discriminate between the predicted frame and the actual frame. Then the parameters of the predicted frame generator are fixed, and the parameters of the frame discriminator are optimized until the frame discriminator can accurately discriminate between the predicted frame and the actual frame.
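  • A minimal sketch of this alternating schedule is shown below; it assumes a generator and a discriminator with the interfaces used in the earlier sketches, a hypothetical d_loss_fn for the discriminator objective, and Adam optimizers with an illustrative learning rate and epoch count:

```python
import torch

def train_alternating(generator, discriminator, data_loader,
                      g_loss_fn, d_loss_fn, epochs=10, lr=2e-4, device="cpu"):
    """Alternately optimize the frame discriminator and the prediction frame
    generator; optimizer choice, learning rate, and epoch count are assumptions."""
    g_opt = torch.optim.Adam(generator.parameters(), lr=lr)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=lr)

    for _ in range(epochs):
        for first_images, second_image in data_loader:
            first_images = first_images.to(device)
            second_image = second_image.to(device)

            # 1) fix the generator, optimize the discriminator
            with torch.no_grad():
                predicted = generator(first_images)
            d_loss = d_loss_fn(discriminator, predicted, second_image)
            d_opt.zero_grad()
            d_loss.backward()
            d_opt.step()

            # 2) fix the discriminator, optimize the generator
            predicted = generator(first_images)
            g_loss = g_loss_fn(predicted, second_image,
                               discriminator(predicted, second_image))
            g_opt.zero_grad()
            g_loss.backward()
            g_opt.step()
    return generator, discriminator
```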
  • As shown in FIG. 3, D is the frame discriminator; p_{t+1} and p̂_{t+1} are input into D to obtain information indicating whether the current frame is normal or abnormal.
  • in some optional implementations of this embodiment, as shown in FIG. 4, step 203 may be performed as follows:
  • Step 2031: superimpose a preset number of image frames preceding the second image, together with the predicted frame, into a multi-channel image.
  • for example, if the preset number is 5, the preceding 5 image frames are superimposed into multi-channel image data.
  • optionally, after the superimposition, the multi-channel image can also be cropped to meet the input requirements of the subsequent neural network, for example to an image of size 512×384.
  • Step 2032: extract the feature information of the multi-channel image.
  • a neural network model can be used to extract feature information of a multi-channel image.
  • a Flownet (optical flow neural network) model can be used for feature extraction.
  • Flownet may include 12 3x3 convolutional layers for feature extraction on the input image.
  • Flownet can perform optical flow estimation on the input image, and the obtained feature information can reflect the relationship between adjacent multi-frame images.
  • Step 2033: perform optical flow estimation on the feature information of the multi-channel image to determine the optical flow loss between the predicted frame and the second image.
  • the method of optical flow estimation can be an existing method.
  • the above-mentioned Flownet model can be used to perform optical flow estimation on the extracted feature information, and use a preset optical flow loss function to determine the optical flow loss.
  • Step 2034: optimize the parameters of the frame discriminator based on the optical flow loss.
  • specifically, multiple pairs of image data can be repeatedly input into the model, and the parameters of the Flownet model can be iteratively optimized to minimize the optical flow loss. It should be understood that optical flow estimation and optical flow loss are existing techniques, and their specific implementation is not repeated here.
  • This implementation method optimizes the frame discriminator by stacking multiple frames of images and using the optical flow estimation method, which can accurately reflect the movement of the object under complex lighting conditions and improve the discrimination accuracy.
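  • Putting steps 2031 to 2034 together, the sketch below shows one plausible way to stack the preceding frames with the predicted frame, crop the stack, and define an optical-flow-based loss; the channel-wise stacking, the center crop, and comparing flow fields produced by a Flownet-like model (flow_net is a placeholder, not an actual Flownet implementation) are assumptions:

```python
import torch
import torch.nn.functional as F

def stack_frames_multichannel(previous_frames, predicted_frame):
    """Superimpose the N frames preceding the second image together with the
    predicted frame into one multi-channel image (channel-wise concatenation)."""
    # previous_frames: (B, N, 3, H, W); predicted_frame: (B, 3, H, W)
    b, n, c, h, w = previous_frames.shape
    stacked = torch.cat([previous_frames.reshape(b, n * c, h, w), predicted_frame], dim=1)
    return stacked                                    # (B, (N+1)*3, H, W)

def center_crop(img, out_h, out_w):
    """Crop the multi-channel image to the input size expected by the flow network
    (the target size is whatever that network requires; values here are assumptions)."""
    _, _, h, w = img.shape
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return img[:, :, top:top + out_h, left:left + out_w]

def optical_flow_loss(flow_net, stacked_pred, stacked_real):
    """Illustrative optical-flow loss: the flow estimated from the stack that ends
    with the predicted frame should match the flow estimated from the stack that
    ends with the actual frame."""
    flow_pred = flow_net(stacked_pred)
    flow_real = flow_net(stacked_real)
    return F.l1_loss(flow_pred, flow_real)
```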
  • Step 204: in response to the end of training, determine the trained initial model as the anomaly detection model.
  • the above-mentioned execution body may determine the initial model after training as the abnormality detection model in response to the end of the training.
  • the training end condition may include, but is not limited to, at least one of the following: the loss value of the loss function converges, the number of training times reaches a preset number of times, and the training duration reaches a preset duration, and the like.
  • the resulting anomaly detection model consists of a trained predicted frame generator and frame discriminator.
  • the method may further include the following steps:
  • first, multiple anomaly detection models obtained through multiple rounds of training are acquired; the method for training these models is the same as steps 201 to 204 above.
  • the detection performance of the multiple anomaly detection models is determined, and the anomaly detection model with the best detection performance is determined as the model used for abnormal event detection.
  • the performance of an anomaly detection model can be characterized by various indicators, for example at least one of the following: detection accuracy (the higher the accuracy, the better the performance), detection time (while accuracy is ensured, the shorter a single detection takes, the better the performance), and so on.
  • an anomaly detection model with better performance can be obtained by performing performance screening on multiple anomaly detection models obtained by multiple trainings.
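  • A minimal sketch of such screening is shown below; the evaluation function and validation data are placeholders, and the single "higher is better" score is an assumption standing in for whatever accuracy or timing indicator is chosen:

```python
def select_best_model(models, evaluate_fn, validation_data):
    """Pick the anomaly detection model with the best detection performance;
    evaluate_fn returns a score where higher means better (an assumption)."""
    scored = [(evaluate_fn(model, validation_data), model) for model in models]
    best_score, best_model = max(scored, key=lambda pair: pair[0])
    return best_model, best_score
```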
  • in the method provided by the above embodiments of the present application, the prediction frame generator included in the initial model is trained based on the first image and the second image included in the acquired sample image frame sequence, the prediction frame generator generates a predicted frame, the frame discriminator included in the initial model is trained based on the predicted frame and the second image, and finally the trained initial model is determined as the anomaly detection model; since the frame generator fuses feature information of multiple different depths, the generated predicted frames can be closer to reality, thereby improving the accuracy of anomaly detection.
  • a flow 500 of one embodiment of an abnormal event detection method according to the present application is shown.
  • the method includes the following steps:
  • Step 501: acquire a sequence of image frames collected by an image acquisition device.
  • the above-mentioned execution subject may acquire the image frame sequence acquired by the image acquisition device locally or remotely.
  • the image acquisition device may be a device such as a camera included in the above-mentioned execution body, or may be a device such as a camera included in other devices communicatively connected to the above-mentioned execution body.
  • the image frame sequence may be an image frame sequence included in a video collected in real time, or may be an image frame sequence included in a pre-stored video file.
  • the image frame sequence includes a first image and a second image, and the second image is the next frame of the first image.
  • the definitions of the first image and the second image are basically the same as the above step 201, and details are not repeated here.
  • Step 502: input the first image into the prediction frame generator included in the pre-trained anomaly detection model to obtain a predicted frame.
  • the above-mentioned execution subject may input the first image into the prediction frame generator included in the pre-trained anomaly detection model to obtain the prediction frame.
  • the anomaly detection model is pre-trained based on the method described in the above-mentioned embodiment corresponding to FIG. 2 .
  • for a description of the prediction frame generator, reference may be made to the description in the above-mentioned embodiment corresponding to FIG. 2.
  • Step 503: input the predicted frame and the second image into the pre-trained frame discriminator to obtain a numerical value representing the degree of similarity between the predicted frame and the second image.
  • the above-mentioned execution subject may input the predicted frame and the second image into a pre-trained frame discriminator, and obtain a numerical value representing the degree of similarity between the predicted frame and the second image.
  • for a description of the frame discriminator, reference may be made to the description in the corresponding embodiment of FIG. 2 above.
  • the larger the above numerical value representing the degree of similarity, the more similar the two images are; this value can be calculated by various methods, for example by determining the cosine distance or the Euclidean distance between the images.
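  • Two such similarity measures are sketched below (NumPy-based, on flattened frames); mapping the Euclidean distance into a (0, 1] score is an illustrative assumption:

```python
import numpy as np

def cosine_similarity(pred, actual):
    """Cosine similarity between flattened frames (1.0 means identical direction)."""
    p = pred.ravel().astype(np.float64)
    a = actual.ravel().astype(np.float64)
    return float(np.dot(p, a) / (np.linalg.norm(p) * np.linalg.norm(a) + 1e-12))

def euclidean_similarity(pred, actual):
    """Map the Euclidean distance between frames to a (0, 1] score (larger = more similar)."""
    dist = np.linalg.norm(pred.astype(np.float64) - actual.astype(np.float64))
    return float(1.0 / (1.0 + dist))
```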
  • Step 504: in response to determining that the value is less than or equal to a preset threshold, output information representing that an abnormal event occurs at the time point corresponding to the second image.
  • in this embodiment, in response to determining that the value is less than or equal to the preset threshold, the execution body may output information representing that an abnormal event occurs at the time point corresponding to the second image.
  • when the above-mentioned numerical value is less than or equal to the preset threshold, it indicates that there is a large gap between the predicted image frame and the actual image frame; at this time, an abnormal situation may have occurred within the shooting range of the camera, and information in various forms is further output to alert the user that an abnormal situation has occurred.
  • the above-mentioned information representing the occurrence of an abnormal event may include, but is not limited to, information in at least one of the following forms: text, image, alarm sound, and the like.
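  • The following sketch ties the threshold check to an output step; the threshold value, the timestamp handling, and printing a text alert (rather than an image or alarm sound) are illustrative assumptions:

```python
import datetime

def check_abnormal_event(similarity_value, frame_timestamp, threshold=0.5):
    """If the similarity value is less than or equal to the preset threshold,
    output information that an abnormal event occurred at the frame's time point."""
    if similarity_value <= threshold:
        time_point = datetime.datetime.fromtimestamp(frame_timestamp).isoformat()
        alert = (f"Abnormal event detected at {time_point} "
                 f"(similarity = {similarity_value:.3f})")
        print(alert)   # text output; an image overlay or alarm sound could be used instead
        return alert
    return None
```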
  • the abnormal event detection method provided by the above embodiments of the present application uses the anomaly detection model trained in the embodiment corresponding to FIG. 2; it can output information indicating that an abnormal phenomenon has occurred when the predicted frame differs greatly from the actual frame, making it possible to monitor abnormal behavior efficiently and accurately.
  • the present application provides an embodiment of an anomaly detection model generation apparatus; this apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can specifically be applied to various electronic devices.
  • the anomaly detection model generation apparatus 600 in this embodiment includes: a first acquisition module 601, configured to acquire a plurality of sample image frame sequences, wherein each sample image frame sequence includes a first image and a second image, and the second image is the next frame following the first image; a first training module 602, configured to train the prediction frame generator included in the initial model based on the first image and the second image, wherein the prediction frame generator includes a multi-level feature extraction network and a generation network, the feature extraction network is used to extract feature information of different depths from the first image and fuse the feature information, and the generation network is used to generate the predicted frame using the fused feature information; a second training module 603, configured to train the frame discriminator included in the initial model based on the predicted frame and the second image; and a first determination module 604, configured to determine the trained initial model as the anomaly detection model in response to the end of training.
  • the first acquisition module 601 may acquire multiple sample image frame sequences locally or remotely.
  • each sequence of sample image frames may be image frames included in video clips cut from different videos.
  • the above-mentioned multiple sample image sequences can come from preset datasets, such as UCSD-Ped2 or CUHK datasets.
  • each sample image frame sequence includes a first image and a second image, and the second image is the next frame of the first image.
  • the number of the first images can be set arbitrarily, for example, three.
  • the first training module 602 may train the prediction frame generator included in the initial model based on the first image and the second image.
  • the prediction frame generator includes a multi-level feature extraction network and a generation network, the feature extraction network is used to extract the feature information of different depths of the first image and fuse the feature information, and the generation network is used to generate the prediction frame by using the fused feature information.
  • the fused feature information can be a feature map.
  • a feature extraction network can include 20 convolutional layers (using 1x1 convolution and 3x3 convolution), 4 max pooling layers and 1 activation layer.
  • Multi-layer convolution is used to extract feature information of different depths from the first image (that is, the normal behavior image), and this feature information is fused. The fused feature information is input into the generation network, which performs three convolution operations on it and applies a Tanh activation function to obtain a 256×256×3 image, which is the predicted frame image.
  • As shown in FIG. 3, p_1, p_2, p_3, and p_4 are the first images, and p_{t+1} is the second image. 301 is the initial network: the first images are input into the initial network and pass through multiple convolutions to obtain the fused feature information, which then passes through three Conv(3,3) convolution operations and a Tanh activation function to output the predicted frame p̂_{t+1}.
  • during training, the predicted frame can be compared with the corresponding second image, a loss value representing the gap between the predicted frame and the second image can be determined using a preset loss function, and the parameters of the feature extraction network and the generation network are iteratively optimized so that the predicted frame approaches the second image; when the training end condition is met (for example, the loss value converges, the training duration reaches a preset duration, or the number of training iterations reaches a preset number), training ends. As shown in FIG. 3, 302 represents the three loss functions; by comparing p_{t+1} and p̂_{t+1}, the parameters of the initial model are optimized to minimize the loss values of the three loss functions.
  • the second training module 603 may train the frame discriminator included in the initial model based on the predicted frame and the second image.
  • the frame discriminator is used to discriminate whether the two input images are the same.
  • Frame discriminators are usually trained based on convolutional neural networks. During training, the predicted frame and the actual frame (that is, the second image) are used as input, annotation information for distinguishing the predicted frame from the actual frame is used as the expected output, and the frame discriminator is trained using a machine learning method. The training goal is to maximize the discriminative accuracy of the frame discriminator.
  • the predicted frame generator and frame discriminator are trained alternately. For example, the parameters of the frame discriminator are fixed first, and the parameters of the predicted frame generator are optimized until the frame discriminator cannot correctly discriminate between the predicted frame and the actual frame. Then the parameters of the predicted frame generator are fixed, and the parameters of the frame discriminator are optimized until the frame discriminator can accurately discriminate between the predicted frame and the actual frame.
  • As shown in FIG. 3, D is the frame discriminator; p_{t+1} and p̂_{t+1} are input into D to obtain information indicating whether the current frame is normal or abnormal.
  • the first determination module 604 may determine the trained initial model as the abnormality detection model in response to the end of the training.
  • the training end conditions may include but are not limited to at least one of the following: the loss value of the loss function converges, the number of training times reaches a preset number of times, and the training duration reaches a preset duration, etc.
  • the resulting anomaly detection model consists of a trained predicted frame generator and frame discriminator.
  • the first training module may include: a first optimization unit (not shown in the figure), configured to optimize the parameters of the feature extraction network based on the preset first loss function, wherein the first loss function includes at least one of the following: L2 distance loss, gradient constraint loss, and optical flow loss; and a second optimization unit (not shown in the figure), configured to optimize the parameters of the generation network based on a preset second loss function, wherein the second loss function includes a least squares loss.
  • the second training module 603 may include: a superimposing unit (not shown in the figure), configured to superimpose a preset number of image frames preceding the second image, together with the predicted frame, into a multi-channel image; an extraction unit (not shown in the figure), configured to extract the feature information of the multi-channel image; an estimation unit (not shown in the figure), configured to perform optical flow estimation on the feature information of the multi-channel image to determine the optical flow loss between the predicted frame and the second image; and a third optimization unit (not shown in the figure), configured to optimize the parameters of the frame discriminator based on the optical flow loss.
  • the number of the first images is at least two.
  • the apparatus 600 may further include: a second acquisition module (not shown in the figure), configured to acquire multiple anomaly detection models obtained through multiple trainings; a second determination module (not shown in the figure), used to determine the detection performance of multiple anomaly detection models, and determine the anomaly detection model with the best detection performance as the model used for abnormal event detection.
  • the predicted frame generator included in the initial model is trained based on the first image and the second image included in the acquired sample image frame sequence, the predicted frame generator generates a predicted frame, the frame discriminator included in the initial model is trained based on the predicted frame and the second image, and finally the trained initial model is determined as the anomaly detection model; since the frame generator fuses feature information of multiple different depths, the generated predicted frames can be closer to reality, thereby improving the accuracy of anomaly detection.
  • the present application provides an embodiment of an abnormal event detection apparatus
  • the apparatus embodiment corresponds to the method embodiment shown in FIG. 5
  • the abnormal event detection apparatus 700 in this embodiment includes: a third acquisition module 701, configured to acquire a sequence of image frames collected by an image acquisition device, wherein the sequence of image frames includes a first image and a second image, and the second image is the next frame following the first image; a prediction module 702, configured to input the first image into the prediction frame generator included in the pre-trained anomaly detection model to obtain a predicted frame, wherein the anomaly detection model is pre-trained based on the method described in the first aspect above; a discrimination module 703, configured to input the predicted frame and the second image into a pre-trained frame discriminator to obtain a numerical value representing the degree of similarity between the predicted frame and the second image; and an output module 704, configured to output, in response to determining that the value is less than or equal to a preset threshold, information representing that an abnormal event occurs at the time point corresponding to the second image.
  • the third acquisition module 701 may acquire the image frame sequence acquired by the image acquisition device locally or remotely.
  • the image acquisition device may be a device such as a camera included in the foregoing apparatus 700 , or may be a device such as a camera included in other devices communicatively connected to the foregoing apparatus 700 .
  • the image frame sequence may be an image frame sequence included in a video collected in real time, or may be an image frame sequence included in a pre-stored video file.
  • the image frame sequence includes a first image and a second image, and the second image is the next frame of the first image.
  • the definitions of the first image and the second image are basically the same as the above step 201, and details are not repeated here.
  • the prediction module 702 may input the first image into the prediction frame generator included in the pre-trained anomaly detection model to obtain the prediction frame.
  • the anomaly detection model is pre-trained based on the method described in the above-mentioned embodiment corresponding to FIG. 2 .
  • for a description of the prediction frame generator, reference may be made to the description in the above-mentioned embodiment corresponding to FIG. 2.
  • the discrimination module 703 may input the predicted frame and the second image into a pre-trained frame discriminator to obtain a numerical value representing the degree of similarity between the predicted frame and the second image.
  • for a description of the frame discriminator, reference may be made to the description in the corresponding embodiment of FIG. 2 above.
  • the above numerical value representing the degree of similarity can be calculated by various methods, for example by determining the cosine distance or the Euclidean distance between the images.
  • the output module 704 may, in response to determining that the value is less than or equal to a preset threshold, output information representing that an abnormal event occurs at a time point corresponding to the second image.
  • when the above-mentioned numerical value is less than or equal to the preset threshold, it indicates that there is a large gap between the predicted image frame and the actual image frame; at this time, an abnormal situation may have occurred within the shooting range of the camera, and information in various forms is further output to alert the user that an abnormal situation has occurred.
  • the above-mentioned information representing the occurrence of an abnormal event may include, but is not limited to, information in at least one of the following forms: text, image, alarm sound, and the like.
  • the apparatus provided by the above embodiment of the present application uses the anomaly detection model trained in the embodiment corresponding to FIG. 2; it can output information indicating that an abnormal phenomenon has occurred when the predicted frame differs greatly from the actual frame, making it possible to monitor abnormal behavior efficiently and accurately.
  • FIG. 8 shows a schematic structural diagram of a computer system 800 suitable for implementing the electronic device of the embodiment of the present application.
  • the electronic device shown in FIG. 8 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present application.
  • as shown in FIG. 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803.
  • in the RAM 803, various programs and data required for the operation of the system 800 are also stored.
  • the CPU 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804.
  • An input/output (I/O) interface 805 is also connected to bus 804 .
  • the following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a modem.
  • the communication section 809 performs communication processing via a network such as the Internet.
  • a drive 810 is also connected to the I/O interface 805 as needed.
  • a removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the drive 810 as needed so that a computer program read therefrom is installed into the storage section 808 as needed.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication portion 809, and/or installed from the removable medium 811.
  • the computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable storage medium may be transmitted using any suitable medium including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present application may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented in dedicated hardware-based systems that perform the specified functions or operations , or can be implemented in a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments of the present application may be implemented in a software manner, and may also be implemented in a hardware manner.
  • the described modules may also be provided in a processor, which may, for example, be described as: a processor including a first acquisition module, a first training module, a second training module, and a first determination module.
  • the names of these modules do not, under certain circumstances, constitute a limitation on the modules themselves; for example, the first acquisition module may also be described as "a module for acquiring multiple sample image frame sequences".
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be included in the electronic device described in the above embodiments, or it may exist alone without being assembled into the electronic device.
  • the above computer-readable storage medium carries one or more programs, and when the one or more programs are executed by the electronic device, they cause the electronic device to: acquire a plurality of sample image frame sequences, wherein each sample image frame sequence includes a first image and a second image, and the second image is the next frame following the first image; train the prediction frame generator included in the initial model based on the first image and the second image, wherein the prediction frame generator includes a multi-level feature extraction network and a generation network, the feature extraction network is used to extract feature information of different depths from the first image and fuse the feature information, and the generation network is used to generate the predicted frame using the fused feature information; train the frame discriminator included in the initial model based on the predicted frame and the second image; and in response to the end of training, determine the trained initial model as the anomaly detection model.
  • the electronic device can also be caused to: acquire a sequence of image frames collected by the image acquisition device, wherein the sequence of image frames includes a first image and a second image, and the second image is the next frame following the first image; input the first image into the prediction frame generator included in the pre-trained anomaly detection model to obtain a predicted frame, wherein the anomaly detection model is pre-trained based on the method described in the first aspect above; input the predicted frame and the second image into a pre-trained frame discriminator to obtain a numerical value representing the degree of similarity between the predicted frame and the second image; and in response to determining that the numerical value is less than or equal to a preset threshold, output information representing that an abnormal event occurs at the time point corresponding to the second image.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An anomaly detection model generation method and apparatus are disclosed. A specific embodiment of the method includes: acquiring a plurality of sample image frame sequences; training a prediction frame generator included in an initial model based on a first image and a second image, wherein the prediction frame generator includes a multi-level feature extraction network and a generation network, the feature extraction network is used to extract feature information of different depths from the first image and fuse the feature information, and the generation network is used to generate a predicted frame using the fused feature information; training a frame discriminator included in the initial model based on the predicted frame and the second image; and in response to the end of training, determining the trained initial model as the anomaly detection model. This embodiment fuses feature information of multiple different depths, which makes the generated predicted frame closer to reality and thereby improves the accuracy of anomaly detection.

Description

Anomaly Detection Model Generation Method and Apparatus, and Abnormal Event Detection Method and Apparatus
Related Application
This application claims priority to Chinese Patent Application No. 202011405894.4, filed on December 2, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to an anomaly detection model generation method and apparatus, and an abnormal event detection method and apparatus.
Background
Anomaly detection is a common application of machine learning algorithms. A system learns normal features from large amounts of unlabeled data so that it can diagnose abnormal data; this process is called anomaly detection. In essence, anomaly detection finds objects that differ from the majority of objects, that is, it finds outliers. Anomaly detection is defined differently in different fields; in video, anomaly detection refers to identifying events that do not match expected behavior and distinguishing normal events from abnormal events.
In current anomaly detection methods, feature reconstruction using normal training data is a commonly used strategy. However, almost all existing methods address the problem by minimizing the reconstruction error on the training data, which cannot guarantee a large reconstruction error for abnormal events. Existing feature reconstruction methods can be roughly divided into methods based on hand-crafted features and methods based on deep learning. When hand-crafted features are used, the dictionary has not been trained on abnormal events and is usually incomplete, so the accuracy of the results cannot be guaranteed. Methods based on deep learning also have problems: deep neural networks have a high capacity, and large reconstruction errors do not necessarily occur for abnormal events, so the final anomaly detection fails to achieve accurate results.
Summary
The purpose of the embodiments of the present application is to propose an improved anomaly detection model generation method and apparatus to solve the technical problems mentioned in the Background section above.
In a first aspect, an embodiment of the present application provides an anomaly detection model generation method, the method including: acquiring a plurality of sample image frame sequences, wherein each sample image frame sequence includes a first image and a second image, and the second image is the next frame following the first image; training a prediction frame generator included in an initial model based on the first image and the second image, wherein the prediction frame generator includes a multi-level feature extraction network and a generation network, the feature extraction network is used to extract feature information of different depths from the first image and fuse the feature information, and the generation network is used to generate a predicted frame using the fused feature information; training a frame discriminator included in the initial model based on the predicted frame and the second image; and in response to the end of training, determining the trained initial model as the anomaly detection model.
In some embodiments, training the prediction frame generator included in the initial model based on the first image and the second image includes: optimizing the parameters of the feature extraction network based on a preset first loss function, wherein the first loss function includes at least one of the following: L2 distance loss, gradient constraint loss, and optical flow loss; and optimizing the parameters of the generation network based on a preset second loss function, wherein the second loss function includes a least squares loss.
In some embodiments, training the frame discriminator included in the initial model based on the predicted frame and the second image includes: superimposing a preset number of image frames preceding the second image, together with the predicted frame, into a multi-channel image; extracting feature information of the multi-channel image; performing optical flow estimation on the feature information of the multi-channel image to determine the optical flow loss between the predicted frame and the second image; and optimizing the parameters of the frame discriminator based on the optical flow loss.
In some embodiments, the number of first images is at least two.
In some embodiments, the method further includes: acquiring multiple anomaly detection models obtained through multiple rounds of training; and determining the detection performance of the multiple anomaly detection models, and determining the anomaly detection model with the best detection performance as the model used for abnormal event detection.
In a second aspect, an embodiment of the present application provides an abnormal event detection method, the method including: acquiring a sequence of image frames collected by an image acquisition device, wherein the sequence of image frames includes a first image and a second image, and the second image is the next frame following the first image; inputting the first image into a prediction frame generator included in a pre-trained anomaly detection model to obtain a predicted frame, wherein the anomaly detection model is pre-trained based on the method described in the first aspect above; inputting the predicted frame and the second image into a pre-trained frame discriminator to obtain a numerical value representing the degree of similarity between the predicted frame and the second image; and in response to determining that the numerical value is less than or equal to a preset threshold, outputting information representing that an abnormal event occurs at the time point corresponding to the second image.
In a third aspect, an embodiment of the present application provides an anomaly detection model generation apparatus, the apparatus including: a first acquisition module, configured to acquire a plurality of sample image frame sequences, wherein each sample image frame sequence includes a first image and a second image, and the second image is the next frame following the first image; a first training module, configured to train a prediction frame generator included in an initial model based on the first image and the second image, wherein the prediction frame generator includes a multi-level feature extraction network and a generation network, the feature extraction network is used to extract feature information of different depths from the first image and fuse the feature information, and the generation network is used to generate a predicted frame using the fused feature information; a second training module, configured to train a frame discriminator included in the initial model based on the predicted frame and the second image; and a first determination module, configured to determine the trained initial model as the anomaly detection model in response to the end of training.
In a fourth aspect, an embodiment of the present application provides an abnormal event detection apparatus, the apparatus including: a third acquisition module, configured to acquire a sequence of image frames collected by an image acquisition device, wherein the sequence of image frames includes a first image and a second image, and the second image is the next frame following the first image; a prediction module, configured to input the first image into a prediction frame generator included in a pre-trained anomaly detection model to obtain a predicted frame, wherein the anomaly detection model is pre-trained based on the method described in the first aspect above; a discrimination module, configured to input the predicted frame and the second image into a pre-trained frame discriminator to obtain a numerical value representing the degree of similarity between the predicted frame and the second image; and an output module, configured to output, in response to determining that the numerical value is less than or equal to a preset threshold, information representing that an abnormal event occurs at the time point corresponding to the second image.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device for storing one or more programs, which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect or the second aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the method described in any implementation of the first aspect or the second aspect.
In the anomaly detection model generation method and apparatus and the abnormal event detection method and apparatus provided by the embodiments of the present application, the prediction frame generator included in the initial model is trained based on the first image and the second image included in the acquired sample image frame sequences, the prediction frame generator generates a predicted frame, the frame discriminator included in the initial model is trained based on the predicted frame and the second image, and finally the trained initial model is determined as the anomaly detection model. Since the frame generator fuses feature information of multiple different depths, the generated predicted frames can be closer to reality, thereby improving the accuracy of anomaly detection.
Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent by reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
FIG. 2 is a flowchart of an embodiment of the anomaly detection model generation method according to the present application;
FIG. 3 is a schematic structural diagram of the initial model in the anomaly detection model generation method according to the present application;
FIG. 4 is a flowchart of another embodiment of the anomaly detection model generation method according to the present application;
FIG. 5 is a flowchart of an embodiment of the abnormal event detection method according to the present application;
FIG. 6 is a schematic structural diagram of an embodiment of the anomaly detection model generation apparatus according to the present application;
FIG. 7 is a schematic structural diagram of an embodiment of the abnormal event detection apparatus according to the present application;
FIG. 8 is a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.
Detailed Description
The present application will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the relevant disclosure, rather than to limit the disclosure. It should also be noted that, for ease of description, only the parts related to the relevant disclosure are shown in the drawings.
It should be noted that the embodiments in the present application and the features in the embodiments may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary system architecture 100 to which the anomaly detection model generation method of an embodiment of the present application can be applied.
As shown in FIG. 1, the system architecture 100 may include a terminal device 101, a network 102, and a server 103. The network 102 is a medium for providing a communication link between the terminal device 101 and the server 103. The network 102 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal device 101 to interact with the server 103 through the network 102 to receive or send messages and the like. Various communication client applications, such as monitoring applications, image processing applications, and video processing applications, may be installed on the terminal device 101.
The terminal device 101 may be any of various electronic devices, including but not limited to mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), and in-vehicle terminals (for example, car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
The server 103 may be a server that provides various services, for example an image processing server that processes the image frame sequences uploaded by the terminal device 101. The image processing server may perform model training, anomaly detection, and other processing on the received image frame sequences, and obtain processing results (for example, an anomaly detection model, anomaly detection information, and the like).
It should be noted that the anomaly detection model generation method or the abnormal event monitoring method provided in the embodiments of the present application may be executed by the terminal device 101 or the server 103, and accordingly, the anomaly detection model generation apparatus or the abnormal event monitoring apparatus may be provided in the terminal device 101 or the server 103.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs. It should be noted that, in the case where the samples for training the model or the images used for anomaly detection do not need to be obtained remotely, the above system architecture may not include a network, and may include only a server or a terminal device.
继续参考图2,其示出了根据本申请的应异常检测模型生成方法的一个实施例的流程200。该方法包括以下步骤:
步骤201,获取多个样本图像帧序列。
在本实施例中,异常检测模型生成方法的执行主体(例如图1所示的终端设备或服务器)可以从本地或从远程获取多个样本图像帧序列。其中,各个样本图像帧序列可以是从不同的视频中截取的视频片段包括的图像帧。通常上述多个样本图像序列可以来自预设的数据集,例如UCSD-Ped2或CUHK数据集。
其中,每个样本图像帧序列包括第一图像和第二图像,所述第二图像为所述第一图像的下一帧图像。其中,第一图像的数量可以任意设置,例如3个。
在本实施例的一些可选的实现方式中，第一图像的数量为至少两个。作为示例，对于某个样本图像帧序列，若最后一帧图像为F_t，则第一图像可以包括F_{t-1}、F_{t-2}、…、F_{t-n}，第二图像为F_t。通过将第一图像的数量设置为至少两个，可以充分地利用历史帧的特征对未来帧进行预测，提高生成预测帧的准确性。
通常,样本图像序列包括的图像帧可以是对原图经过缩放而成的固定尺寸的彩色图像,例如256×256×3,3为颜色通道数。
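为便于理解上述样本构造与尺寸归一化过程，下面给出一段示意性的Python代码草图（Python及OpenCV仅为说明所选，并非本申请限定的实现；函数名build_samples、参数n_first等均为假设示例）：

```python
import cv2

def build_samples(frame_paths, n_first=4, size=(256, 256)):
    """按滑动窗口把连续帧组织为(第一图像序列, 第二图像)样本对的示意实现。"""
    frames = []
    for p in frame_paths:
        img = cv2.imread(p)              # 读入原始帧
        img = cv2.resize(img, size)      # 缩放为固定尺寸, 例如256×256×3
        frames.append(img)
    samples = []
    for t in range(n_first, len(frames)):
        first_imgs = frames[t - n_first:t]   # F_{t-n}…F_{t-1}, 作为第一图像
        second_img = frames[t]               # F_t, 作为第二图像(真实未来帧)
        samples.append((first_imgs, second_img))
    return samples
```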
步骤202,基于第一图像和第二图像,训练初始模型包括的预测帧生成器。
在本实施例中,上述执行主体可以基于第一图像和第二图像,训练初始模型包括的预测帧生成器。
其中,预测帧生成器包括多层次的特征提取网络和生成网络,特征提取网络用于提取第一图像的不同深度的特征信息并融合特征信息,生成网络用于利用融合后的特征信息生成预测帧。融合后的特征信息可以是特征图。
作为示例，特征提取网络可以包括20个卷积层(采用1x1卷积和3x3卷积)，4个最大池化层和1个激活层。采用多层卷积计算提取出第一图像(即正常行为图像)中不同深度的特征信息，并融合这些特征信息，融合后的特征信息输入生成网络，生成网络可以对融合后的特征信息进行三次卷积计算以及使用Tanh激活函数，得到一张256×256×3的图像，该图像即为预测帧图像。如图3所示，p_1、p_2、p_3、p_4即为第一图像，p_{t+1}为第二图像。301为初始网络，第一图像输入初始网络经过多次卷积得到融合后的特征信息，再经过三次Conv(3,3)的卷积运算，以及经过Tanh激活函数，输出预测帧\hat{p}_{t+1}。
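为便于理解"提取不同深度特征并融合、再经卷积与Tanh生成预测帧"的思路，下面给出一个高度简化的PyTorch代码草图；其中的层数、通道数、上采样方式等均为假设值，与上文示例中的20个卷积层、4个最大池化层并非一一对应，仅作示意：

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictedFrameGenerator(nn.Module):
    """简化的预测帧生成器草图: 提取不同深度特征并融合, 再生成256×256×3的预测帧。"""
    def __init__(self, in_frames=4):
        super().__init__()
        c = in_frames * 3                      # 多帧第一图像按通道拼接后输入
        self.enc1 = nn.Sequential(nn.Conv2d(c, 32, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.enc3 = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(64, 128, 1), nn.ReLU())
        self.fuse = nn.Conv2d(32 + 64 + 128, 64, 1)   # 融合不同深度的特征信息
        self.gen = nn.Sequential(                      # 三次3×3卷积, 最后接Tanh
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

    def forward(self, x):                      # x: (B, in_frames*3, 256, 256)
        f1 = self.enc1(x)                      # 浅层特征, 256×256
        f2 = self.enc2(f1)                     # 中层特征, 128×128
        f3 = self.enc3(f2)                     # 深层特征, 64×64
        f2u = F.interpolate(f2, size=f1.shape[-2:], mode='bilinear', align_corners=False)
        f3u = F.interpolate(f3, size=f1.shape[-2:], mode='bilinear', align_corners=False)
        fused = self.fuse(torch.cat([f1, f2u, f3u], dim=1))   # 特征融合
        return self.gen(fused)                 # 预测帧 \hat{p}_{t+1}, 形状(B, 3, 256, 256)
```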
通常，在训练时，可以将预测帧与对应的第二图像进行比对，利用预设的损失函数确定表征预测帧与第二图像之间的差距的损失值，通过迭代优化特征提取网络和生成网络的参数，使预测帧接近第二图像，当满足训练结束条件时(例如损失值收敛、训练时长达到预设时长、训练次数达到预设次数等)，结束训练。如图3所示，302表示三种损失函数，通过对比p_{t+1}和\hat{p}_{t+1}，优化初始模型的参数以最小化三种损失函数的损失值。
在本实施例的一些可选的实现方式中,步骤202可以包括如下步骤:
步骤一,基于预设的第一损失函数,优化特征提取网络的参数。
其中,第一损失函数包括以下至少一种:L2距离损失、梯度约束损失、光流损失。
步骤二,基于预设的第二损失函数,优化生成网络的参数,其中,第二损失函数包括最小二乘损失。
通常，可以将上述各损失函数相加，得到损失值之和，利用损失值之和优化网络参数。需要说明的是，采用上述光流损失，解决了物体在复杂光照条件下的运动检测问题，能最大程度学习到正常行为特征的潜在规律。本实现方式通过采用上述各种损失函数，可以从多个方面优化网络参数，有助于提高训练得到的帧生成器生成预测帧的准确性。
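下面给出上述各损失项相加的一种示意性写法(Python/PyTorch草图)；其中梯度约束损失的具体形式、各项权重w_int、w_grad、w_flow、w_adv均为假设值，光流损失flow_loss由外部的光流估计网络计算后传入：

```python
import torch

def intensity_loss(pred, gt):
    """L2距离损失: 预测帧与真实帧(第二图像)的像素差。"""
    return torch.mean((pred - gt) ** 2)

def gradient_loss(pred, gt):
    """梯度约束损失: 约束预测帧与真实帧在水平/垂直方向的梯度一致。"""
    def grads(x):
        return x[..., :, 1:] - x[..., :, :-1], x[..., 1:, :] - x[..., :-1, :]
    pdx, pdy = grads(pred)
    gdx, gdy = grads(gt)
    return torch.mean(torch.abs(pdx - gdx)) + torch.mean(torch.abs(pdy - gdy))

def lsgan_g_loss(d_on_pred):
    """最小二乘(LSGAN)生成损失: 希望判别器把预测帧判为真实(标签1)。"""
    return torch.mean((d_on_pred - 1.0) ** 2)

def generator_loss(pred, gt, flow_loss, d_on_pred,
                   w_int=1.0, w_grad=1.0, w_flow=2.0, w_adv=0.05):
    """将各项损失相加得到损失值之和(权重系数为假设值)。"""
    return (w_int * intensity_loss(pred, gt)
            + w_grad * gradient_loss(pred, gt)
            + w_flow * flow_loss
            + w_adv * lsgan_g_loss(d_on_pred))
```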
步骤203,基于预测帧和第二图像,训练初始模型包括的帧判别器。
在本实施例中,上述执行主体可以基于预测帧和第二图像,训练初始模型包括的帧判别器。
其中，帧判别器用于判别输入的两个图像是否相同。帧判别器通常基于卷积神经网络训练得到。在训练时，将预测帧和实际帧(即第二图像)作为输入，将用于区别预测帧和实际帧的标注信息作为期望输出，利用机器学习方法，训练帧判别器。训练的目标是使帧判别器的判别准确性最高。
通常,预测帧生成器和帧判别器是交替训练的。例如首先固定帧判别器的参数,优化预测帧生成器的参数,直到帧判别器无法正确判别预测帧和实际帧。然后固定预测帧生成器的参数,优化帧判别器的参数,直到帧判别器可以准确判别预测帧和实际帧。
如图3所示,D为帧判别器,将p t+1
Figure PCTCN2020139499-appb-000003
输入D,即可得到表示当前帧为正常(normal)或不正常(abnormal)的信息。
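交替训练的过程可以用如下示意性的Python草图表示一次迭代；其中g_loss_fn、d_loss_fn、优化器等均为假设接口，为简化起见此处判别器直接接收单帧输入，实际中可按后文步骤2031先与历史帧叠加为多通道图像：

```python
import torch

def train_step(generator, discriminator, opt_g, opt_d,
               first_imgs, second_img, g_loss_fn, d_loss_fn):
    """交替训练的一次迭代草图(损失函数与优化器均为假设接口)。"""
    # 1) 固定帧判别器(只更新生成器参数), 优化预测帧生成器
    pred = generator(first_imgs)
    loss_g = g_loss_fn(pred, second_img, discriminator(pred))   # g_loss_fn可由上文各损失项组合
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

    # 2) 固定预测帧生成器, 优化帧判别器
    with torch.no_grad():
        pred = generator(first_imgs)          # 重新生成, 梯度不回传到生成器
    loss_d = d_loss_fn(discriminator(pred), discriminator(second_img))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    return loss_g.item(), loss_d.item()
```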
在本实施例的一些可选的实现方式中,如图4所示,步骤203可以如下执行:
步骤2031,将位于第二图像之前的预设数量个图像帧与预测帧叠加为多通道图像。
例如，若预设数量为5，则将位于第二图像之前的5帧图像与预测帧叠加为多通道的图像数据。
可选的,在叠加为多通道图像后,还可以对多通道图像进行裁剪,以适应后续的神经网络对输入的要求。例如,裁剪为512×384大小的图像。
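下面给出将历史帧与预测帧按通道叠加并裁剪的一段示意性Python(NumPy)草图；其中叠加顺序、居中裁剪方式等均为假设示例：

```python
import numpy as np

def stack_frames(prev_frames, pred_frame, crop_hw=(384, 512)):
    """将第二图像之前的若干帧与预测帧按通道叠加为多通道图像并裁剪(示意)。"""
    # prev_frames: 长度为预设数量(如5)的帧列表, 每帧H×W×3; pred_frame: H×W×3
    stacked = np.concatenate(prev_frames + [pred_frame], axis=2)   # H×W×(3×(n+1))
    h, w = crop_hw                      # 例如裁剪为512×384大小(宽512, 高384)
    H, W = stacked.shape[:2]
    top = max((H - h) // 2, 0)          # 若原图小于目标尺寸, 则保留全图
    left = max((W - w) // 2, 0)
    return stacked[top:top + h, left:left + w, :]
```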
步骤2032,提取多通道图像的特征信息。
具体地，可以利用神经网络模型提取多通道图像的特征信息。例如可以使用Flownet(光流神经网络)模型进行特征提取。作为示例，Flownet可以包括12个3x3卷积层，用于对输入的图像进行特征提取。Flownet可以对输入的图像进行光流估计，得到的特征信息可以反映相邻的多帧图像之间的关系。
步骤2033,对多通道图像的特征信息进行光流估计以确定预测帧与第二图像之间的光流损失。
其中,光流估计的方法可以是现有的方法。例如,可以采用上述Flownet模型,对提取的特征信息进行光流估计,并利用预设的光流损失函数确定光流损失。
步骤2034,基于光流损失,对帧判别器的参数进行优化。
具体地,可以反复地利用多对图像数据输入模型,迭代地优化Flownet模型的参数,使光流损失的损失值最小化。应当理解,光流估计和光流损失是目前的现有技术,这里不再对具体实现方式进行赘述。
本实现方式通过对多帧图像叠加，以及利用光流估计方法，对帧判别器进行优化，可以准确地反映物体在复杂光照条件下的运动，提高判别准确性。
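作为对步骤2033、2034的示意，下面给出一种可能的光流损失计算草图(Python/PyTorch)；其中flow_net表示一个光流估计网络(例如Flownet类模型)的假设接口，其输入输出形式并非本申请限定：

```python
import torch

def optical_flow_loss(flow_net, prev_frame, pred_frame, actual_frame):
    """光流损失草图: 比较(前一帧→预测帧)与(前一帧→真实帧)的光流估计差异。"""
    flow_pred = flow_net(torch.cat([prev_frame, pred_frame], dim=1))     # 到预测帧的光流
    flow_real = flow_net(torch.cat([prev_frame, actual_frame], dim=1))   # 到真实帧的光流
    return torch.mean(torch.abs(flow_pred - flow_real))                  # L1形式的光流差异
```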
步骤204,响应于训练结束,将训练后的初始模型确定为异常检测模型。
在本实施例中,上述执行主体可以响应于训练结束,将训练后的初始模型确定为异常检测模型。其中,训练结束条件可以包括但不限于以下至少一种:损失函数的损失值收敛、训练次数达到预设次数、训练时长达到预设时长等。最终得到的异常检测模型包括训练后的预测帧生成器和帧判别器。
在本实施例的一些可选的实现方式中,该方法还可以包括如下步骤:
首先,获取经过多次训练得到的多个异常检测模型。
其中,训练上述多个异常检测模型的方法与上述步骤201-步骤204相同。
然后,确定多个异常检测模型的检测性能,并将检测性能最优的异常检测模型确定为进行异常事件检测所用的模型。
其中,异常检测模型的性能可以用各种指标表征,例如以下至少一项:检测准确率(即准确率越高性能越好)、检测时长(即在保证准确率的情况下,单次检测时长越短,性能越好)等。
本实现方式通过对多次训练得到的多个异常检测模型进行性能筛选,可以得到性能更好的异常检测模型。
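下面给出按检测性能从多个模型中选取最优模型的一段示意性Python草图；其中eval_fn表示返回检测性能分数(例如准确率，越大越好)的假设函数：

```python
def select_best_model(models, eval_fn):
    """从多次训练得到的多个异常检测模型中, 选出检测性能最优者(示意)。"""
    scored = [(eval_fn(m), m) for m in models]      # 对每个模型评估检测性能
    scored.sort(key=lambda x: x[0], reverse=True)   # 按性能分数从高到低排序
    best_score, best_model = scored[0]
    return best_model, best_score
```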
本申请的上述实施例提供的方法,通过基于获取的样本图像帧序列包括的第一图像和第二图像,训练初始模型包括的预测帧生成器,预测帧生成器生成预测帧,基于预测帧和第二图像,训练初始模型包括的帧判别器,最后将训练结束的初始模型确定为异常检测模型,由于帧生成器采用了融合多种不同深度的特征信息的方法,可以使生成的预测帧更接近实际,从而提高了异常检测的准确性。
进一步参考图5,其示出了根据本申请的异常事件检测方法的一个实施例的流程500。该方法包括以下步骤:
步骤501,获取由图像采集设备采集的图像帧序列。
在本实施例中,上述执行主体可以从本地或从远程获取由图像采集设备采集的图像帧序列。其中,图像采集设备可以为上述执行主体包括的摄像头等设备,也可以是与上述执行主体通信连接的其他设备包括的 摄像头等设备。图像帧序列可以是实时采集的视频中包括的图像帧序列,也可以是预先存储的视频文件包括的图像帧序列。
其中,图像帧序列包括第一图像和第二图像,第二图像为所述第一图像的下一帧图像。其中,第一图像和第二图像的定义与上述步骤201基本一致,这里不再赘述。
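作为获取图像帧序列的一个示意，下面给出基于OpenCV从摄像头或视频文件读取若干帧的Python草图；其中source、n_frames、缩放尺寸等均为假设参数：

```python
import cv2

def read_frame_sequence(source=0, n_frames=5, size=(256, 256)):
    """从图像采集设备(或预先存储的视频文件)读取一段图像帧序列(示意)。"""
    cap = cv2.VideoCapture(source)        # source=0表示本机摄像头, 也可以是视频文件路径
    frames = []
    while len(frames) < n_frames:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.resize(frame, size))
    cap.release()
    if len(frames) < 2:
        return None, None
    return frames[:-1], frames[-1]        # 第一图像序列与第二图像
```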
步骤502,将第一图像输入预先训练的异常检测模型包括的预测帧生成器,得到预测帧。
在本实施例中,上述执行主体可以将第一图像输入预先训练的异常检测模型包括的预测帧生成器,得到预测帧。其中,异常检测模型预先基于上述图2对应实施例描述的方法训练得到,关于预测帧生成器的描述,可以参见上述图2对应实施例中的描述。
步骤503,将预测帧和第二图像输入预先训练的帧判别器,得到表征预测帧和第二图像之间的相似程度的数值。
在本实施例中,上述执行主体可以将预测帧和第二图像输入预先训练的帧判别器,得到表征预测帧和第二图像之间的相似程度的数值。其中,关于帧判别器的描述,可以参见上述图2对应实施例中的描述。上述表征相似程度的数值越大,表示两个图像的相似程度越高。上述表征相似程度的数值可以通过各种方法计算得到,例如确定图像之间的余弦距离、欧氏距离等计算相似度。
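作为计算相似程度数值的一个示意，下面给出用余弦相似度衡量预测帧与第二图像相似程度的Python/PyTorch草图(也可以替换为欧氏距离等其他度量)：

```python
import torch
import torch.nn.functional as F

def frame_similarity(pred_frame, actual_frame):
    """返回表征预测帧与第二图像相似程度的数值: 展平后的余弦相似度, 越大越相似。"""
    a = pred_frame.reshape(1, -1)
    b = actual_frame.reshape(1, -1)
    return F.cosine_similarity(a, b, dim=1).item()
```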
步骤504,响应于确定数值小于或等于预设的阈值,输出表征第二图像对应的时间点发生异常事件的信息。
在本实施例中,上述执行主体可以响应于确定数值小于或等于预设的阈值,输出表征第二图像对应的时间点发生异常事件的信息。
具体地,当上述数值小于或等于预设的阈值时,表征预测的图像帧与实际的图像帧的差距较大,此时摄像头的拍摄范围内可能发生了异常情况,进一步输出各种形式的信息以提示用户当前发生了异常情况。上述表征发生异常事件的信息可以包括但不限于以下至少一种形式的信息:文字、图像、警报音等。
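相应地，阈值比较与异常信息输出可以用如下示意性的Python草图表示；其中阈值0.8与输出信息的形式均为假设示例：

```python
def detect_anomaly(similarity, threshold=0.8, timestamp=None):
    """当相似度数值小于或等于预设阈值时, 输出表征发生异常事件的信息(阈值为假设值)。"""
    if similarity <= threshold:
        return {"abnormal": True, "time": timestamp,
                "message": "检测到异常事件: 预测帧与实际帧差异较大"}
    return {"abnormal": False, "time": timestamp, "message": "正常"}
```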
本申请的上述实施例提供的异常事件检测方法,通过使用上述图2对应实施例中训练的异常检测模型,可以在预测帧与实际帧相差较大时输出表征发生了异常现象的信息,从而可以高效、准确地对异常行为进行监控。
进一步参考图6,作为对上述各图所示方法的实现,本申请提供了一种异常检测模型生成装置的一个实施例,该装置实施例与图2所示的方法实施例相对应,该装置具体可以应用于各种电子设备中。
如图6所示，本实施例的异常检测模型生成装置600包括：第一获取模块601，用于获取多个样本图像帧序列，其中，每个样本图像帧序列包括第一图像和第二图像，第二图像为第一图像的下一帧图像；第一训练模块602，用于基于第一图像和第二图像，训练初始模型包括的预测帧生成器，其中，预测帧生成器包括多层次的特征提取网络和生成网络，特征提取网络用于提取第一图像的不同深度的特征信息并融合特征信息，生成网络用于利用融合后的特征信息生成预测帧；第二训练模块603，用于基于预测帧和第二图像，训练初始模型包括的帧判别器；第一确定模块604，用于响应于训练结束，将训练后的初始模型确定为异常检测模型。
在本实施例中,第一获取模块601可以从本地或从远程获取多个样本图像帧序列。其中,各个样本图像帧序列可以是从不同的视频中截取的视频片段包括的图像帧。通常上述多个样本图像序列可以来自预设的数据集,例如UCSD-Ped2或CUHK数据集。
其中,每个样本图像帧序列包括第一图像和第二图像,第二图像为第一图像的下一帧图像。其中,第一图像的数量可以任意设置,例如3个。
在本实施例中,第一训练模块602可以基于第一图像和第二图像,训练初始模型包括的预测帧生成器。
其中,预测帧生成器包括多层次的特征提取网络和生成网络,特征提取网络用于提取第一图像的不同深度的特征信息并融合特征信息,生成网络用于利用融合后的特征信息生成预测帧。融合后的特征信息可以是特征图。
作为示例，特征提取网络可以包括20个卷积层(采用1x1卷积和3x3卷积)，4个最大池化层和1个激活层。采用多层卷积计算提取出第一图像(即正常行为图像)中不同深度的特征信息，并融合这些特征信息，融合后的特征信息输入生成网络，生成网络可以对融合后的特征信息进行三次卷积计算以及使用Tanh激活函数，得到一张256×256×3的图像，该图像即为预测帧图像。如图3所示，p_1、p_2、p_3、p_4即为第一图像，p_{t+1}为第二图像。301为初始网络，第一图像输入初始网络经过多次卷积得到融合后的特征信息，再经过三次Conv(3,3)的卷积运算，以及经过Tanh激活函数，输出预测帧\hat{p}_{t+1}。
通常，在训练时，可以将预测帧与对应的第二图像进行比对，利用预设的损失函数确定表征预测帧与第二图像之间的差距的损失值，通过迭代优化特征提取网络和生成网络的参数，使预测帧接近第二图像，当满足训练结束条件时(例如损失值收敛、训练时长达到预设时长、训练次数达到预设次数等)，结束训练。如图3所示，302表示三种损失函数，通过对比p_{t+1}和\hat{p}_{t+1}，优化初始模型的参数以最小化三种损失函数的损失值。
在本实施例中,第二训练模块603可以基于预测帧和第二图像,训练初始模型包括的帧判别器。
其中，帧判别器用于判别输入的两个图像是否相同。帧判别器通常基于卷积神经网络训练得到。在训练时，将预测帧和实际帧(即第二图像)作为输入，将用于区别预测帧和实际帧的标注信息作为期望输出，利用机器学习方法，训练帧判别器。训练的目标是使帧判别器的判别准确性最高。
通常,预测帧生成器和帧判别器是交替训练的。例如首先固定帧判别器的参数,优化预测帧生成器的参数,直到帧判别器无法正确判别预测帧和实际帧。然后固定预测帧生成器的参数,优化帧判别器的参数,直到帧判别器可以准确判别预测帧和实际帧。
如图3所示,D为帧判别器,将p t+1
Figure PCTCN2020139499-appb-000006
输入D,即可得到表示当前帧为正常(normal)或不正常(abnormal)的信息。
在本实施例中，第一确定模块604可以响应于训练结束，将训练后的初始模型确定为异常检测模型。其中，训练结束条件可以包括但不限于以下至少一种：损失函数的损失值收敛、训练次数达到预设次数、训练时长达到预设时长等。最终得到的异常检测模型包括训练后的预测帧生成器和帧判别器。
在本实施例的一些可选的实现方式中,第一训练模块可以包括:第一优化单元(图中未示出),用于基于预设的第一损失函数,优化特征提取网络的参数,其中,第一损失函数包括以下至少一种:L2距离损失、梯度约束损失、光流损失;第二优化单元(图中未示出),用于基于预设的第二损失函数,优化生成网络的参数,其中,第二损失函数包括最小二乘损失。
在本实施例的一些可选的实现方式中,第二训练模块603可以包括:叠加单元(图中未示出),用于将位于第二图像之前的预设数量个图像帧与预测帧叠加为多通道图像;提取单元(图中未示出),用于提取多通道图像的特征信息;估计单元(图中未示出),用于对多通道图像的特征信息进行光流估计以确定预测帧与第二图像之间的光流损失;第三优化单元(图中未示出),用于基于光流损失,对帧判别器的参数进行优化。
在本实施例的一些可选的实现方式中,第一图像的数量为至少两个。
在本实施例的一些可选的实现方式中,装置600还可以包括:第二获取模块(图中未示出),用于获取经过多次训练得到的多个异常检测模型;第二确定模块(图中未示出),用于确定多个异常检测模型的检测性能,并将检测性能最优的异常检测模型确定为进行异常事件检测所用的模型。
本申请的上述实施例提供的装置,通过基于获取的样本图像帧序列包括的第一图像和第二图像,训练初始模型包括的预测帧生成器,预测帧生成器生成预测帧,基于预测帧和第二图像,训练初始模型包括的帧判别器,最后将训练结束的初始模型确定为异常检测模型,由于帧生成器采用了融合多种不同深度的特征信息的方法,可以使生成的预测帧更接近实际,从而提高了异常检测的准确性。
进一步参考图7，作为对上述各图所示方法的实现，本申请提供了一种异常事件检测装置的一个实施例，该装置实施例与图5所示的方法实施例相对应，该装置具体可以应用于各种电子设备中。
如图7所示，本实施例的异常事件检测装置700包括：第三获取模块701，用于获取由图像采集设备采集的图像帧序列，其中，图像帧序列包括第一图像和第二图像，第二图像为第一图像的下一帧图像；预测模块702，用于将第一图像输入预先训练的异常检测模型包括的预测帧生成器，得到预测帧，其中，异常检测模型预先基于上述第一方面描述的方法训练得到；判别模块703，用于将预测帧和第二图像输入预先训练的帧判别器，得到表征预测帧和第二图像之间的相似程度的数值；输出模块704，用于响应于确定数值小于或等于预设的阈值，输出表征第二图像对应的时间点发生异常事件的信息。
在本实施例中,第三获取模块701可以从本地或从远程获取由图像采集设备采集的图像帧序列。其中,图像采集设备可以为上述装置700包括的摄像头等设备,也可以是与上述装置700通信连接的其他设备包括的摄像头等设备。图像帧序列可以是实时采集的视频中包括的图像帧序列,也可以是预先存储的视频文件包括的图像帧序列。
其中,图像帧序列包括第一图像和第二图像,第二图像为第一图像的下一帧图像。其中,第一图像和第二图像的定义与上述步骤201基本一致,这里不再赘述。
在本实施例中,预测模块702可以将第一图像输入预先训练的异常检测模型包括的预测帧生成器,得到预测帧。其中,异常检测模型预先基于上述图2对应实施例描述的方法训练得到,关于预测帧生成器的描述,可以参见上述图2对应实施例中的描述。
在本实施例中,判别模块703可以将预测帧和第二图像输入预先训练的帧判别器,得到表征预测帧和第二图像之间的相似程度的数值。其中,关于帧判别器的描述,可以参见上述图2对应实施例中的描述。上述表征相似程度的数值越大,表示两个图像的相似程度越高。上述表征相似程度的数值可以通过各种方法计算得到,例如确定图像之间的余弦距离、欧氏距离等计算相似度。
在本实施例中,输出模块704可以响应于确定数值小于或等于预设的阈值,输出表征第二图像对应的时间点发生异常事件的信息。
具体地,当上述数值小于或等于预设的阈值时,表征预测的图像帧与实际的图像帧的差距较大,此时摄像头的拍摄范围内可能发生了异常情况,进一步输出各种形式的信息以提示用户当前发生了异常情况。上述表征发生异常事件的信息可以包括但不限于以下至少一种形式的信息:文字、图像、警报音等。
本申请的上述实施例提供的装置,通过使用上述图2对应实施例中训练的异常检测模型,可以在预测帧与实际帧相差较大时输出表征发生了异常现象的信息,从而可以高效、准确地对异常行为进行监控。
下面参考图8,其示出了适于用来实现本申请实施例的电子设备的计算机系统800的结构示意图。图8示出的电子设备仅仅是一个示例,不应对本申请实施例的功能和使用范围带来任何限制。
如图8所示,计算机系统800包括中央处理单元(CPU)801,其可以根据存储在只读存储器(ROM)802中的程序或者从存储部分808加载到随机访问存储器(RAM)803中的程序而执行各种适当的动作和处理。在RAM 803中,还存储有系统800操作所需的各种程序和数据。CPU801、ROM 802以及RAM 803通过总线804彼此相连。输入/输出(I/O)接口805也连接至总线804。
以下部件连接至I/O接口805:包括键盘、鼠标等的输入部分806;包括诸如液晶显示器(LCD)等以及扬声器等的输出部分807;包括硬盘等的存储部分808;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分809。通信部分809经由诸如因特网的网络执行通信处理。驱动器810也根据需要连接至I/O接口805。可拆卸介质811,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器810上,以便于从其上读出的计算机程序根据需要被安装入存储部分808。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信部分809从网络上被下载和安装,和/或从可拆卸介质 811被安装。在该计算机程序被中央处理单元(CPU)801执行时,执行本申请的方法中限定的上述功能。
需要说明的是,本申请所述的计算机可读存储介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本申请中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本申请中,计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读存储介质,该计算机可读存储介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读存储介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:无线、电线、光缆、RF等等,或者上述的任意合适的组合。
可以以一种或多种程序设计语言或其组合来编写用于执行本申请的操作的计算机程序代码,所述程序设计语言包括面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本申请各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本申请实施例中所涉及到的模块可以通过软件的方式实现，也可以通过硬件的方式来实现。所描述的模块也可以设置在处理器中，例如，可以描述为：一种处理器包括第一获取模块、第一训练模块、第二训练模块和第一确定模块。其中，这些模块的名称在某种情况下并不构成对该单元本身的限定，例如，第一获取模块还可以被描述为“用于获取多个样本图像帧序列的模块”。
作为另一方面,本申请还提供了一种计算机可读存储介质,该计算机可读存储介质可以是上述实施例中描述的电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。上述计算机可读存储介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备:获取多个样本图像帧序列,其中,每个样本图像帧序列包括第一图像和第二图像,第二图像为第一图像的下一帧图像;基于第一图像和第二图像,训练初始模型包括的预测帧生成器,其中,预测帧生成器包括多层次的特征提取网络和生成网络,特征提取网络用于提取第一图像的不同深度的特征信息并融合特征信息,生成网络用于利用融合后的特征信息生成预测帧;基于预测帧和第二图像,训练初始模型包括的帧判别器;响应于训练结束,将训练后的初始模型确定为异常检测模型。
此外,当上述一个或者多个程序被该电子设备执行时,还可以使得该电子设备:获取由图像采集设备采集的图像帧序列,其中,图像帧序列包括第一图像和第二图像,第二图像为第一图像的下一帧图像;将第一图像输入预先训练的异常检测模型包括的预测帧生成器,得到预测帧,其中,异常检测模型预先基于上述第一方面描述的方法训练得到;将预测帧和第二图像输入预先训练的帧判别器,得到表征预测帧和第二图像之间的相似程度的数值;响应于确定数值小于或等于预设的阈值,输出表征第二图像对应的时间点发生异常事件的信息。
以上描述仅为本申请的较佳实施例以及对所运用技术原理的说明。本领域技术人员应当理解,本申请中所涉及的公开范围,并不限于上述技术特征的特定组合而成的技术方案,同时也应涵盖在不脱离上述公开构思的情况下,由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本申请中公开的(但不限于)具有类似功能的技术特征进行互相替换而形成的技术方案。

Claims (10)

  1. 一种异常检测模型生成方法,其特征在于,所述方法包括:
    获取多个样本图像帧序列,其中,每个样本图像帧序列包括第一图像和第二图像,所述第二图像为所述第一图像的下一帧图像;
    基于所述第一图像和所述第二图像,训练初始模型包括的预测帧生成器,其中,所述预测帧生成器包括多层次的特征提取网络和生成网络,所述特征提取网络用于提取所述第一图像的不同深度的特征信息并融合所述特征信息,所述生成网络用于利用融合后的特征信息生成预测帧;
    基于所述预测帧和所述第二图像,训练所述初始模型包括的帧判别器;
    响应于训练结束,将训练后的初始模型确定为异常检测模型。
  2. 根据权利要求1所述的方法,其特征在于,所述基于所述第一图像和所述第二图像,训练初始模型包括的预测帧生成器,包括:
    基于预设的第一损失函数,优化所述特征提取网络的参数,其中,所述第一损失函数包括以下至少一种:L2距离损失、梯度约束损失、光流损失;
    基于预设的第二损失函数,优化所述生成网络的参数,其中,所述第二损失函数包括最小二乘损失。
  3. 根据权利要求1所述的方法,其特征在于,所述基于所述预测帧和所述第二图像,训练所述初始模型包括的帧判别器,包括:
    将位于所述第二图像之前的预设数量个图像帧与所述预测帧叠加为多通道图像;
    提取所述多通道图像的特征信息;
    对所述多通道图像的特征信息进行光流估计以确定所述预测帧与所述第二图像之间的光流损失;
    基于所述光流损失,对所述帧判别器的参数进行优化。
  4. 根据权利要求1-3之一所述的方法,其特征在于,所述第一图像的数量为至少两个。
  5. 根据权利要求1-3之一所述的方法,其特征在于,所述方法还包括:
    获取经过多次训练得到的多个异常检测模型;
    确定所述多个异常检测模型的检测性能,并将检测性能最优的异常检测模型确定为进行异常事件检测所用的模型。
  6. 一种异常事件检测方法,其特征在于,所述方法包括:
    获取由图像采集设备采集的图像帧序列,其中,所述图像帧序列包括第一图像和第二图像,所述第二图像为所述第一图像的下一帧图像;
    将所述第一图像输入预先训练的异常检测模型包括的预测帧生成器,得到预测帧,其中,所述异常检测模型预先基于权利要求1-5之一所述的方法训练得到;
    将所述预测帧和所述第二图像输入预先训练的帧判别器,得到表征所述预测帧和所述第二图像之间的相似程度的数值;
    响应于确定所述数值小于或等于预设的阈值,输出表征所述第二图像对应的时间点发生异常事件的信息。
  7. 一种异常检测模型生成装置,其特征在于,所述装置包括:
    第一获取模块,用于获取多个样本图像帧序列,其中,每个样本图像帧序列包括第一图像和第二图像,所述第二图像为所述第一图像的下一帧图像;
    第一训练模块，用于基于所述第一图像和所述第二图像，训练初始模型包括的预测帧生成器，其中，所述预测帧生成器包括多层次的特征提取网络和生成网络，所述特征提取网络用于提取所述第一图像的不同深度的特征信息并融合所述特征信息，所述生成网络用于利用融合后的特征信息生成预测帧；
    第二训练模块,用于基于所述预测帧和所述第二图像,训练所述初始模型包括的帧判别器;
    第一确定模块,用于响应于训练结束,将训练后的初始模型确定为异常检测模型。
  8. 一种异常事件检测装置,其特征在于,所述装置包括:
    第三获取模块,用于获取由图像采集设备采集的图像帧序列,其中,所述图像帧序列包括第一图像和第二图像,所述第二图像为所述第一图像的下一帧图像;
    预测模块,用于将所述第一图像输入预先训练的异常检测模型包括的预测帧生成器,得到预测帧,其中,所述异常检测模型预先基于权利要求1-5之一所述的方法训练得到;
    判别模块,用于将所述预测帧和所述第二图像输入预先训练的帧判别器,得到表征所述预测帧和所述第二图像之间的相似程度的数值;
    输出模块,用于响应于确定所述数值小于或等于预设的阈值,输出表征所述第二图像对应的时间点发生异常事件的信息。
  9. 一种电子设备,包括:
    一个或多个处理器;
    存储装置,用于存储一个或多个程序,
    当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求1-6中任一所述的方法。
  10. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,该程序被处理器执行时实现如权利要求1-6中任一所述的方法。
PCT/CN2020/139499 2020-12-02 2020-12-25 异常检测模型生成方法和装置、异常事件检测方法和装置 WO2022116322A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011405894.4 2020-12-02
CN202011405894.4A CN112465049A (zh) 2020-12-02 2020-12-02 异常检测模型生成方法和装置、异常事件检测方法和装置

Publications (1)

Publication Number Publication Date
WO2022116322A1 true WO2022116322A1 (zh) 2022-06-09

Family

ID=74806531

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/139499 WO2022116322A1 (zh) 2020-12-02 2020-12-25 异常检测模型生成方法和装置、异常事件检测方法和装置

Country Status (2)

Country Link
CN (1) CN112465049A (zh)
WO (1) WO2022116322A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468945A (zh) * 2021-03-26 2021-10-01 厦门大学 游泳者溺水检测方法
CN113364792B (zh) * 2021-06-11 2022-07-12 奇安信科技集团股份有限公司 流量检测模型的训练方法、流量检测方法、装置及设备
CN113435432B (zh) * 2021-08-27 2021-11-30 腾讯科技(深圳)有限公司 视频异常检测模型训练方法、视频异常检测方法和装置
CN113743607B (zh) * 2021-09-15 2023-12-05 京东科技信息技术有限公司 异常检测模型的训练方法、异常检测方法及装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180189610A1 (en) * 2015-08-24 2018-07-05 Carl Zeiss Industrielle Messtechnik Gmbh Active machine learning for training an event classification
CN109522828A (zh) * 2018-11-01 2019-03-26 上海科技大学 一种异常事件检测方法及系统、存储介质及终端
CN110705376A (zh) * 2019-09-11 2020-01-17 南京邮电大学 一种基于生成式对抗网络的异常行为检测方法
CN112016500A (zh) * 2020-09-04 2020-12-01 山东大学 基于多尺度时间信息融合的群体异常行为识别方法及系统

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259814B (zh) * 2020-01-17 2023-10-31 杭州涂鸦信息技术有限公司 一种活体检测方法及系统
CN111881750A (zh) * 2020-06-24 2020-11-03 北京工业大学 基于生成对抗网络的人群异常检测方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180189610A1 (en) * 2015-08-24 2018-07-05 Carl Zeiss Industrielle Messtechnik Gmbh Active machine learning for training an event classification
CN109522828A (zh) * 2018-11-01 2019-03-26 上海科技大学 一种异常事件检测方法及系统、存储介质及终端
CN110705376A (zh) * 2019-09-11 2020-01-17 南京邮电大学 一种基于生成式对抗网络的异常行为检测方法
CN112016500A (zh) * 2020-09-04 2020-12-01 山东大学 基于多尺度时间信息融合的群体异常行为识别方法及系统

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115238805A (zh) * 2022-07-29 2022-10-25 中国电信股份有限公司 异常数据识别模型的训练方法及相关设备
CN115238805B (zh) * 2022-07-29 2023-12-15 中国电信股份有限公司 异常数据识别模型的训练方法及相关设备
CN115296984A (zh) * 2022-08-08 2022-11-04 中国电信股份有限公司 异常网络节点的检测方法及装置、设备、存储介质
CN115296984B (zh) * 2022-08-08 2023-12-19 中国电信股份有限公司 异常网络节点的检测方法及装置、设备、存储介质
CN115546293A (zh) * 2022-12-02 2022-12-30 广汽埃安新能源汽车股份有限公司 障碍物信息融合方法、装置、电子设备和计算机可读介质
CN115546293B (zh) * 2022-12-02 2023-03-07 广汽埃安新能源汽车股份有限公司 障碍物信息融合方法、装置、电子设备和计算机可读介质
CN115984757A (zh) * 2023-03-20 2023-04-18 松立控股集团股份有限公司 一种基于全局局部双流特征互学习的异常事件检测方法
CN115984757B (zh) * 2023-03-20 2023-05-16 松立控股集团股份有限公司 一种基于全局局部双流特征互学习的异常事件检测方法
CN117115740A (zh) * 2023-09-05 2023-11-24 北京智芯微电子科技有限公司 基于深度学习的电梯开关门状态检测方法、装置及设备

Also Published As

Publication number Publication date
CN112465049A (zh) 2021-03-09

Similar Documents

Publication Publication Date Title
WO2022116322A1 (zh) 异常检测模型生成方法和装置、异常事件检测方法和装置
US11995528B2 (en) Learning observation representations by predicting the future in latent space
CN111314733B (zh) 用于评估视频清晰度的方法和装置
US11392792B2 (en) Method and apparatus for generating vehicle damage information
WO2020087974A1 (zh) 生成模型的方法和装置
EP3893125A1 (en) Method and apparatus for searching video segment, device, medium and computer program product
CN109376267B (zh) 用于生成模型的方法和装置
CN109447156B (zh) 用于生成模型的方法和装置
CN111523640B (zh) 神经网络模型的训练方法和装置
WO2022252881A1 (zh) 图像处理方法、装置、可读介质和电子设备
CN108228428B (zh) 用于输出信息的方法和装置
CN113140012B (zh) 图像处理方法、装置、介质及电子设备
CN112200173B (zh) 多网络模型训练方法、图像标注方法和人脸图像识别方法
CN117690063B (zh) 电缆线路检测方法、装置、电子设备与计算机可读介质
CN111598006A (zh) 用于标注对象的方法和装置
CN118053123B (zh) 报警信息生成方法、装置、电子设备与计算机介质
CN115294501A (zh) 视频识别方法、视频识别模型训练方法、介质及电子设备
CN118229967A (zh) 模型构建方法、图像分割方法、装置、设备、介质
CN113033707B (zh) 视频分类方法、装置、可读介质及电子设备
WO2022148239A1 (zh) 信息输出方法、装置和电子设备
US11954591B2 (en) Picture set description generation method and apparatus, and computer device and storage medium
CN115375656A (zh) 息肉分割模型的训练方法、分割方法、装置、介质及设备
CN114510932A (zh) 自然语言处理方法、电子设备、存储介质
CN114238968A (zh) 应用程序检测方法及装置、存储介质及电子设备
CN114004229A (zh) 文本识别方法、装置、可读介质及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20964167

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20964167

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.11.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20964167

Country of ref document: EP

Kind code of ref document: A1