CN111783613B - Anomaly detection method, model training method, device, equipment and storage medium - Google Patents


Info

Publication number
CN111783613B
CN111783613B
Authority
CN
China
Prior art keywords
detection
video
training
score
examples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010597568.1A
Other languages
Chinese (zh)
Other versions
CN111783613A (en)
Inventor
张伟
谭啸
李莹莹
孙昊
文石磊
章宏武
丁二锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010597568.1A priority Critical patent/CN111783613B/en
Publication of CN111783613A publication Critical patent/CN111783613A/en
Application granted granted Critical
Publication of CN111783613B publication Critical patent/CN111783613B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/44Event detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an anomaly detection method, a model training method, an anomaly detection device, equipment and a storage medium, relating to the field of computer technology, in particular to computer vision and image processing. The disclosed anomaly detection method comprises: preprocessing an acquired video to be detected to obtain a plurality of pipeline-level first detection examples and a plurality of video-level second detection examples, and inputting the two types of detection examples into an anomaly detection model to obtain the detection examples in which the target object in the video to be detected is anomalous. The first detection examples and the second detection examples each comprise a plurality of consecutive image frames, the image frames in the two types of detection examples correspond one to one, and the image frames in the first detection examples are smaller than the corresponding image frames in the second detection examples. Because the method performs anomaly detection on both types of detection examples, the resulting video anomaly detection result is more accurate.

Description

Anomaly detection method, model training method, device, equipment and storage medium
Technical Field
Embodiments of the present application relate to the field of computer vision within computer technology, and in particular to an anomaly detection method, a model training method, a device, equipment and a storage medium.
Background
Anomaly detection in surveillance video is an important research area in computer vision. It is used to intelligently analyze surveillance video and automatically identify abnormal events in the video or abnormal behaviors of a monitored object, such as fire accidents, crowd gathering, vehicles driving the wrong way, and the like. The technology is widely applied in fields such as intelligent transportation and security.
In the prior art, most surveillance-video anomaly detection is based on supervised learning. Supervised learning builds a predictive model by learning from a large number of training samples, each of which carries a label indicating its ground-truth output. An anomaly detection method based on supervised learning can only detect the abnormal events or behaviors it was trained on in advance, so it adapts poorly to new scenes and is greatly limited.
Disclosure of Invention
The present application provides an anomaly detection method, a model training method, a device, equipment and a storage medium that improve the accuracy of anomaly detection.
In a first aspect, the present application provides an anomaly detection method, including: acquiring a video to be detected, where the video to be detected includes a target object; preprocessing the video to be detected to obtain a plurality of pipeline-level first detection examples and a plurality of video-level second detection examples, where the first detection examples and the second detection examples each include a plurality of consecutive image frames, the image frames in the first detection examples correspond one to one with those in the second detection examples, and the image frames in the first detection examples are smaller than those in the second detection examples; and inputting the plurality of first detection examples and the plurality of second detection examples into an anomaly detection model to obtain the detection examples in which the target object in the video to be detected is anomalous. The anomaly detection model includes a first submodel and a second submodel, both obtained by training the same initial model on different types of training examples.
Because this anomaly detection method performs anomaly detection on two types of detection examples, it can localize video anomalies in both the temporal and spatial dimensions and obtain more accurate video anomaly detection results.
In a second aspect, the present application provides a method for training an anomaly detection model, where the anomaly detection model includes a first submodel and a second submodel. The method includes: acquiring a video sample with a video label, where the video label indicates whether a target object in the video sample is anomalous; preprocessing the video sample to obtain a plurality of pipeline-level first training examples and a plurality of video-level second training examples, where the first training examples and the second training examples each include a plurality of consecutive image frames, every image frame in the first training examples includes the target object, and only some image frames in the second training examples include the target object; determining a first loss function according to a first score set obtained by inputting the plurality of first training examples into the original first submodel; determining a second loss function according to a second score set obtained by inputting the plurality of second training examples into the original second submodel; determining a total loss function of the anomaly detection model according to the first loss function and the second loss function; and, when the total loss function converges, combining the currently trained first submodel and second submodel to obtain the anomaly detection model.
During model training, the two submodels are trained on the two types of training examples respectively, and their loss functions are combined, yielding an anomaly detection model with high anomaly detection accuracy.
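The combination of the two submodel losses described above can be sketched in code. The per-submodel loss is written here as a multiple-instance ranking loss over bag-level maximum scores, and the two losses are combined with equal weights; both choices, and all names, are illustrative assumptions, since this summary does not fix the exact formulas.

```python
def mil_ranking_loss(pos_bag_scores, neg_bag_scores, margin=1.0):
    # Hinge ranking loss over bag-level maxima: the highest-scoring
    # example in an anomalous (positive) bag should outscore the
    # highest-scoring example in a normal (negative) bag by `margin`.
    # This exact form is an assumption, not quoted from the patent.
    return max(0.0, margin - max(pos_bag_scores) + max(neg_bag_scores))

def total_loss(first_pos, first_neg, second_pos, second_neg, w1=0.5, w2=0.5):
    # Combine the first (pipeline-level) and second (video-level)
    # submodel losses into the total loss; equal weights are assumed.
    first_loss = mil_ranking_loss(first_pos, first_neg)
    second_loss = mil_ranking_loss(second_pos, second_neg)
    return w1 * first_loss + w2 * second_loss

# Example score sets for one positive and one negative training packet
# per submodel (the values are made up for illustration).
loss = total_loss([0.9, 0.4], [0.3, 0.2], [0.7, 0.5], [0.4, 0.1])
```

Training then repeats scoring and loss evaluation until the total loss converges, at which point the two submodels are combined into the final model.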
In a third aspect, the present application provides an anomaly detection device, including: an acquisition module configured to acquire a video to be detected, where the video to be detected includes a target object; and a processing module configured to preprocess the video to be detected to obtain a plurality of pipeline-level first detection examples and a plurality of video-level second detection examples, where the first detection examples and the second detection examples each include a plurality of consecutive image frames, the image frames in the first detection examples correspond one to one with those in the second detection examples, and the image frames in the first detection examples are smaller than those in the second detection examples. The processing module is further configured to input the plurality of first detection examples and the plurality of second detection examples into an anomaly detection model to obtain the detection examples in which the target object in the video to be detected is anomalous. The anomaly detection model includes a first submodel and a second submodel, both obtained by training the same initial model on different types of training examples.
In a fourth aspect, the present application provides a training device for an anomaly detection model, where the anomaly detection model includes a first submodel and a second submodel. The device includes: an acquisition module configured to acquire a video sample with a video label, where the video label indicates whether a target object in the video sample is anomalous; and a processing module configured to preprocess the video sample to obtain a plurality of pipeline-level first training examples and a plurality of video-level second training examples, where the first training examples and the second training examples each include a plurality of consecutive image frames, every image frame in the first training examples includes the target object, and only some image frames in the second training examples include the target object; determine a first loss function according to a first score set obtained by inputting the plurality of first training examples into the original first submodel; determine a second loss function according to a second score set obtained by inputting the plurality of second training examples into the original second submodel; determine a total loss function of the anomaly detection model according to the first loss function and the second loss function; and, when the total loss function converges, combine the currently trained first submodel and second submodel to obtain the anomaly detection model.
In a fifth aspect, the present application provides an electronic device, including at least one processor and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor to enable the electronic device to perform the method of any one of the first aspects.
In a sixth aspect, the present application provides an electronic device, including at least one processor and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor to enable the electronic device to perform the method of any one of the second aspects.
In a seventh aspect, the present application provides a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any of the first aspects.
In an eighth aspect, the present application provides a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any of the second aspects.
In a ninth aspect, the present application provides an anomaly detection method, including: obtaining a plurality of pipeline-level first detection examples corresponding to a video to be detected and a plurality of video-level second detection examples corresponding to the video to be detected, where the first detection examples and the second detection examples each include a plurality of consecutive image frames, the image frames in the first detection examples correspond one to one with those in the second detection examples, and the image frames in the first detection examples are smaller than those in the second detection examples; and inputting the plurality of first detection examples and the plurality of second detection examples into an anomaly detection model to obtain the detection examples in which the target object in the video to be detected is anomalous.
In a tenth aspect, the present application provides a method for training an anomaly detection model, where the anomaly detection model includes a first submodel and a second submodel. The method includes: acquiring a plurality of pipeline-level first training examples corresponding to a video sample and a plurality of video-level second training examples corresponding to the video sample, where the first training examples and the second training examples each include a plurality of consecutive image frames, every image frame in the first training examples includes the target object, and only some image frames in the second training examples include the target object; training the first submodel on the plurality of first training examples and the second submodel on the plurality of second training examples; determining a total loss function of the anomaly detection model according to a first loss function of the first submodel and a second loss function of the second submodel; and, when the total loss function converges, combining the currently trained first submodel and second submodel to obtain the trained anomaly detection model.
In an eleventh aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the method of any one of the first aspects.
In a twelfth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the method of any of the second aspects.
According to the technical solution of the anomaly detection method, the video to be detected is preprocessed to obtain two types of detection examples with a one-to-one correspondence, video anomalies are localized in the spatio-temporal dimensions on the basis of these two types of detection examples, and the accuracy of video anomaly detection is improved.
According to the technical solution of the training method for the anomaly detection model, two types of training examples are constructed, and the anomaly detection model is trained on the basis of these two types of training examples and the constructed total loss function, yielding an anomaly detection model with high anomaly detection accuracy.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a schematic structural diagram of an anomaly detection model provided in an embodiment of the present application;
fig. 2 is a flowchart of an anomaly detection method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of two types of detection examples provided by an embodiment of the present application;
fig. 4 is a flowchart of an anomaly detection method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an image frame including a first example of detection of two target objects according to an embodiment of the present application;
fig. 6 is a flowchart of a training method of an anomaly detection model according to an embodiment of the present disclosure;
fig. 7 is a flowchart of a training method of an anomaly detection model according to an embodiment of the present application;
fig. 8a is a schematic diagram of constructing a first training packet according to an embodiment of the present application;
fig. 8b is a schematic diagram of constructing a second training packet according to an embodiment of the present application;
FIG. 9 is a flow chart for constructing a first loss function according to an embodiment of the present application;
FIG. 10 is a flow chart for constructing a second loss function according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an abnormality detection apparatus according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of a training apparatus for an anomaly detection model according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of an electronic device of an anomaly detection method according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of an electronic device of a training method for an anomaly detection model according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
At present, video anomaly detection is mostly based on supervised learning. Supervised learning requires each training video to be annotated in both the temporal and spatial dimensions, which entails a large annotation workload; moreover, only the abnormal events or behaviors trained in advance can be detected in the surveillance video, so scene adaptability is poor and the approach is greatly limited.
In contrast, the anomaly detection method provided by the present application adopts multi-instance learning: it divides the video to be detected into two types of detection examples, inputs both types into a trained anomaly detection model, integrates the detection results of the two types of detection examples, and finally outputs the anomalous detection examples in the video to be detected. The anomaly is thus accurately localized in the spatio-temporal dimensions of the video, and video anomaly detection accuracy is high.
In addition to the above anomaly detection method, the present application also provides a training method, device, equipment and storage medium for an anomaly detection model, applied to the fields of computer vision and image processing within computer technology, so as to improve the effect of video anomaly detection.
Fig. 1 is a schematic structural diagram of the anomaly detection model provided in an embodiment of the present application. Referring to fig. 1, the anomaly detection model of this embodiment includes a first submodel, a second submodel and a post-processing unit, where the first submodel and the second submodel each include a feature extraction unit, a feature enhancement unit and a prediction unit. The input of the first submodel is a pipeline-level first detection example, and the input of the second submodel is a video-level second detection example. The first detection example and the second detection example each include a plurality of consecutive image frames, the images in the first detection example correspond one to one with those in the second detection example, the images in the first detection example are local image blocks of the video to be detected, and the images in the second detection example are full images of the video to be detected.
The first submodel and the second submodel in the embodiment of the present application have the same structure; the two submodels are obtained by training the same initial model on different types of training examples. The training examples of the first submodel are pipeline-level training examples (i.e., local images of a video sample), and the training examples of the second submodel are video-level training examples (i.e., full images of a video sample). Because the submodels are trained on different training examples, their anomaly detection behaviors can differ.
As an example, the feature extraction units of the two submodels may employ the C3D module for feature extraction.
As an example, the feature enhancement units of the two submodels may perform feature enhancement with a self-attention-based feature enhancement module, so that the features of each detection example can incorporate feature information of the target object across the whole example. Further, the feature enhancement unit may copy the enhanced features output by the single-channel self-attention feature enhancement module (for example, into 8 copies), splice the copied enhanced features together, and apply a residual connection to obtain the final enhanced features.
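A minimal sketch of a feature enhancement step of this kind, in plain Python over lists of floats: each per-frame feature vector is re-expressed as a dot-product-attention-weighted mixture of all frames in the example, the result is copied and spliced, and a residual connection is added. The copy count, the tiling of the residual, and all names are illustrative assumptions; the patent does not disclose the exact module.

```python
import math

def self_attention(features):
    # Dot-product self-attention over a list of per-frame feature
    # vectors, so each enhanced vector mixes in information from the
    # whole detection example.
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    enhanced = []
    for query in features:
        scores = [dot(query, key) for key in features]
        peak = max(scores)  # subtract max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        enhanced.append([
            sum(w * value[i] for w, value in zip(weights, features))
            for i in range(len(query))
        ])
    return enhanced

def enhance(features, copies=8):
    # Copy the attention output `copies` times, splice the copies into
    # one long vector, and add a residual connection (the input tiled
    # to the same length); the tiling choice is an assumption.
    attended = self_attention(features)
    out = []
    for att, feat in zip(attended, features):
        spliced = att * copies    # list repetition = feature splicing
        residual = feat * copies  # input tiled to matching length
        out.append([a + r for a, r in zip(spliced, residual)])
    return out

# Two 2-dimensional per-frame features for one detection example.
enhanced = enhance([[1.0, 0.0], [0.0, 1.0]], copies=2)
```

In a real model the attention weights and value projections would be learned; this sketch only shows the data flow of enhance-copy-splice-residual.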
As an example, the prediction units of the two submodels may train a fully connected neural network on the enhanced features to obtain a score for each detection example.
The post-processing unit in the anomaly detection model determines the detection examples in which the target object in the video to be detected is anomalous according to the score sets output by the prediction units of the two submodels.
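As a concrete illustration of this post-processing step, the sketch below fuses the two score sets by averaging the scores of corresponding detection examples and thresholding the result; the fusion rule, the threshold value and the names are assumptions, since the patent leaves the post-processing strategy open.

```python
def locate_anomalous_examples(first_scores, second_scores, threshold=0.5):
    # Fuse the per-example score sets produced by the two submodels and
    # return the indices of detection examples judged anomalous.
    # Averaging plus a fixed threshold is an illustrative rule only.
    assert len(first_scores) == len(second_scores)  # one-to-one examples
    fused = [(a + b) / 2 for a, b in zip(first_scores, second_scores)]
    return [i for i, s in enumerate(fused) if s >= threshold]

# Pipeline-level and video-level scores for four example pairs.
anomalous = locate_anomalous_examples([0.9, 0.2, 0.6, 0.1],
                                      [0.8, 0.3, 0.7, 0.2])
# -> examples 0 and 2 flagged as anomalous
```

Because each flagged index identifies both a run of frames (temporal position) and its detection boxes (spatial position), fusing the two score sets yields the spatio-temporal localization described in the text.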
The technical solution of the present application will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a flowchart of an abnormality detection method according to an embodiment of the present application, and as shown in fig. 2, the abnormality detection method according to the embodiment includes the following steps:
step 101, a video to be detected is obtained, wherein the video to be detected comprises a target object.
In the embodiment of the present application, the video to be detected may be acquired through an image acquisition device. The image acquisition device may be a camera device arranged in a public area, such as a camera in an underground garage or a camera beside a road, or another kind of image capture device arranged in a public area.
The video to be detected collected by the image acquisition device includes a plurality of consecutive image frames. In one possible case, only some image frames in the video to be detected include the target object: for example, the camera captures video continuously, and the target object appears in only part of the frames. In another possible case, every image frame in the video to be detected includes the target object: for example, an image capture device starts recording only when the target object is detected, thereby obtaining a plurality of consecutive image frames that all contain it.
The target object in the video to be detected in the embodiment of the present application may be any kind of movable object, such as a person, a vehicle or an animal. For example, when the target object is a person, anomalies of the target object include large crowd gatherings, crowd violence, abnormal pedestrian behavior, and the like. For another example, when the target object is a vehicle, anomalies of the target object include an abnormal driving track, a traffic accident, vehicle congestion, and the like.
The target object in the video to be detected in the embodiment of the present application may further be any other type of monitored object, such as smoke, fire or water flow, in which case anomalies of the target object include fire accidents, natural disasters, and the like.
And 102, preprocessing a video to be detected to obtain a plurality of first detection examples at a pipeline level and a plurality of second detection examples at a video level.
The first detection example and the second detection example respectively comprise a plurality of continuous image frames, the image frames in the first detection example and the image frames in the second detection example correspond to each other in a one-to-one mode, and the size of the image frame in the first detection example is smaller than that of the image frame in the second detection example.
In the embodiment of the present application, preprocessing the video to be detected is the process of constructing the two types of detection examples. The first detection example is a pipeline-level detection example, i.e., a detection example consisting of a plurality of consecutive image frames cropped to the detection boxes corresponding to the target object in the video to be detected. The second detection example is a video-level detection example, i.e., a detection example consisting of a plurality of consecutive full images of the video to be detected. Every image frame in both types of detection examples includes the target object.
Fig. 3 is a schematic diagram of the two types of detection examples provided by an embodiment of the present application. Referring to fig. 3, the video to be detected includes 10 consecutive image frames, each of which contains the same target object (shown as the shaded portion in fig. 3). The constructed pipeline-level first detection example is the series of detection boxes corresponding to the target object, i.e., the image blocks inside the 10 detection boxes; because the target object moves, the sizes and positions of the 10 image blocks differ. The constructed video-level second detection example is the set of full images corresponding to the image blocks in the first detection example; the second detection example shown in fig. 3 is the original video to be detected. It can be seen that the image frames in the first detection example correspond one to one with those in the second detection example, and the image frames in the first detection example are smaller than those in the second detection example.
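The construction in fig. 3 can be sketched as follows, with frames represented as nested lists of pixel values and detection boxes as (x0, y0, x1, y1) tuples; the representation and the function name are illustrative, not from the patent.

```python
def build_detection_examples(frames, boxes):
    # Build the pipeline-level first detection example (the image
    # blocks inside each frame's detection box) and the video-level
    # second detection example (the unchanged full images).
    first_example, second_example = [], []
    for image, (x0, y0, x1, y1) in zip(frames, boxes):
        crop = [row[x0:x1] for row in image[y0:y1]]  # local image block
        first_example.append(crop)
        second_example.append(image)                 # full image
    return first_example, second_example

# Two 4x4 "frames" with a 2x2 target box that moves between frames.
frames = [[[r * 4 + c for c in range(4)] for r in range(4)] for _ in range(2)]
boxes = [(0, 0, 2, 2), (1, 1, 3, 3)]
first, second = build_detection_examples(frames, boxes)
# The frames correspond one to one across the two examples, and each
# cropped block is smaller than its full frame.
```

In practice the frames would be image arrays from a decoded video and the boxes would come from a target detector, but the one-to-one pairing of crop and full image is the same.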
Step 103, inputting the plurality of first detection instances and the plurality of second detection instances into an anomaly detection model to obtain a detection instance in which the target object in the video to be detected is anomalous.
The anomaly detection model comprises a first submodel and a second submodel, and the first submodel and the second submodel are obtained by training the initial model based on different training examples.
The anomaly detection model includes a first submodel and a second submodel, both of which determine whether the target object in the video to be detected is anomalous; the two submodels differ only in the type of training examples used during training, so their detection results may differ. The plurality of first detection examples obtained in step 102 are input into the anomaly detection model to obtain first detection results corresponding to the plurality of first detection examples, and the plurality of second detection examples obtained in step 102 are input into the anomaly detection model to obtain second detection results corresponding to the plurality of second detection examples. By analyzing the first detection results and the second detection results together, the anomaly detection model determines the final detection result of the video to be detected, which includes the detection examples in which the target object is anomalous (i.e., the anomalous consecutive image frames in the video to be detected and the detection boxes of the anomalous target object in those image frames).
It should be noted that the embodiment of the present application does not limit the order in which the first detection examples and the second detection examples are input into the anomaly detection model; they may be input in a preset order or simultaneously.
Therefore, the anomaly detection model provided by the embodiment of the application can acquire the information of the time dimension (the position of the anomaly image frame in the video to be detected) and the space dimension (the space position of the anomaly target object in the anomaly image frame) of the video to be detected with anomaly, and the accuracy of video anomaly detection is improved.
According to the anomaly detection method provided by the embodiment of the application, the video to be detected comprising the target object is obtained, the video to be detected is preprocessed, the detection examples of two types are obtained, the detection examples of the two types are input into the trained anomaly detection model, and the detection example of the target object in the video to be detected with anomaly is obtained. The detection examples of the two types comprise a plurality of continuous image frames, the image frames in the detection examples of the two types are in one-to-one correspondence, and the sizes of the image frames of the detection examples of the two types which are in one-to-one correspondence are different. The detection process relates to anomaly detection of detection examples of two types, can realize video anomaly positioning of time dimension and space dimension, and improves accuracy of video anomaly detection.
Fig. 4 is a flowchart of an abnormality detection method according to an embodiment of the present application, and referring to fig. 4, the abnormality detection method according to the embodiment includes the following steps:
step 201, a video to be detected is obtained, wherein the video to be detected comprises a target object.
Step 201 of this embodiment is the same as step 101 of the above embodiment, and reference may be made to the above embodiment specifically, which is not described herein again.
Step 202, performing target detection on each image frame in the video to be detected, taking image blocks of a detection frame corresponding to a target object in the video to be detected as a first detection packet, and taking image blocks of a full image including the target object in the video to be detected as a second detection packet.
The first detection packet comprises a plurality of first detection examples, and the second detection packet comprises a plurality of second detection examples.
In a possible implementation manner of the embodiment of the application, the video to be detected is acquired through a camera device. First, target detection is performed on each image frame in the video to be detected, and the one or more image frames not including the target object are removed; then the first detection packet and the second detection packet are constructed from the remaining runs of continuous image frames that include the target object.
Illustratively, the video to be detected comprises 100 image frames, of which the frames including the target object are the 1st to 10th, the 41st to 50th and the 85th to 100th frames. First, the image frames not including the target object are removed through target detection, and then the two types of detection packets are constructed from the image frames that do include it. If the number of continuous image frames exceeds a preset number (for example, 10 frames), the run can additionally be segmented; for example, the 85th to 100th frames above are segmented into two detection instances, namely the 85th to 94th frames and the 95th to 100th frames. The second detection packet then comprises the full images of the 1st to 10th, 41st to 50th, 85th to 94th and 95th to 100th frames, and every 10 continuous full images in the second detection packet form one second detection instance. Correspondingly, the first detection packet comprises the partial images of the detection boxes corresponding to the target objects in those same frames, and every 10 continuous partial images in the first detection packet form one first detection instance.
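The frame-grouping logic of this example can be sketched in a few lines of Python (a minimal illustration; the function name, the 10-frame preset length, and the use of frame indices in place of actual images are assumptions, not part of the patent):

```python
def build_instances(frame_indices, max_len=10):
    """Group the indices of frames that contain the target object into
    runs of consecutive frames, then split runs longer than max_len."""
    runs = []
    for idx in sorted(frame_indices):
        if runs and idx == runs[-1][-1] + 1:
            runs[-1].append(idx)   # extend the current consecutive run
        else:
            runs.append([idx])     # start a new run
    # Split any run that exceeds the preset instance length.
    instances = []
    for run in runs:
        for start in range(0, len(run), max_len):
            instances.append(run[start:start + max_len])
    return instances

# Frames 1-10, 41-50 and 85-100 contain the target object.
frames = list(range(1, 11)) + list(range(41, 51)) + list(range(85, 101))
print([(r[0], r[-1]) for r in build_instances(frames)])
# → [(1, 10), (41, 50), (85, 94), (95, 100)]
```

The same grouping serves both packets: the second detection packet keeps the full images of these frames, while the first detection packet keeps only the cropped detection-box regions.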
In a possible implementation manner of the embodiment of the application, the video to be detected is acquired through image capture equipment, each image frame in the video to be detected comprises a target object, and two types of detection packets can be directly constructed according to the image frames of the video to be detected. The principle of constructing two types of detection packets in this embodiment is the same as that in the above embodiment, and is not described here again.
It should be noted that, the first detection example and the second detection example each include a plurality of consecutive image frames, and the plurality of consecutive image frames may include one or more target objects, which is not limited in any way by the embodiments of the present application.
If the plurality of consecutive image frames include a target object, the first detection instance includes an image of a detection frame corresponding to the target object in the plurality of consecutive image frames. If the plurality of consecutive image frames include a plurality of target objects, the first detection instance includes an image of a bounding box of the plurality of consecutive image frames that includes detection boxes corresponding to the plurality of target objects.
Exemplarily, fig. 5 is a schematic diagram of a certain image frame of a first detection example including two target objects according to an embodiment of the present application, and as can be known from fig. 5, the certain image frame of the first detection example includes partial images of the vehicles 1 and 2 (i.e., image blocks including the vehicles 1 and 2), and the detection frame 1 of the vehicle 1 and the detection frame 2 of the vehicle 2 are further labeled in the images.
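The "bounding box of the detection boxes" described above is simply the smallest rectangle enclosing all per-object detection boxes in a frame. A minimal sketch (the `(x1, y1, x2, y2)` box convention and the sample coordinates are assumptions):

```python
def enclosing_box(boxes):
    """Smallest box (x1, y1, x2, y2) enclosing all detection boxes."""
    xs1, ys1, xs2, ys2 = zip(*boxes)
    return (min(xs1), min(ys1), max(xs2), max(ys2))

# Detection boxes of vehicle 1 and vehicle 2 in one image frame.
print(enclosing_box([(10, 20, 60, 80), (50, 30, 120, 90)]))
# → (10, 20, 120, 90)
```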
Step 203, inputting the plurality of first detection examples to a first submodel of the anomaly detection model to obtain a first score set, wherein the first score set comprises a score corresponding to each first detection example.
Step 204, inputting the plurality of second detection examples to a second submodel of the anomaly detection model to obtain a second score set, wherein the second score set comprises a score corresponding to each second detection example.
In the embodiment of the application, the abnormality detection model comprises a first sub-model and a second sub-model, wherein the first sub-model is used for performing abnormality scoring on the image frames in the first detection example, and the second sub-model is used for performing abnormality scoring on the image frames in the second detection example.
Typically, one detection instance corresponds to one score. Illustratively, taking a second detection instance as an example: suppose the full images of the 1st to 10th frames of the video to be detected form one second detection instance. This instance is input into the second submodel of the anomaly detection model to obtain 10 per-frame scores, and the second submodel takes the highest of these as the score of the instance. It is to be understood that, since a plurality of second detection instances are input into the anomaly detection model, a plurality of scores, i.e., the second score set, is obtained accordingly. Similarly, inputting the plurality of first detection instances into the first submodel of the anomaly detection model yields a plurality of scores, i.e., the first score set.
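The max-pooling from per-frame scores to instance scores can be sketched as follows (the function name and score values are illustrative, not from the patent):

```python
def score_set(instances_frame_scores):
    """Map each detection instance (a list of per-frame anomaly scores
    produced by a submodel) to a single instance score by taking the
    maximum over its frames."""
    return [max(frame_scores) for frame_scores in instances_frame_scores]

# Per-frame scores of three second detection instances (illustrative values).
second_scores = score_set([
    [0.1, 0.2, 0.9, 0.3],    # instance with one highly anomalous frame
    [0.05, 0.1, 0.2, 0.15],
    [0.3, 0.4, 0.35, 0.2],
])
print(second_scores)  # → [0.9, 0.2, 0.4]
```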
As an example, the scores of the two submodels of the anomaly detection model lie between 0 and 1: the closer a score is to 1, the more likely the target object in the detection instance is abnormal, and the closer a score is to 0, the more likely the target object in the detection instance is normal.
Optionally, a threshold may be set between 0 and 1, for example 0.5: a detection instance whose score lies in [0, 0.5) is a normal instance, and a detection instance whose score lies in [0.5, 1] is an abnormal instance.
Step 205, determining, according to the first score set and the second score set, the detection instance in which the target object in the video to be detected is abnormal.
In a possible implementation manner of the embodiment of the application, a detection example with the highest average score may be determined according to the first score set and the second score set, and the detection example with the highest average score is used as a detection example of abnormality of a target object in a video to be detected, which may be specifically referred to in formula one.
$$T_{pred} = \arg\max_{T_i} \frac{s^{p}(T_i^{p}) + s^{v}(T_i^{v})}{2} \qquad \text{(formula one)}$$

In the formula, $T_{pred}$ represents the detection instance, output by the anomaly detection model, in which the target object in the video to be detected is abnormal; $T_i^{p}$ represents the $i$-th first detection instance; $T_i^{v}$ represents the $i$-th second detection instance; $s^{p}(T_i^{p})$ represents the score output by the first submodel of the anomaly detection model for the $i$-th first detection instance; and $s^{v}(T_i^{v})$ represents the score output by the second submodel of the anomaly detection model for the $i$-th second detection instance. The image frames in $T_i^{p}$ and $T_i^{v}$ correspond one-to-one. $\arg\max$ takes the variable $T_i$ (a set of image frames) for which the average score is maximal.
In a possible implementation manner of the embodiment of the application, the detection instance with the highest comprehensive score may be determined according to the first score set, the second score set, and a preset weight value. In this implementation, the weight of the first submodel's score is set to α according to actual requirements, and the weight of the second submodel's score is set to 1−α. Taking the i-th detection instance Ti as an example, the comprehensive score of the i-th detection instance may be determined from the first submodel's score for the i-th first detection instance, the second submodel's score for the i-th second detection instance, and the preset weight. Finally, the detection instance with the highest comprehensive score is taken as the detection instance in which the target object in the video to be detected is abnormal.
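Both selection rules — highest average score and highest weighted comprehensive score — can be sketched together (the function name and score values are assumptions; α plays the role of the preset weight, and α = 0.5 reduces to the average rule):

```python
def select_instance(first_scores, second_scores, alpha=0.5):
    """Return the index of the detection instance with the highest
    combined score; alpha weights the first submodel's score and
    (1 - alpha) the second submodel's score."""
    combined = [alpha * s1 + (1 - alpha) * s2
                for s1, s2 in zip(first_scores, second_scores)]
    return max(range(len(combined)), key=combined.__getitem__)

first = [0.2, 0.8, 0.5]   # scores of the first detection instances
second = [0.3, 0.7, 0.9]  # scores of the corresponding second instances
print(select_instance(first, second))             # → 1 (highest average)
print(select_instance(first, second, alpha=0.2))  # → 2 (second submodel favored)
```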
The detection instance, determined in the embodiment of the present application, in which the target object in the video to be detected is abnormal comprises a pipeline-level first detection instance and a video-level second detection instance. The pipeline-level first detection instance indicates the spatial position information of the abnormal target object within the abnormal image frames, and the video-level second detection instance indicates the temporal position information of the abnormal image frames within the video to be detected. In the related art, anomaly detection methods that adopt multi-instance learning can only perform anomaly detection in the time dimension and cannot localize spatial information.
According to the anomaly detection method provided by the embodiment of the application, a video to be detected comprising a target object is obtained; target detection is performed on each image frame in the video to be detected, the image blocks of the detection boxes corresponding to the target object are taken as a first detection packet, and the full-image blocks including the target object are taken as a second detection packet, wherein the first detection packet comprises a plurality of first detection instances and the second detection packet comprises a plurality of second detection instances; the plurality of first detection instances are input into the first submodel of the anomaly detection model to obtain a first score set, and the plurality of second detection instances are input into the second submodel of the anomaly detection model to obtain a second score set; and the detection instance in which the target object in the video to be detected is abnormal is determined according to the first score set and the second score set. The detection process integrates the scores of the two submodels of the anomaly detection model for the same detection instance, so that the detection instance output by the anomaly detection model is more accurate. In addition, the two submodels of the anomaly detection model perform video anomaly localization in the time dimension and the space dimension respectively, so that the anomaly detection result contains information in both dimensions, making the detection result more accurate.
The above embodiments show the anomaly detection method for the video to be detected, the detection process relates to an anomaly detection model, the learning capability of the anomaly detection model directly affects the anomaly detection effect, and the following describes the training process of the anomaly detection model in detail with reference to the accompanying drawings.
Fig. 6 is a flowchart of a training method for an anomaly detection model according to an embodiment of the present disclosure, and referring to fig. 6, the training method for a model according to the present embodiment includes the following steps:
step 301, obtaining a video sample including a video tag. The video tag is used to indicate whether an anomaly exists in the target object in the video sample.
The embodiment of the application can obtain the video sample comprising the video label through the sample acquisition device. Specifically, the sample acquisition device acquires an original video sample through the image acquisition device, and the video sample including the video label is obtained through manual labeling.
In the embodiment of the application, a video sample is a video clip; an annotator watches the video clip and marks the video label of the video sample. The video labels comprise positive labels and negative labels: the video samples with positive labels are positive samples, and the video samples with negative labels are negative samples. Specifically, if the target object in the video clip is abnormal, the video sample is marked as a positive sample, and the target object in at least one image frame of a positive sample is abnormal; if the target object in the video clip is not abnormal, the video sample is marked as a negative sample, and the target objects in all image frames of a negative sample are not abnormal.
Therefore, the manual labeling process of the embodiment of the application only labels the video sample as a whole, does not label each image frame in the video sample, and is low in labeling workload.
Step 302, preprocessing the video sample to obtain a plurality of first training examples at a pipeline level and a plurality of second training examples at a video level.
The first training example and the second training example both comprise a plurality of consecutive image frames, all image frames in the first training example comprise the target object, and part of the image frames in the second training example comprise the target object.
The first training example belongs to a pipeline-level training example, and the pipeline-level training example refers to a plurality of continuous image frames taking a detection frame corresponding to a target object in a video sample as a training example. The second training instance belongs to a video-level training instance, and the video-level training instance refers to a plurality of continuous image frames taking a full-image of a video sample as a training instance.
The first training example and the second training example of the embodiment of the present application may be from different video samples or from the same video sample, and the embodiment of the present application is not limited in any way. It should be understood that the embodiment of the present application constructs a first training instance for training a first sub-model in an abnormality detection model, and constructs a second training instance for training a second sub-model in the abnormality detection model. The training instances of the first submodel are pipeline-level training instances, and the training instances of the second submodel are video-level training instances. In the initial training process, the training instances of the two submodels may or may not have a correspondence, that is, the initial training processes of the two submodels may be independent of each other.
Step 303, determining a first loss function according to the first set of scores input to the original first submodel by the plurality of first training examples.
In the embodiment of the application, a plurality of first training examples are sequentially input into an original first submodel to obtain a first score set, wherein the first score set comprises scores corresponding to each first training example; a first loss function for the first submodel is determined from the first set of scores.
Step 304, determining a second loss function according to a second set of scores input to the original second submodel by the plurality of second training instances.
In the embodiment of the application, a plurality of second training examples are sequentially input into an original second submodel to obtain a second score set, wherein the second score set comprises scores corresponding to each second training example; a second loss function for the second submodel is determined from the second set of scores.
It should be noted that, in the embodiment of the present application, no limitation is imposed on the execution sequence of step 303 and step 304, and the steps may be executed sequentially according to a preset execution sequence or may be executed simultaneously.
Step 305, determining a total loss function of the anomaly detection model according to the first loss function and the second loss function.
Specifically, the total loss function of the anomaly detection model may be constructed through preset weighted values of the first loss function and the second loss function, and the total loss function of the anomaly detection model may also be constructed through any other combination manner, which does not limit the embodiment of the present application.
Step 306, when the total loss function converges, combining the currently trained first submodel and the currently trained second submodel to obtain the anomaly detection model.
Optionally, in some embodiments, the total loss function has not yet converged; at this point, training of the first submodel and the second submodel may continue by adjusting the learning parameters of the first submodel, of the second submodel, or of both simultaneously, until the total loss function converges and training of the model stops.
According to the training method of the anomaly detection model, the video samples including the video labels are obtained and preprocessed, so that first training examples of a plurality of pipeline levels and second training examples of a plurality of video levels are obtained; sequentially inputting the plurality of first training examples to an original first submodel to obtain a first score set, and determining a first loss function according to the first score set; sequentially inputting the plurality of second training examples to an original second submodel to obtain a second score set, and determining a second loss function according to the second score set; and determining a total loss function of the abnormal detection model according to the first loss function and the second loss function, and combining the currently trained first sub-model and the currently trained second sub-model when the total loss function is converged to obtain the trained abnormal detection model. In the model training process, two types of training examples are adopted to respectively train the two sub-models, and loss functions of the two sub-models are integrated to obtain the anomaly detection model with high anomaly detection accuracy.
Fig. 7 is a flowchart of a training method for an anomaly detection model according to an embodiment of the present application, and referring to fig. 7, the training method for a model according to the embodiment includes the following steps:
step 401, obtaining a video sample including a video tag.
Step 401 in the present embodiment is the same as step 301 in the above embodiment, and reference may be specifically made to the above embodiment, which is not described herein again.
For the convenience of understanding, the embodiment of the present application takes the first training example and the second training example from the same video sample as an example, and the training examples for constructing the two categories are described in detail. It should be noted that, in the actual model training process, the first training instance and the second training instance may be from different video samples.
Step 402, performing target detection on each image frame in the video sample, and using an image block of a detection frame corresponding to a target object in the video sample as a first training packet, where the first training packet includes a plurality of first training instances.
In a possible implementation manner of the embodiment of the present application, target detection is performed on each image frame in a video sample, one or more image frames not including the target object are removed, then target detection is performed on the remaining runs of consecutive image frames including the target object, and a first training packet together with a corresponding video-level third training packet are constructed. The first training packet can be used as input to the first submodel of the anomaly detection model to obtain a score set of the first training packet. The third training packet can be used as input to the second submodel of the anomaly detection model to obtain a score set of the third training packet. It should be appreciated that the scores of the two submodels for corresponding training instances may differ; for example, the score of a first training instance in the first training packet may differ from the score of the corresponding third training instance in the third training packet.
For example, fig. 8a is a schematic diagram of constructing a first training packet according to an embodiment of the present application. Referring to fig. 8a, the video sample includes 100 image frames, of which the frames including the target object are the 1st to 10th and 50th to 60th frames. First, the image frames not including the target object are removed to obtain the full images of the 1st to 10th and 50th to 60th frames; then target detection is performed on these full images to obtain the partial images of the detection boxes corresponding to the target objects in those frames. The partial images form the first training packet, every 10 continuous partial images serving as one first training instance, so the first training packet in this example comprises 2 first training instances. Correspondingly, every 10 continuous full images in the third training packet serve as one third training instance, and the third training packet in this example comprises 2 third training instances.
In a possible implementation manner of the embodiment of the present application, each image frame in the video sample includes a target object, and the first training packet and the third training packet may be constructed directly according to the image frame of the video sample.
Step 403, uniformly dividing the video sample according to a preset length to obtain a plurality of video-level second training examples of equal length.
The second training packet constructed in the embodiment of the application can be used for inputting a second sub-model of the anomaly detection model. For example, fig. 8b is a schematic diagram of constructing a second training packet according to an embodiment of the present application. Referring to fig. 8b, the video sample includes 100 image frames, wherein a portion of the image frames includes the target object and a portion of the image frames does not include the target object. When the second training packet is constructed, the image frames which do not comprise the target object do not need to be removed, and the video samples are directly segmented. For example, according to a preset length of 10 frames, a video sample of 100 image frames is uniformly divided into 10 second training examples of the same length in the video level, the second training packet includes 10 second training examples after division, and each second training example includes 10 continuous image frames.
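The uniform segmentation of step 403 can be sketched as follows (frame indices stand in for actual frames; the function name and the 10-frame preset length follow the example above):

```python
def segment_video(num_frames, seg_len=10):
    """Uniformly split a video's frame indices into equal-length
    video-level training instances, without removing any frames."""
    frames = list(range(1, num_frames + 1))
    return [frames[i:i + seg_len] for i in range(0, num_frames, seg_len)]

segments = segment_video(100)
print(len(segments), segments[0][0], segments[0][-1], segments[-1][0], segments[-1][-1])
# → 10 1 10 91 100
```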
Step 404, determining a first loss function according to a first set of scores obtained by inputting a plurality of first training examples to an original first submodel.
Fig. 9 is a flowchart for constructing a first loss function according to an embodiment of the present application, and referring to fig. 9, step 404 of this embodiment specifically includes the following steps:
step 4041, sequentially inputting the plurality of first training examples to the original first submodel to obtain a first score set, where the first score set includes a score corresponding to each first training example.
Step 4042, determining the first abnormal instance with the highest score and the first normal instance with the highest score in the first score set according to the first score set.
Step 4043, inputting the training example of the video level corresponding to the first abnormal example into the second submodel, and obtaining the score of the training example of the video level corresponding to the first abnormal example.
Step 4044, determining a first loss function of the first submodel according to the score of the training instance at the video level corresponding to the first abnormal instance, the score of the first abnormal instance, and the score of the first normal instance.
The first loss function of the first submodel is described below with reference to formula two.
$$\mathcal{L}^{p} = \max\!\left(0,\; 1 - \frac{s^{p}(T^{p,a}) + s^{v}(T^{v,a})}{2} + s^{p}(T^{p,n})\right) \qquad \text{(formula two)}$$

In the formula, $\mathcal{L}^{p}$ represents the first loss function of the first submodel; $T^{p,a}$ represents the first abnormal instance, i.e. the instance with the highest score in the first score set obtained by inputting the $n$ first training instances $T_1^{p}, \dots, T_n^{p}$ into the first submodel; $s^{p}(T^{p,a})$ represents the score of the first abnormal instance; $T^{p,n}$ represents the first normal instance, i.e. the normal instance with the highest score in the first score set; $s^{p}(T^{p,n})$ represents the score of the first normal instance; $T^{v,a}$ represents the video-level training instance corresponding to the first abnormal instance; and $s^{v}(T^{v,a})$ represents the score obtained by inputting the video-level training instance corresponding to the first abnormal instance into the second submodel.

The first loss function of the first submodel of the embodiment of the present application adds the score $s^{v}(T^{v,a})$ from the second submodel, which realizes mutual learning between the first submodel and the second submodel and improves the training effect of the first submodel.
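Since the patent renders its loss formula only as an image, the sketch below follows the standard multiple-instance hinge ranking loss implied by the surrounding description, averaging in the second submodel's score as the mutual-learning term; the exact functional form, the averaging, and the margin of 1.0 are assumptions:

```python
def first_loss(tube_abnormal, video_abnormal, tube_normal, margin=1.0):
    """Hinge ranking loss sketch for the first submodel, with mutual learning.
    tube_abnormal  - first submodel's score of the first abnormal instance
    video_abnormal - second submodel's score of the corresponding
                     video-level training instance
    tube_normal    - first submodel's score of the first normal instance
    The loss is zero once the (averaged) abnormal score exceeds the
    normal score by at least the margin."""
    mutual = (tube_abnormal + video_abnormal) / 2
    return max(0.0, margin - mutual + tube_normal)

print(round(first_loss(0.9, 0.7, 0.2), 4))  # → 0.4
print(first_loss(1.0, 1.0, 0.0))            # → 0.0 (margin satisfied)
```

The second submodel's loss in step 405 is symmetric, with the roles of the two submodels swapped.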
Step 405, determining a second loss function according to a second score set obtained by inputting a plurality of second training examples to a second submodel.
Fig. 10 is a flowchart for constructing a second loss function according to an embodiment of the present application, and referring to fig. 10, step 405 of this embodiment specifically includes the following steps:
4051, sequentially inputting the plurality of second training examples to a second submodel to obtain a second score set, where the second score set includes a score corresponding to each second training example;
step 4052, determining, according to the second score set, the third abnormal instance with the highest score and the second normal instance with the highest score in the second score set;
step 4053, inputting the third abnormal instance into the first submodel to obtain a score of the third abnormal instance in the first submodel;
step 4054, determining a second loss function of the second submodel according to the score of the third abnormal instance in the first submodel, the score of the third abnormal instance and the score of the second normal instance.
The second loss function of the second submodel is described below with reference to equation three.
$$\mathcal{L}^{v} = \max\!\left(0,\; 1 - \frac{s^{v}(T^{v,a'}) + s^{p}(T^{v,a'})}{2} + s^{v}(T^{v,n})\right) \qquad \text{(formula three)}$$

In the formula, $\mathcal{L}^{v}$ represents the second loss function of the second submodel; $T^{v,a'}$ represents the third abnormal instance, i.e. the instance with the highest score in the second score set obtained by inputting the $n$ second training instances $T_1^{v}, \dots, T_n^{v}$ into the second submodel; $s^{v}(T^{v,a'})$ represents the score of the third abnormal instance; $T^{v,n}$ represents the second normal instance, i.e. the normal instance with the highest score in the second score set; $s^{v}(T^{v,n})$ represents the score of the second normal instance; and $s^{p}(T^{v,a'})$ represents the score obtained by inputting the third abnormal instance into the first submodel.

The second loss function of the second submodel of the embodiment of the present application adds the score $s^{p}(T^{v,a'})$ from the first submodel, which realizes mutual learning between the first submodel and the second submodel and improves the training effect of the second submodel.
Step 406, determining a total loss function of the anomaly detection model according to the first loss function and the second loss function.
The total loss function for constructing the anomaly detection model includes the following two possible implementations.
In a possible implementation manner, the first loss function and the second loss function are subjected to weighted summation to obtain a third loss function, and the third loss function is used as a total loss function of the anomaly detection model. See the following equation four:
$$L_{rank} = \lambda^{p}\,\mathcal{L}^{p} + \lambda^{v}\,\mathcal{L}^{v} \qquad \text{(formula four)}$$

In the formula, $L_{rank}$ represents the third loss function, i.e. the total ranking loss of the anomaly detection model; $\mathcal{L}^{p}$ represents the first loss function, i.e. the loss function of the first submodel in the anomaly detection model; $\mathcal{L}^{v}$ represents the second loss function, i.e. the loss function of the second submodel in the anomaly detection model; $\lambda^{p}$ represents the weight of the first loss function; and $\lambda^{v}$ represents the weight of the second loss function.
The total loss function constructed by the embodiment can further enhance the mutual learning between the two submodels and improve the anomaly detection effect of the models.
In a possible implementation manner, the first loss function and the second loss function are subjected to weighted summation to obtain a third loss function, and then the total loss function of the anomaly detection model is determined according to the third loss function and a preset cross entropy loss function.
Wherein the cross entropy loss function is determined according to the score of the first abnormal instance, the score of the first normal instance, the score of the third abnormal instance and the score of the second normal instance. The first abnormal instance is an abnormal instance with the highest score obtained by inputting the first submodel in the plurality of first training instances, the first normal instance is a normal instance with the highest score obtained by inputting the first submodel in the plurality of first training instances, the third abnormal instance is an abnormal instance with the highest score obtained by inputting the second submodel in the plurality of second training instances, and the second normal instance is a normal instance with the highest score obtained by inputting the second submodel in the plurality of second training instances.
The third loss function in the present embodiment is described below with reference to formulas five and six.

L = Lrank + LCE    Formula five

LCE = −[log(sa1) + log(1 − sn1) + log(sa2) + log(1 − sn2)]    Formula six

wherein L represents the total loss function of the anomaly detection model, Lrank represents the third loss function, LCE represents the cross entropy loss function, sa1 and sn1 represent the scores of the first abnormal instance and the first normal instance output by the first sub-model, sa2 and sn2 represent the scores of the third abnormal instance and the second normal instance output by the second sub-model, and other reference numerals can be found in the above embodiments.
To further narrow the scoring difference of the two submodels for the same training instance, so that the scores of the first abnormal instance and the third abnormal instance are as close to 1 as possible while the scores of the first normal instance and the second normal instance are as close to 0 as possible, a cross entropy loss function can be added on the basis of the third loss function, so that the total loss function constructed by the embodiment further improves the anomaly detection effect of the model.
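The combination of formulas five and six can be sketched as follows. The exact pairing of logarithm terms is an assumption based on the stated goal of driving the abnormal-instance scores toward 1 and the normal-instance scores toward 0; all score values are hypothetical:

```python
import math

def cross_entropy_loss(s1_abn, s1_norm, s2_abn, s2_norm, eps=1e-7):
    """Formula six (assumed form): binary cross entropy pushing the
    abnormal-instance scores toward 1 and the normal-instance scores
    toward 0. eps guards against log(0)."""
    return -(math.log(s1_abn + eps) + math.log(1.0 - s1_norm + eps)
             + math.log(s2_abn + eps) + math.log(1.0 - s2_norm + eps))

def total_loss(rank_loss, s1_abn, s1_norm, s2_abn, s2_norm):
    """Formula five: L = Lrank + LCE."""
    return rank_loss + cross_entropy_loss(s1_abn, s1_norm, s2_abn, s2_norm)

# Well-separated scores yield a small cross entropy term;
# ambiguous scores are penalised more heavily.
good = total_loss(0.2, s1_abn=0.95, s1_norm=0.05, s2_abn=0.9, s2_norm=0.1)
bad = total_loss(0.2, s1_abn=0.55, s1_norm=0.45, s2_abn=0.6, s2_norm=0.5)
```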
In step 407, when the total loss function is converged, the currently trained first sub-model and the currently trained second sub-model are combined to obtain the anomaly detection model.
According to the training method of the anomaly detection model provided by this embodiment, a video sample including a video label is obtained; target detection is performed on each image frame in the video sample, and the image blocks of the detection frames corresponding to the target object in the video sample are taken as a first training packet, where the first training packet includes a plurality of first training examples; the video sample is uniformly segmented according to a preset length to obtain a plurality of equal-length video-level second training examples; the first training examples are input into an original first sub-model for model training, and the second training examples are input into an original second sub-model for model training, where the training process of the two sub-models involves mutual learning between the sub-models, through which a first loss function of the first sub-model and a second loss function of the second sub-model are determined. When the total loss function of the anomaly detection model converges, model training is finished, and the currently trained first sub-model and the currently trained second sub-model are combined to obtain the trained anomaly detection model. The model training process uses the two types of training examples to train the two sub-models respectively, the training process also involves mutual learning between the two sub-models, and an anomaly detection model with a higher anomaly detection accuracy rate is finally obtained by analyzing the total loss function of the anomaly detection model.
Fig. 11 is a schematic structural diagram of an abnormality detection apparatus according to an embodiment of the present application, and referring to fig. 11, an abnormality detection apparatus 500 according to the embodiment includes:
an obtaining module 501, configured to obtain a video to be detected, where the video to be detected includes a target object;
a processing module 502, configured to pre-process the video to be detected to obtain a plurality of pipeline-level first detection examples and a plurality of video-level second detection examples, where the first detection examples and the second detection examples each include a plurality of consecutive image frames, the first detection examples correspond to the image frames in the second detection examples one to one, and the size of the image frames in the first detection examples is smaller than the size of the image frames in the second detection examples;
the processing module 502 is further configured to input the plurality of first detection instances and the plurality of second detection instances to an anomaly detection model, so as to obtain a detection instance in which the target object in the video to be detected is anomalous;
the anomaly detection model comprises a first submodel and a second submodel, and the first submodel and the second submodel are obtained by training an initial model based on different training examples.
In a possible implementation, the processing module 502 is specifically configured to:
performing target detection on each image frame in the video to be detected, taking image blocks of a detection frame corresponding to the target object in the video to be detected as a first detection packet, and taking full-image blocks including the target object in the video to be detected as a second detection packet;
wherein the first detection packet includes a plurality of the first detection instances and the second detection packet includes a plurality of the second detection instances.
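A hedged sketch of how the two detection packets might be assembled from per-frame detection boxes; the frame and box representations (2-D pixel grids, (x, y, w, h) boxes) are illustrative assumptions, not the specification's data layout:

```python
def build_detection_packets(frames, boxes):
    """Split a video into a pipeline-level first detection packet
    (image blocks cropped to each frame's detection box) and a
    video-level second detection packet (the full-image blocks).

    frames: list of 2-D pixel grids, one per image frame.
    boxes: per-frame (x, y, w, h) detection box for the target object.
    """
    first_packet = []
    for frame, (x, y, w, h) in zip(frames, boxes):
        # Crop the detection-box image block from the full frame.
        crop = [row[x:x + w] for row in frame[y:y + h]]
        first_packet.append(crop)
    second_packet = list(frames)  # full-image blocks, original size
    return first_packet, second_packet

# Two hypothetical 4x4 frames with a 2x2 detection box at (1, 1).
frames = [[[r * 4 + c for c in range(4)] for r in range(4)] for _ in range(2)]
first, second = build_detection_packets(frames, [(1, 1, 2, 2)] * 2)
```

Note how the two packets stay in one-to-one frame correspondence while the cropped blocks are smaller than the full-image blocks, matching the size relation stated above.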
In a possible implementation, the processing module 502 is specifically configured to:
inputting the plurality of first detection examples to a first submodel of the abnormity detection model to obtain a first score set, wherein the first score set comprises a score corresponding to each first detection example;
inputting the plurality of second detection examples to a second submodel of the anomaly detection model to obtain a second score set, wherein the second score set comprises scores corresponding to each second detection example;
and determining a detection example of the target object in the video to be detected with abnormality according to the first and second score sets.
In a possible implementation, the processing module 502 is specifically configured to:
determining a detection instance with the highest average score according to the first score set and the second score set;
and taking the detection example with the highest average score as the detection example of the abnormality of the target object in the video to be detected.
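The selection rule above can be sketched as follows (the score values are hypothetical; the specification only requires averaging the two score sets over corresponding detection instances and taking the maximum):

```python
def most_anomalous_instance(first_scores, second_scores):
    """Average the two submodels' scores for each corresponding
    detection instance and return the index of the instance with the
    highest average score, i.e. the detected abnormal instance."""
    averages = [(a + b) / 2.0 for a, b in zip(first_scores, second_scores)]
    return max(range(len(averages)), key=averages.__getitem__)

# Hypothetical score sets for four corresponding detection instances.
idx = most_anomalous_instance([0.1, 0.7, 0.3, 0.2], [0.2, 0.9, 0.4, 0.1])
```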
The anomaly detection apparatus provided in this embodiment may implement the technical solution of the method embodiment shown in fig. 2 or fig. 4, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 12 is a schematic structural diagram of a training apparatus for an anomaly detection model according to an embodiment of the present application, and referring to fig. 12, a training apparatus 600 for an anomaly detection model according to the present embodiment includes:
an obtaining module 601, configured to obtain a video sample including a video tag, where the video tag is used to indicate whether a target object in the video sample is abnormal;
a processing module 602, configured to pre-process the video sample to obtain a plurality of pipeline-level first training examples and a plurality of video-level second training examples, where the first training examples and the second training examples both include a plurality of consecutive image frames, all the image frames in the first training example include the target object, and a part of the image frames in the second training example include the target object;
determining a first loss function according to a first score set obtained by inputting a plurality of first training examples to an original first submodel; determining a second loss function according to a second score set obtained by inputting a plurality of second training examples to an original second submodel; determining a total loss function of the anomaly detection model according to the first loss function and the second loss function;
and when the total loss function is converged, combining the currently trained first sub-model and the currently trained second sub-model to obtain the anomaly detection model.
In a possible implementation manner, the processing module 602 is specifically configured to:
and performing target detection on each image frame in the video sample, and taking an image block of a detection frame corresponding to the target object in the video sample as a first training packet, wherein the first training packet comprises a plurality of first training instances.
In a possible implementation manner, the processing module 602 is specifically configured to:
and uniformly segmenting the video samples according to a preset length to obtain a plurality of second training examples with the same length and video level.
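The uniform segmentation step can be sketched as follows; dropping a trailing remainder shorter than the preset length is an assumption, since the specification does not state how remainders are handled:

```python
def segment_video(frames, segment_length):
    """Uniformly segment a video sample's frame list into equal-length
    video-level training instances of the preset length."""
    return [frames[i:i + segment_length]
            for i in range(0, len(frames) - segment_length + 1, segment_length)]

# 10 hypothetical frames segmented with a preset length of 3.
segments = segment_video(list(range(10)), 3)
```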
In a possible implementation manner, the processing module 602 is specifically configured to:
sequentially inputting a plurality of first training examples to an original first submodel to obtain a first score set, wherein the first score set comprises a score corresponding to each first training example;
determining a first abnormal instance with the highest score and a first normal instance with the highest score in the first score set according to the first score set;
inputting the training examples of the video levels corresponding to the first abnormal examples into the second submodel to obtain the scores of the training examples of the video levels corresponding to the first abnormal examples;
and determining the first loss function of the first submodel according to the score of the training example of the video level corresponding to the first abnormal example, the score of the first abnormal example and the score of the first normal example.
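One plausible form of such a first loss function is sketched below: a hinge ranking term over the first submodel's own top scores plus a mutual-learning term against the second submodel's score for the corresponding video-level instance. The exact functional form and the margin value are assumptions; the specification only names the three scores involved:

```python
def first_loss(s1_abnormal, s1_normal, s2_cross, margin=1.0):
    """Assumed sketch of the first submodel's loss.

    s1_abnormal: first submodel's score for its top abnormal instance.
    s1_normal: first submodel's score for its top normal instance.
    s2_cross: second submodel's score for the video-level training
        instance corresponding to that abnormal instance.
    """
    # Ranking term: push the top abnormal score above the top normal
    # score by at least `margin`.
    rank_term = max(0.0, margin - s1_abnormal + s1_normal)
    # Mutual-learning term: penalise disagreement with the other submodel.
    mutual_term = (s1_abnormal - s2_cross) ** 2
    return rank_term + mutual_term

loss = first_loss(s1_abnormal=0.9, s1_normal=0.2, s2_cross=0.8)
```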
In a possible implementation manner, the processing module 602 is specifically configured to:
sequentially inputting a plurality of second training examples to the second submodel to obtain a second score set, wherein the second score set comprises scores corresponding to each second training example;
determining a third abnormal instance with the highest score and a second normal instance with the highest score in the second score set according to the second score set;
inputting the third abnormal instance to the first sub-model to obtain the score of the third abnormal instance on the first sub-model;
and determining the second loss function of the second submodel according to the score of the third abnormal instance in the first submodel, the score of the third abnormal instance and the score of the second normal instance.
In a possible implementation manner, the processing module 602 is specifically configured to:
and performing weighted summation on the first loss function and the second loss function to obtain a third loss function, and taking the third loss function as a total loss function of the anomaly detection model.
In a possible implementation manner, the processing module 602 is specifically configured to:
carrying out weighted summation on the first loss function and the second loss function to obtain a third loss function;
and determining a total loss function of the abnormal detection model according to the third loss function and a preset cross entropy loss function.
In one possible implementation, the cross-entropy loss function is determined according to a score of the first anomaly instance, a score of the first normal instance, a score of the third anomaly instance, and a score of the second normal instance;
the first abnormal instance is an abnormal instance with the highest score obtained by inputting the first submodel in the plurality of first training instances, the first normal instance is a normal instance with the highest score obtained by inputting the first submodel in the plurality of first training instances, the third abnormal instance is an abnormal instance with the highest score obtained by inputting the second submodel in the plurality of second training instances, and the second normal instance is a normal instance with the highest score obtained by inputting the second submodel in the plurality of second training instances.
The training apparatus for an anomaly detection model provided in this embodiment may implement the technical solution of the method embodiment shown in fig. 6 or fig. 7, and the implementation principle and the technical effect are similar, which are not described herein again.
The application also provides an electronic device and a readable storage medium.
According to an embodiment of the present application, there is also provided a computer program product, including a computer program, where the computer program is stored in a readable storage medium, and at least one processor of an electronic device can read the computer program from the readable storage medium, and the at least one processor executes the computer program to make the electronic device execute the technical solutions provided by any of the foregoing embodiments.
Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
Fig. 13 is a schematic structural diagram of an electronic device of an abnormality detection method according to an embodiment of the present application, and referring to fig. 13, the electronic device includes: one or more processors 701, a memory 702, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 13 illustrates an example of a processor 701.
The memory 702 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the anomaly detection method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the anomaly detection method provided by the present application.
The memory 702, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the anomaly detection method in the embodiments of the present application. The processor 701 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 702, that is, implements the abnormality detection method in the above-described method embodiment.
The memory 702 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device of the abnormality detection method, and the like. Further, the memory 702 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 702 may optionally include memory located remotely from the processor 701, and such remote memory may be connected to the anomaly detection method electronics over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the abnormality detection method may further include: an input device 703 and an output device 704. The processor 701, the memory 702, the input device 703 and the output device 704 may be connected by a bus or other means, and fig. 13 illustrates an example of connection by a bus.
The input device 703 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus of the abnormality detection method, such as an input device of a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output devices 704 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Fig. 14 is a schematic structural diagram of an electronic device of a training method for an anomaly detection model according to an embodiment of the present application, and referring to fig. 14, the electronic device includes: one or more processors 801, a memory 802, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 14 illustrates an example of a processor 801.
The memory 802 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the method for training an anomaly detection model provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the method of training an anomaly detection model provided herein.
The memory 802, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the training method of the anomaly detection model in the embodiments of the present application. The processor 801 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 802, that is, implements the training method of the anomaly detection model in the above-described method embodiments.
The memory 802 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device of the training method of the abnormality detection model, and the like. Further, the memory 802 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 802 may optionally include memory located remotely from the processor 801, which may be connected to the electronics of the training method of the anomaly detection model via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the training method of the abnormality detection model may further include: an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or other means, and fig. 14 illustrates an example of connection by a bus.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus of the training method of the abnormality detection model, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output devices 804 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The embodiment of the present application further provides an anomaly detection method, including the following steps: the method comprises the steps of obtaining a first detection example of a plurality of pipeline levels corresponding to a video to be detected and a second detection example of a plurality of video levels corresponding to the video to be detected, wherein the first detection example and the second detection example both comprise a plurality of continuous image frames, the image frames in the first detection example and the second detection example correspond to each other one by one, and the size of the image frame in the first detection example is smaller than that of the image frame in the second detection example; and inputting the plurality of first detection examples and the plurality of second detection examples into an abnormality detection model to obtain detection examples of the target object in the video to be detected with abnormality. For a specific implementation process of this embodiment, reference may be made to the description of the embodiment shown in fig. 1 or fig. 3, which is not described herein again.
By the technical scheme provided by the embodiment of the application, video abnormity positioning in time dimension and space dimension can be realized, and the accuracy of video abnormity detection is improved.
The embodiment of the present application further provides a training method for an anomaly detection model, where the anomaly detection model includes a first submodel and a second submodel, and the training method includes the following steps: acquiring a plurality of pipeline-level first training examples corresponding to a video sample and a plurality of video-level second training examples corresponding to the video sample, wherein the first training examples and the second training examples both comprise a plurality of continuous image frames, all the image frames in the first training examples comprise target objects, and part of the image frames in the second training examples comprise the target objects; training a first submodel through a plurality of first training examples, and training a second submodel through a plurality of second training examples; determining a total loss function of the anomaly detection model according to a first loss function of the first submodel and a second loss function of the second submodel; and when the total loss function is converged, combining the currently trained first sub-model and the currently trained second sub-model to obtain a trained anomaly detection model. For a specific implementation process of this embodiment, reference may be made to the description of the embodiment shown in fig. 5 or fig. 6, which is not described herein again.
According to the technical scheme provided by the embodiment of the application, the two types of training examples are constructed to respectively train the two sub-models, and the anomaly detection model with high anomaly detection accuracy is finally obtained by analyzing the total loss function of the constructed anomaly detection model.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; the present application is not limited thereto, as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (28)

1. An anomaly detection method comprising:
acquiring a video to be detected, wherein the video to be detected comprises a target object;
preprocessing the video to be detected to obtain a plurality of pipeline-level first detection examples and a plurality of video-level second detection examples, wherein the first detection examples and the second detection examples both comprise a plurality of continuous image frames, the first detection examples and the image frames in the second detection examples are in one-to-one correspondence, and the size of the image frames in the first detection examples is smaller than that of the image frames in the second detection examples;
inputting the plurality of first detection examples into a first submodel of an abnormity detection model, and inputting the plurality of second detection examples into a second submodel of the abnormity detection model to obtain a detection example of the abnormity of the target object in the video to be detected;
the first sub-model and the second sub-model are obtained by training an initial model based on different training examples.
2. The method according to claim 1, wherein the preprocessing the video to be detected to obtain a plurality of first detection instances at a pipeline level and a plurality of second detection instances at a video level comprises:
performing target detection on each image frame in the video to be detected, taking image blocks of a detection frame corresponding to the target object in the video to be detected as a first detection packet, and taking full-image blocks including the target object in the video to be detected as a second detection packet;
wherein the first detection packet includes a plurality of the first detection instances and the second detection packet includes a plurality of the second detection instances.
3. The method according to claim 1 or 2, wherein the step of inputting the plurality of first detection instances into a first submodel of an anomaly detection model and inputting the plurality of second detection instances into a second submodel of the anomaly detection model to obtain the detection instance of the target object in the video to be detected with the anomaly comprises the steps of:
inputting the plurality of first detection examples to a first submodel of the abnormity detection model to obtain a first score set, wherein the first score set comprises a score corresponding to each first detection example;
inputting the plurality of second detection examples to a second submodel of the anomaly detection model to obtain a second score set, wherein the second score set comprises scores corresponding to each second detection example;
and determining a detection example of the target object in the video to be detected with abnormality according to the first and second score sets.
4. The method according to claim 3, wherein the determining, according to the first score set and the second score set, the detection instance in which the target object in the video to be detected is abnormal comprises:
determining the detection instance with the highest average score according to the first score set and the second score set;
and taking the detection instance with the highest average score as the detection instance in which the target object in the video to be detected is abnormal.
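Claims 3 and 4 reduce to an element-wise average over the two score sets followed by an argmax; a minimal sketch (the helper names are hypothetical, not from the patent):

```python
def average_scores(first_scores, second_scores):
    """Average the two score sets element-wise; per claim 1 the first and
    second detection instances correspond one-to-one."""
    return [(a + b) / 2 for a, b in zip(first_scores, second_scores)]

def most_anomalous(first_scores, second_scores):
    """Index of the detection instance with the highest average score,
    i.e. the instance reported as anomalous in claim 4."""
    averages = average_scores(first_scores, second_scores)
    return max(range(len(averages)), key=averages.__getitem__)
```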
5. A method of training an anomaly detection model, the anomaly detection model comprising a first sub-model and a second sub-model, the method comprising:
acquiring a video sample comprising a video tag, wherein the video tag is used for indicating whether a target object in the video sample is abnormal or not;
preprocessing the video sample to obtain a plurality of pipeline-level first training examples and a plurality of video-level second training examples, wherein the first training examples and the second training examples both comprise a plurality of continuous image frames, all the image frames in the first training examples comprise the target object, and part of the image frames in the second training examples comprise the target object;
determining a first loss function according to a first score set obtained by inputting a plurality of first training examples to an original first submodel; determining a second loss function according to a second score set obtained by inputting a plurality of second training examples to an original second submodel; determining a total loss function of the anomaly detection model according to the first loss function and the second loss function;
and when the total loss function is converged, combining the currently trained first sub-model and the currently trained second sub-model to obtain the anomaly detection model.
6. The method of claim 5, wherein the pre-processing the video sample to obtain a first training instance at a plurality of pipeline levels comprises:
and performing target detection on each image frame in the video sample, and taking the image blocks of the detection boxes corresponding to the target object in the video sample as a first training packet, wherein the first training packet comprises a plurality of first training instances.
7. The method of claim 5, wherein the pre-processing the video sample to obtain a second training instance at a plurality of video levels comprises:
and uniformly segmenting the video sample according to a preset length to obtain a plurality of video-level second training instances of equal length.
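The uniform segmentation of claim 7 can be sketched as follows (`clip_len` is a hypothetical name for the preset length; dropping a trailing remainder is an assumption, since the claim only requires equal-length instances):

```python
def segment_video(frames, clip_len):
    """Uniformly segment a frame sequence into equal-length, video-level
    training instances; a trailing remainder shorter than clip_len is
    dropped so that every instance has the same length."""
    return [frames[i:i + clip_len]
            for i in range(0, len(frames) - clip_len + 1, clip_len)]
```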
8. The method of claim 5, wherein determining a first loss function from a first set of scores obtained from a plurality of the first training instances input to an original first submodel comprises:
sequentially inputting a plurality of first training examples to an original first submodel to obtain a first score set, wherein the first score set comprises a score corresponding to each first training example;
determining, from the first score set, a first abnormal instance with the highest score and a first normal instance with the highest score;
inputting the video-level training instance corresponding to the first abnormal instance into the second submodel to obtain the score of that video-level training instance;
and determining the first loss function of the first submodel according to the score of the video-level training instance corresponding to the first abnormal instance, the score of the first abnormal instance, and the score of the first normal instance.
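Claim 8 does not disclose a concrete formula for the first loss function. A common choice in multiple-instance ranking for video anomaly detection, shown here purely as an illustration, is a hinge ranking term over the two extreme scores plus a cross-model consistency term tying the pipeline-level abnormal score to the score of its corresponding video-level instance; claim 9's second loss function is symmetric:

```python
def first_loss(abnormal_score, normal_score, video_level_score, margin=1.0):
    """Illustrative loss: the top abnormal score should exceed the top
    normal score by `margin` (hinge ranking term), and should agree with
    the score of the corresponding video-level instance (consistency
    term). The margin value is an assumption."""
    ranking = max(0.0, margin - abnormal_score + normal_score)
    consistency = (abnormal_score - video_level_score) ** 2
    return ranking + consistency
```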
9. The method of claim 5, wherein determining a second loss function from a second set of scores obtained from the plurality of second training instances input to the second submodel comprises:
sequentially inputting a plurality of second training examples to the second submodel to obtain a second score set, wherein the second score set comprises scores corresponding to each second training example;
determining, from the second score set, a third abnormal instance with the highest score and a second normal instance with the highest score;
inputting the third abnormal instance to the first sub-model to obtain the score of the third abnormal instance on the first sub-model;
and determining the second loss function of the second submodel according to the score of the third abnormal instance on the first submodel, the score of the third abnormal instance, and the score of the second normal instance.
10. The method of claim 5, wherein said determining a total loss function of the anomaly detection model from the first loss function and the second loss function comprises:
and performing weighted summation on the first loss function and the second loss function to obtain a third loss function, and taking the third loss function as a total loss function of the anomaly detection model.
11. The method of claim 5, wherein said determining a total loss function of the anomaly detection model from the first loss function and the second loss function comprises:
carrying out weighted summation on the first loss function and the second loss function to obtain a third loss function;
and determining a total loss function of the anomaly detection model according to the third loss function and a preset cross-entropy loss function.
12. The method of claim 11, wherein the cross-entropy loss function is determined from a score of a first anomaly instance, a score of a first normal instance, a score of a third anomaly instance, and a score of a second normal instance;
the first abnormal instance is an abnormal instance with the highest score obtained by inputting the first submodel in the plurality of first training instances, the first normal instance is a normal instance with the highest score obtained by inputting the first submodel in the plurality of first training instances, the third abnormal instance is an abnormal instance with the highest score obtained by inputting the second submodel in the plurality of second training instances, and the second normal instance is a normal instance with the highest score obtained by inputting the second submodel in the plurality of second training instances.
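Claims 10 through 12 can be sketched numerically (the weights and the exact cross-entropy form are assumptions; the claims only fix which four extreme scores enter the cross-entropy term):

```python
import math

def cross_entropy_term(scores, labels):
    """Binary cross-entropy over the four extreme scores named in claim 12
    (top abnormal and top normal instance from each submodel). The small
    eps guards against log(0)."""
    eps = 1e-7
    return -sum(y * math.log(s + eps) + (1 - y) * math.log(1.0 - s + eps)
                for s, y in zip(scores, labels)) / len(scores)

def total_loss(loss1, loss2, ce_term, w1=1.0, w2=1.0):
    """Claims 10-11: weighted sum of the two submodel losses, optionally
    plus the preset cross-entropy term (the weights are assumptions)."""
    return w1 * loss1 + w2 * loss2 + ce_term
```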
13. An anomaly detection apparatus, comprising:
the acquisition module is used for acquiring a video to be detected, wherein the video to be detected comprises a target object;
the processing module is used for preprocessing the video to be detected to obtain a plurality of pipeline-level first detection examples and a plurality of video-level second detection examples, wherein the first detection examples and the second detection examples respectively comprise a plurality of continuous image frames, the first detection examples and the image frames in the second detection examples are in one-to-one correspondence, and the size of the image frames in the first detection examples is smaller than that of the image frames in the second detection examples;
the processing module is further configured to input the plurality of first detection instances into a first submodel of an anomaly detection model, and input the plurality of second detection instances into a second submodel of the anomaly detection model, so as to obtain a detection instance in which the target object in the video to be detected is abnormal;
the first submodel and the second submodel are obtained by training an initial model based on different training instances.
14. The apparatus according to claim 13, wherein the processing module is specifically configured to:
performing target detection on each image frame in the video to be detected, taking the image blocks of the detection boxes corresponding to the target object in the video to be detected as a first detection packet, and taking the full image frames that include the target object in the video to be detected as a second detection packet;
wherein the first detection packet includes a plurality of the first detection instances and the second detection packet includes a plurality of the second detection instances.
15. The apparatus according to claim 13 or 14, wherein the processing module is specifically configured to:
inputting the plurality of first detection instances into the first submodel of the anomaly detection model to obtain a first score set, wherein the first score set comprises a score corresponding to each first detection instance;
inputting the plurality of second detection instances into the second submodel of the anomaly detection model to obtain a second score set, wherein the second score set comprises a score corresponding to each second detection instance;
and determining, according to the first score set and the second score set, the detection instance in which the target object in the video to be detected is abnormal.
16. The apparatus according to claim 15, wherein the processing module is specifically configured to:
determining the detection instance with the highest average score according to the first score set and the second score set;
and taking the detection instance with the highest average score as the detection instance in which the target object in the video to be detected is abnormal.
17. An apparatus for training an anomaly detection model, the anomaly detection model comprising a first sub-model and a second sub-model, the apparatus comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a video sample comprising a video tag, and the video tag is used for indicating whether a target object in the video sample is abnormal or not;
the processing module is used for preprocessing the video sample to obtain a plurality of pipeline-level first training examples and a plurality of video-level second training examples, wherein the first training examples and the second training examples both comprise a plurality of continuous image frames, all the image frames in the first training examples comprise the target object, and part of the image frames in the second training examples comprise the target object;
the processing module is further configured to determine a first loss function according to a first score set obtained by inputting the plurality of first training instances into an original first submodel, determine a second loss function according to a second score set obtained by inputting the plurality of second training instances into an original second submodel, and determine a total loss function of the anomaly detection model according to the first loss function and the second loss function;
and the processing module is further configured to combine, when the total loss function converges, the currently trained first submodel and the currently trained second submodel to obtain the anomaly detection model.
18. The apparatus according to claim 17, wherein the processing module is specifically configured to:
and performing target detection on each image frame in the video sample, and taking the image blocks of the detection boxes corresponding to the target object in the video sample as a first training packet, wherein the first training packet comprises a plurality of first training instances.
19. The apparatus according to claim 17, wherein the processing module is specifically configured to:
and uniformly segmenting the video sample according to a preset length to obtain a plurality of video-level second training instances of equal length.
20. The apparatus according to claim 17, wherein the processing module is specifically configured to:
sequentially inputting a plurality of first training examples to an original first submodel to obtain a first score set, wherein the first score set comprises a score corresponding to each first training example;
determining, from the first score set, a first abnormal instance with the highest score and a first normal instance with the highest score;
inputting the video-level training instance corresponding to the first abnormal instance into the second submodel to obtain the score of that video-level training instance;
and determining the first loss function of the first submodel according to the score of the video-level training instance corresponding to the first abnormal instance, the score of the first abnormal instance, and the score of the first normal instance.
21. The apparatus according to claim 17, wherein the processing module is specifically configured to:
sequentially inputting a plurality of second training examples to the second submodel to obtain a second score set, wherein the second score set comprises scores corresponding to each second training example;
determining, from the second score set, a third abnormal instance with the highest score and a second normal instance with the highest score;
inputting the third abnormal instance to the first sub-model to obtain the score of the third abnormal instance on the first sub-model;
and determining the second loss function of the second submodel according to the score of the third abnormal instance on the first submodel, the score of the third abnormal instance, and the score of the second normal instance.
22. The apparatus according to claim 17, wherein the processing module is specifically configured to:
and performing weighted summation on the first loss function and the second loss function to obtain a third loss function, and taking the third loss function as a total loss function of the anomaly detection model.
23. The apparatus according to claim 17, wherein the processing module is specifically configured to:
carrying out weighted summation on the first loss function and the second loss function to obtain a third loss function;
and determining a total loss function of the anomaly detection model according to the third loss function and a preset cross-entropy loss function.
24. The apparatus of claim 23, wherein the cross-entropy loss function is determined according to a score of a first anomaly instance, a score of a first normal instance, a score of a third anomaly instance, and a score of a second normal instance;
the first abnormal instance is an abnormal instance with the highest score obtained by inputting the first submodel in the plurality of first training instances, the first normal instance is a normal instance with the highest score obtained by inputting the first submodel in the plurality of first training instances, the third abnormal instance is an abnormal instance with the highest score obtained by inputting the second submodel in the plurality of second training instances, and the second normal instance is a normal instance with the highest score obtained by inputting the second submodel in the plurality of second training instances.
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the electronic device to perform the method of any of claims 1-4 or the method of any of claims 5-12.
26. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-4 or the method of any one of claims 5-12.
27. An anomaly detection method, comprising:
acquiring a plurality of pipeline-level first detection instances corresponding to a video to be detected and a plurality of video-level second detection instances corresponding to the video to be detected, wherein the first detection instances and the second detection instances each comprise a plurality of continuous image frames, the image frames in the first detection instances are in one-to-one correspondence with the image frames in the second detection instances, and the size of the image frames in the first detection instances is smaller than that of the image frames in the second detection instances;
and inputting the plurality of first detection instances and the plurality of second detection instances into an anomaly detection model to obtain the detection instance in which the target object in the video to be detected is abnormal.
28. A method of training an anomaly detection model, the anomaly detection model comprising a first sub-model and a second sub-model, the method comprising:
acquiring a plurality of pipeline-level first training examples corresponding to a video sample and a plurality of video-level second training examples corresponding to the video sample, wherein the first training examples and the second training examples both comprise a plurality of continuous image frames, all the image frames in the first training examples comprise a target object, and part of the image frames in the second training examples comprise the target object;
training a first submodel through a plurality of the first training examples, and training a second submodel through a plurality of the second training examples;
determining a total loss function of the abnormality detection model according to a first loss function of the first submodel and a second loss function of the second submodel; and when the total loss function is converged, combining the currently trained first sub-model and the currently trained second sub-model to obtain a trained anomaly detection model.
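The training procedure of claim 28 can be sketched with toy one-parameter submodels standing in for real networks (the scoring, losses, learning rate, and convergence threshold below are all illustrative assumptions; only the overall structure of the claim is kept: score both instance sets, combine the two losses, and stop when the total loss converges):

```python
def train(first_instances, second_instances, steps=100, lr=0.1):
    """Toy training loop. Each 'submodel' is a single scalar weight that
    scores an instance by multiplication; real submodels would be neural
    networks over pipeline-level and video-level clips."""
    w1, w2 = 0.0, 0.0
    prev_total = float("inf")
    total = prev_total
    for _ in range(steps):
        s1 = [w1 * x for x in first_instances]   # first-submodel scores
        s2 = [w2 * x for x in second_instances]  # second-submodel scores
        loss1 = max(0.0, 1.0 - max(s1))  # push the top score toward 1
        loss2 = max(0.0, 1.0 - max(s2))
        total = loss1 + loss2            # combined loss of claim 28
        if abs(prev_total - total) < 1e-6:
            break                        # converged: training stops
        prev_total = total
        if loss1 > 0.0:
            w1 += lr * max(first_instances)   # crude gradient step
        if loss2 > 0.0:
            w2 += lr * max(second_instances)
    return w1, w2, total
```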
CN202010597568.1A 2020-06-28 2020-06-28 Anomaly detection method, model training method, device, equipment and storage medium Active CN111783613B (en)

Publications (2)

Publication Number Publication Date
CN111783613A CN111783613A (en) 2020-10-16
CN111783613B true CN111783613B (en) 2021-10-08

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766795B (en) * 2021-01-29 2024-05-07 长兴云尚科技有限公司 Cloud processing-based pipe gallery intelligent information management method and system
CN113392743B (en) * 2021-06-04 2023-04-07 北京格灵深瞳信息技术股份有限公司 Abnormal action detection method, abnormal action detection device, electronic equipment and computer storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070067A (en) * 2019-04-29 2019-07-30 北京金山云网络技术有限公司 The training method of video classification methods and its model, device and electronic equipment
CN110443182A (en) * 2019-07-30 2019-11-12 深圳市博铭维智能科技有限公司 A kind of urban discharging pipeline video abnormality detection method based on more case-based learnings
CN111126275A (en) * 2019-12-24 2020-05-08 广东省智能制造研究所 Pedestrian re-identification method and device based on multi-granularity feature fusion

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10528848B2 (en) * 2016-11-04 2020-01-07 Case Western Reserve University Histomorphometric classifier to predict cardiac failure from whole-slide hematoxylin and eosin stained images
EP4361947A3 (en) * 2018-03-23 2024-06-19 Memorial Sloan-Kettering Cancer Center Systems and methods for multiple instance learning for classification and localization in biomedical imagining


Similar Documents

Publication Publication Date Title
EP4116867A1 (en) Vehicle tracking method and apparatus, and electronic device
CN107862270B (en) Face classifier training method, face detection method and device and electronic equipment
CN112419722B (en) Traffic abnormal event detection method, traffic control method, device and medium
CN111488791A (en) On-device classification of fingertip movement patterns as gestures in real time
CN111833340A (en) Image detection method, image detection device, electronic equipment and storage medium
CN111598164A (en) Method and device for identifying attribute of target object, electronic equipment and storage medium
CN113221677A (en) Track abnormity detection method and device, road side equipment and cloud control platform
CN111178323B (en) Group behavior recognition method, device, equipment and storage medium based on video
CN111783613B (en) Anomaly detection method, model training method, device, equipment and storage medium
US20230186634A1 (en) Vision-based monitoring of site safety compliance based on worker re-identification and personal protective equipment classification
CN112257604A (en) Image detection method, image detection device, electronic equipment and storage medium
CN112989987A (en) Method, apparatus, device and storage medium for identifying crowd behavior
CN112561879A (en) Ambiguity evaluation model training method, image ambiguity evaluation method and device
CN112507832A (en) Canine detection method and device in monitoring scene, electronic equipment and storage medium
CN114245232A (en) Video abstract generation method and device, storage medium and electronic equipment
CN110798681B (en) Monitoring method and device of imaging equipment and computer equipment
CN113569912A (en) Vehicle identification method and device, electronic equipment and storage medium
CN117611795A (en) Target detection method and model training method based on multi-task AI large model
CN115083229B (en) Intelligent recognition and warning system of flight training equipment based on AI visual recognition
CN113361303A (en) Temporary traffic sign board identification method, device and equipment
CN111563541A (en) Training method and device of image detection model
CN113378769A (en) Image classification method and device
Ghosh et al. Pedestrian counting using deep models trained on synthetically generated images
Mantini et al. A Day on Campus-An Anomaly Detection Dataset for Events in a Single Camera
CN112560848A (en) Training method and device of POI (Point of interest) pre-training model and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant