CN111666983A - Method and device for marking abnormal behaviors - Google Patents

Method and device for marking abnormal behaviors

Info

Publication number
CN111666983A
CN111666983A (Application CN202010427324.9A)
Authority
CN
China
Prior art keywords
target
data set
labeling
behavior
labeled
Prior art date
Legal status
Pending
Application number
CN202010427324.9A
Other languages
Chinese (zh)
Inventor
莫益军
刘金阳
Current Assignee
Huazhong University of Science and Technology
Ezhou Institute of Industrial Technology, Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Ezhou Institute of Industrial Technology, Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology and Ezhou Institute of Industrial Technology, Huazhong University of Science and Technology
Priority to CN202010427324.9A
Publication of CN111666983A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/10 Office automation; Time management

Abstract

The invention relates to the technical field of behavior analysis, and in particular to a method and a device for labeling abnormal behaviors. The method comprises the following steps: pre-training a neural network on a current abnormal behavior data set to obtain a first neural network model; copying all network architecture and model parameters except the output layer of the first neural network model to create a second neural network model; adding to the second neural network model an output layer whose tensor size corresponds to the number of abnormal behavior detection categories; training the second neural network model with the added output layer on a target data set labeled in the PASCAL VOC or COCO labeling manner to obtain an abnormal behavior labeling model; inputting a data set to be labeled into the abnormal behavior labeling model, which labels each data item in the data set to be labeled, obtaining a labeled data set; judging whether the labeled data set is labeled correctly; and inputting incorrectly labeled data back into the abnormal behavior labeling model for re-labeling.

Description

Method and device for marking abnormal behaviors
Technical Field
The invention relates to the technical field of behavior analysis, in particular to a method and a device for marking abnormal behaviors.
Background
Abnormal behavior detection is an important branch of human behavior recognition and an important research task in the field of computer vision. Abnormal behavior detection mainly refers to the classification and detection of specific behaviors, such as fighting or stampede events in public places. The abnormal behavior detection process mainly comprises two parts: target feature extraction and feature classification. Target feature extraction refers to extracting features capable of characterizing abnormal behavior actions from video data. Feature classification classifies the extracted features, for example using a Support Vector Machine (SVM).
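A minimal sketch of the feature-classification step described above: a Pegasos-style linear SVM trained by sub-gradient descent. The feature vectors, labels, and hyperparameters here are illustrative stand-ins, not values from the patent; a real pipeline would extract the features from video.

```python
import random

def train_linear_svm(X, y, epochs=200, lam=0.01):
    """Pegasos-style sub-gradient training of a linear SVM.
    X: list of feature vectors; y: labels in {-1, +1}."""
    dim = len(X[0])
    w = [0.0] * dim
    t = 0
    rng = random.Random(0)
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            if margin < 1:  # hinge-loss violation: move toward the sample
                w = [(1 - eta * lam) * wj + eta * y[i] * xj
                     for wj, xj in zip(w, X[i])]
            else:           # only apply the regularization shrinkage
                w = [(1 - eta * lam) * wj for wj in w]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy 2-D "features": abnormal (+1) clusters high, normal (-1) low
X = [[2.5, 3.0], [3.0, 2.5], [2.8, 3.2],
     [-2.5, -3.0], [-3.0, -2.5], [-2.8, -3.2]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(predict(w, [2.9, 2.9]))    # abnormal → 1
print(predict(w, [-2.9, -2.9]))  # normal → -1
```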
With the development of deep learning in the field of computer vision, the research process in the field can be greatly promoted by applying the deep learning to the abnormal behavior detection, so that the accuracy and the real-time performance of target feature extraction and feature classification are greatly improved.
Currently, abnormal behavior data available for deep learning includes the following three categories:
the first category is data sets that study abnormal human behavior. For example, the UCSD data set studies abnormal pedestrian behaviors in a road scene and comprises 98 videos and 5 abnormal behaviors. The CUHK data set studies abnormal traffic and pedestrian behaviors in a crowded campus scene; its two sub data sets comprise 90 minutes and 30 minutes of video, respectively. The VIRAT data set studies the activities of two kinds of objects, humans and vehicles, and comprises 12 different types of events; events in different scenes were acquired through 11 cameras, yielding about 8.5 hours of video.
The second category is data sets of destructive human behavior in public places. For example, the University of Minnesota collected video data in 7 scenes, studied actions such as humans discarding objects and crossing restricted areas, and compiled a data set.
The third category is data sets of violent human behavior in public places. For example, the ViF (Violent Flows) database gathered 246 videos of violent behavior from YouTube. The UCF-Crime data set studies 13 violent behaviors in the real world; the relevant videos were collected from the Internet and labeled.
In the prior art, when an abnormal behavior data set is labeled, a manual labeling mode is usually adopted, so that not only a great deal of time is spent, but also a great deal of manpower is consumed.
Disclosure of Invention
In view of the above, the present invention has been made to provide a method and apparatus for annotating abnormal behavior that overcomes or at least partially solves the above problems.
According to a first aspect of the present invention, there is provided a method of labelling abnormal behaviour, the method comprising:
pre-training a neural network based on a current abnormal behavior data set to obtain a first neural network model;
copying all network architectures and model parameters except an output layer in the first neural network model to create a second neural network model;
adding an output layer of tensor size corresponding to the number of abnormal behavior detection categories in the second neural network model based on the number of abnormal behavior detection categories;
training the second neural network model added with the output layer according to a target data set obtained by labeling in a PASCAL VOC labeling mode or a COCO labeling mode until convergence, and obtaining an abnormal behavior labeling model;
inputting a data set to be labeled into the abnormal behavior labeling model, so that the abnormal behavior labeling model labels each data to be labeled in the data set to be labeled, and a labeled data set is obtained;
judging whether the labeled data set is correctly labeled;
and if the labeling is wrong, inputting the data with the wrong labeling into the abnormal behavior labeling model for re-labeling.
Preferably, the target data set is obtained by a PASCAL VOC labeling method, including:
labeling at least one target picture to obtain a labeled file corresponding to the target picture to form the target data set;
wherein the labeling of the at least one target picture includes at least one of the following labeling cases: the case of not labeling the behavior type, the case of labeling the picture as bad, and the case of labeling with a labeling box;
wherein the cases of not labeling the behavior type include: uncertain target behaviors, target behaviors whose size is smaller than a preset size, and target behaviors whose occlusion range is larger than a preset occlusion range;
wherein the cases labeled as bad include: target pictures in which the number of target behaviors is larger than a preset number, target pictures whose image quality is too low to identify the target behaviors, and target pictures containing a plurality of sub-images;
wherein, the condition of marking by using the marking box comprises the following steps: marking a visible area of the target behavior with a marking frame; marking all visible pixels by using a marking frame; more than 15% of the target behavior is occluded and outside the label box, labeled "Truncated"; marking the shelters attached to the target behaviors and the target behaviors together by using a marking frame; marking the target behavior visible through the glass by using a marking frame; marking the target behavior appearing in the mirror with a marking frame;
wherein the markup file comprises the target picture and an extensible markup language subfile;
the extensible markup language subfile comprises a path of the target picture, the length and the width of the target picture, a behavior category marked by the marking frame and the position of the marking frame in the target picture.
Preferably, the obtaining of the target data set by a COCO labeling manner includes:
labeling each target picture in a first data set corresponding to a target detection task or a semantic segmentation task to obtain a first labeling file corresponding to the target picture to form the first target data set; wherein the first annotation file comprises the following data structure: the method comprises the steps of obtaining a first annotation and a first category, wherein the first annotation comprises a target behavior ID, a picture to which the target behavior belongs, a category of the target behavior and a position of an annotation box in the target picture, and the first category comprises all behavior categories for which annotation is performed and IDs of all behavior categories;
labeling each target picture in a second data set corresponding to the key point detection task to obtain a second labeling file corresponding to the target picture to form a second target data set; wherein the second annotation file comprises the following data structure: the second annotation comprises key point positions and key point numbers of target behaviors, categories of the target behaviors and positions of the annotation frames in the target pictures, and the second category comprises all key points for which annotations are aimed;
labeling each target picture in a third data set corresponding to the scene segmentation task to obtain a third labeling file corresponding to the target picture to form a third target data set; wherein the third markup file comprises the following data structure: the third annotation comprises an ID of the target picture and a file name of the target picture, the segment information comprises pixel segment IDs, IDs of target behaviors and positions of the labeling boxes in the target picture, and the third category comprises all pixel segment IDs of labeling targets, all behavior categories of labeling targets and pixel segment colors;
labeling each target picture in a fourth data set corresponding to the picture subtitle task to obtain a fourth labeling file corresponding to the target picture to form a fourth target data set; wherein the fourth markup file comprises the following data structure: a fourth annotation, the fourth annotation comprising a target behavior ID, an ID of the target picture, and a subtitle.
Preferably, before the inputting the data set to be labeled into the abnormal behavior labeling model, the method further includes:
acquiring the data set to be labeled based on different acquisition time periods, acquisition angles, acquisition distances, weather conditions, and types of acquisition locations;
wherein the data source of the data set to be labeled comprises at least one of the following data sources:
taking a video stream in the monitoring video as the data source;
taking a video stream pre-recorded by a user as the data source;
taking pictures or videos in a real scene in a network as the data source;
using pictures or videos in online videos, movies, television shows and news as the data source;
and integrating the current abnormal behavior data set to obtain a picture or a video as the data source.
Preferably, before adding an output layer having a tensor size corresponding to the number of abnormal behavior detection categories to the second neural network model based on the number of abnormal behavior detection categories, the method further includes:
classifying the abnormal behaviors;
wherein the classifying the abnormal behavior comprises:
S = {s1, s2, s3};
wherein S is the abnormal behavior, s1 is violent abnormal behavior, s2 is destructive behavior in public places, and s3 is crowd abnormal behavior;
wherein s1 = {Snormal, Sdead};
Snormal is general violent behavior, Snormal = {sn1, sn2, sn3, sn4, sn5, sn6}, where sn1 is punching, sn2 is kicking, sn3 is neck-grabbing, sn4 is hair pulling, sn5 is slapping, and sn6 is throwing objects;
Sdead is fatal violent behavior, Sdead = {sd1, sd2, sd3}, where sd1 is stabbing with a knife, sd2 is burning, and sd3 is shooting;
wherein s2 = {sp1, sp2, sp3, sp4};
sp1 is smoking, sp2 is spitting, sp3 is littering, and sp4 is trampling greenery;
wherein s3 = {Sviolence, Snon-violence};
Sviolence is violent crowd abnormal behavior, Sviolence = {sv1, sv2, sv3}, where sv1 is brawling, sv2 is trampling, and sv3 is commotion;
Snon-violence is non-violent crowd abnormal behavior, Snon-violence = {snv1, snv2, snv3}, where snv1 is multiple persons gathering and running in the same direction without abnormal behavior, snv2 is multiple persons converging on the same center point without abnormal behavior, and snv3 is multiple persons running from the same center point in multiple directions without abnormal behavior.
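The taxonomy above can be encoded as a nested structure for programmatic label lookup. The English label strings below are our paraphrases of the translated category names, not identifiers from the patent:

```python
# Abnormal-behavior taxonomy S = {s1, s2, s3}: violent behavior,
# destructive behavior in public places, and crowd behavior.
TAXONOMY = {
    "s1_violent": {
        "S_normal": ["punching", "kicking", "neck_grabbing",
                     "hair_pulling", "slapping", "throwing_objects"],
        "S_dead": ["stabbing", "burning", "shooting"],
    },
    "s2_public_place_destructive": ["smoking", "spitting",
                                    "littering", "trampling_greenery"],
    "s3_crowd": {
        "S_violence": ["brawling", "trampling", "commotion"],
        "S_non_violence": ["running_same_direction",
                           "converging_to_point", "dispersing_from_point"],
    },
}

def leaf_labels(node):
    """Flatten the taxonomy into the list of annotatable behavior labels."""
    if isinstance(node, list):
        return list(node)
    return [leaf for child in node.values() for leaf in leaf_labels(child)]

print(len(leaf_labels(TAXONOMY)))  # 6 + 3 + 4 + 3 + 3 = 19
```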
Preferably, after the obtaining the labeled data set, the method further comprises:
and performing benchmark test and data set cross validation on the labeled data set.
Preferably, the benchmark testing is performed on the labeled data set, and includes:
applying the labeled data set to a target deep learning algorithm to obtain an average recall ratio, an average precision ratio and an intersection ratio;
and evaluating the labeled data set based on the average recall ratio, the average precision ratio and the intersection ratio.
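A sketch of the "intersection ratio" (intersection over union, IoU) used in the benchmark test above, assuming boxes are given as (Xmin, Ymin, Xmax, Ymax) corner coordinates as in the annotation files:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # overlap rectangle (empty if the boxes do not intersect)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```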
Preferably, the performing the data set cross validation on the labeled data set includes:
randomly dividing the labeled data set into a first training set and a first testing set, and randomly dividing the current abnormal behavior data set into a second training set and a second testing set;
training on a first training set by using a target deep learning algorithm to obtain a first model, and training on a second training set by using the target deep learning algorithm to obtain a second model;
respectively testing the first model and the second model on the first test set, and calculating the average precision ratio of each abnormal behavior category to obtain a first calculation result;
respectively testing the first model and the second model on the second test set, and calculating the average precision ratio of each abnormal behavior category to obtain a second calculation result;
and evaluating the labeled data set and the current abnormal behavior data set based on the first calculation result and the second calculation result.
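The cross-validation steps above can be sketched as follows; `train` and `evaluate_map` are hypothetical stand-ins for a real detector's training routine and the per-category average-precision computation:

```python
import random

def split(dataset, train_frac=0.8, seed=0):
    """Randomly divide a data set into a training set and a test set."""
    items = list(dataset)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

def cross_validate(labeled_set, current_set, train, evaluate_map):
    train1, test1 = split(labeled_set)   # self-labeled data set
    train2, test2 = split(current_set)   # existing abnormal-behavior set
    model1, model2 = train(train1), train(train2)
    # first/second calculation results: every model on every test set
    return {
        "test1": {"model1": evaluate_map(model1, test1),
                  "model2": evaluate_map(model2, test1)},
        "test2": {"model1": evaluate_map(model1, test2),
                  "model2": evaluate_map(model2, test2)},
    }

# Dummy detector: "training" records the training-set size, and
# "evaluation" reports (train size, test size) instead of a real AP.
results = cross_validate(list(range(10)), list(range(20)),
                         train=lambda d: ("model", len(d)),
                         evaluate_map=lambda m, t: (m[1], len(t)))
print(results["test1"]["model1"])  # (8, 2)
```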
According to a second aspect of the invention, there is provided a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method steps according to the first aspect.
According to a third aspect of the present invention, there is provided a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method steps as described in the first aspect when executing the program.
The method for labeling abnormal behavior first pre-trains a neural network on a current abnormal behavior data set to obtain a first neural network model. Then, all network architecture and model parameters except the output layer of the first neural network model are copied to create a second neural network model. Next, based on the number of abnormal behavior detection categories, an output layer whose tensor size corresponds to that number is added to the second neural network model. The second neural network model with the added output layer is trained until convergence on the target data set obtained by the PASCAL VOC labeling manner or COCO labeling manner, yielding an abnormal behavior labeling model. Then, the data set to be labeled is input into the abnormal behavior labeling model, which labels each data item to be labeled, producing a labeled data set. Whether the labeled data set is labeled correctly is then judged; if the labeling is wrong, the incorrectly labeled data is input back into the abnormal behavior labeling model for re-labeling. Through this process, the invention achieves automatic labeling of abnormal behavior data, shortening labeling time and saving manpower. In addition, by combining with transfer learning, the invention both simplifies the training process of the labeling model and shortens the training time.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart illustrating a method for labeling abnormal behavior according to an embodiment of the present invention.
Fig. 2 is a schematic diagram showing a process of transfer learning in the embodiment of the present invention.
Fig. 3 is a schematic diagram showing the behavior type of violent abnormal behavior in the embodiment of the invention.
FIG. 4 is a schematic diagram showing the behavior types included in destructive behavior in public places in the embodiment of the invention.
Fig. 5 is a schematic diagram showing the behavior types included in crowd abnormal behavior in the embodiment of the invention.
FIG. 6 shows cross-validation results for an example of an embodiment of the invention.
Fig. 7 shows a block diagram of a computer device in an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
A first embodiment of the present invention provides a method for labeling abnormal behavior, as shown in fig. 1, the method includes:
step 101: and pre-training a neural network based on the current abnormal behavior data set to obtain a first neural network model.
Step 102: and copying all network architectures and model parameters except the output layer in the first neural network model to create a second neural network model.
Step 103: adding an output layer having a tensor size corresponding to the number of abnormal behavior detection categories in the second neural network model based on the number of abnormal behavior detection categories.
Step 104: and training the second neural network model added with the output layer according to a target data set obtained by the PASCAL VOC labeling mode or COCO labeling mode until convergence, and obtaining an abnormal behavior labeling model.
Step 105: and inputting the data set to be labeled into an abnormal behavior labeling model, so that the abnormal behavior labeling model labels each data to be labeled in the data set to be labeled, and obtaining a labeled data set.
Step 106: and judging whether the marked data set is marked correctly.
Step 107: and if the labeling is wrong, inputting the data with the wrong labeling into the abnormal behavior labeling model for re-labeling.
For step 101, the current abnormal behavior data set is an existing abnormal behavior data set. The abnormal behavior data set is a labeled data set, and meanwhile, the abnormal behavior data set can be an existing large abnormal behavior data set, such as an ImageNet large data set. The neural network targeted by the pre-training may be an SSD network. The first neural network model can be obtained by pre-training the neural network on the current abnormal behavior data set.
Next, in step 102, a second neural network model is created. Specifically, a second neural network model is obtained by copying all network architectures and model parameters in the first neural network model except for the output layer. The second neural network model and the first neural network model are identical except for the difference in output layers.
Further, in step 103, the abnormal behavior detection categories to be labeled are first determined; these may also be referred to as target abnormal behavior detection categories, from which the number of abnormal behavior detection categories is obtained. Then, an output layer is added to the second neural network model, characterized in that its tensor size corresponds to the number of abnormal behavior detection categories. The tensor size of the output layer follows the general formula (C + 4 + 1) × m × n × k, where C is the number of abnormal behavior detection categories, 4 corresponds to the coordinates x and y and the width and height of the label box (bbox), 1 is the confidence (i.e., the category probability), m × n is the size of the feature map, and k is the number of default boxes.
For example, suppose a model currently needs to be trained to label 6 target abnormal behavior detection categories: punching, kicking, neck-grabbing, hair pulling, slapping, and throwing objects. Then an output layer whose tensor size corresponds to 6 is added to the second neural network model; specifically, the tensor size of the output layer of the second neural network model is (6 + 4 + 1) × m × n × k, where 6 is the number of abnormal behavior detection categories.
Further, for each output layer, the model parameters of the output layer are initialized randomly.
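The output-layer sizing formula from step 103 can be sketched as a small helper; the feature-map size and default-box count below are illustrative values, not figures from the patent:

```python
def output_tensor_size(num_classes, m, n, k):
    """Tensor size (C + 4 + 1) * m * n * k of the added output layer:
    C class scores, 4 label-box coordinates (x, y, width, height), and
    1 confidence per default box, for an m x n feature map with k
    default boxes per cell."""
    return (num_classes + 4 + 1) * m * n * k

# Example from the text: C = 6 target categories; the 38 x 38 feature
# map and k = 4 default boxes are illustrative choices only.
print(output_tensor_size(6, 38, 38, 4))  # (6 + 4 + 1) * 38 * 38 * 4 = 63536
```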
Further, in step 104, the second neural network model with the added output layer is trained according to the target data set. The target data set belongs to a homemade data set. In the present application, the target data set is obtained by labeling in a PASCAL VOC labeling manner or a COCO labeling manner. The following describes in detail how to obtain the target data set by the PASCAL VOC labeling manner and how to obtain the target data set by the COCO labeling manner:
it should be noted that the labeling manner required by different deep learning algorithms may be different. The COCO labeling mode corresponds to five task types, namely a target detection task, a semantic segmentation task, a key point detection task, a scene segmentation task and a picture subtitle task. And marking the data of the five task types by adopting a COCO marking mode, and marking the data of other task types by adopting a PASCAL VOC marking mode.
Obtaining the target data set by labeling data in the PASCAL VOC labeling manner includes: labeling at least one target picture to obtain a label file corresponding to each target picture, thereby forming the target data set.
Specifically, a large number of target pictures are acquired first. And then, labeling each target picture respectively. After labeling, a target picture is correspondingly generated into a label file, and finally, a target data set is formed by all the label files. The markup file comprises a target picture and an extensible markup language subfile corresponding to the target picture.
Further, labeling a target picture includes at least one of the following labeling cases: not labeling the behavior type, labeling the picture as bad, and labeling with a labeling box. Not labeling the behavior type means that the behavior type of a target in the target picture is left unlabeled; labeling as bad means that the whole target picture is marked bad; and labeling with a labeling box means that the behavior type of a target in the target picture is marked with a labeling box.
The case where the behavior type is not labeled includes the following cases:
a) for uncertain target behavior. And if the target behavior in the target picture is not determined, the behavior type of the target behavior is not marked for the target behavior.
b) A target behavior with a size smaller than a preset size. The preset size is set by the user, typically bounded by whether it is visible to the naked eye. And when the size of the target behavior in the target picture is smaller than the preset size, the target behavior is very small, and in this case, the behavior category of the target behavior is not labeled.
c) Target behaviors whose occlusion range is larger than the preset occlusion range. The preset occlusion range is set by the user and, in general, may be 80%. If more than 80% of a target behavior in the target picture is occluded, less than 20% of it is visible; in this case, the behavior category of the target behavior is not labeled.
The case labeled bad includes the following cases:
a) Target pictures containing more target behaviors than the preset number. The preset number is set by the user; for example, it may be set to 100. If the number of target behaviors contained in a target picture is greater than 100, the picture contains too many target behaviors, and in this case the target picture is labeled bad.
b) Target pictures whose image quality is too low to identify the target behavior. Some target pictures are of such poor quality that target behaviors cannot be identified; these target pictures are marked as bad.
c) A target picture comprising a plurality of sub-pictures. If multiple images are contained in the target picture, such target picture is labeled bad.
The case of labeling with the labeling box includes the following cases:
a) the visible region of the target behavior is marked with a marker box. Specifically, if the target behavior includes a visible region and an invisible region, the visible region of the target behavior is labeled by using the labeling frame, and the invisible region is not labeled. For example, if a person blocks a leg with an object, the part above the leg is marked with a marking box and the blocked leg is not marked.
b) All visible pixels are labeled with a labeling box. Specifically, the labeling box should contain all visible pixels of the target behavior; only when the box would otherwise have to be made excessively large may other pixels be included, and such extra pixels should amount to less than 5% of the box. For example, when annotating a person whose head carries a Christmas cap (i.e., other pixels that must be included), the labeling box may include the Christmas cap.
c) More than 15% of the target behavior is occluded and outside the labeled box, labeled as Truncated. Specifically, if more than 15% of the target behavior in the target picture is occluded and located outside the labeling box, the target picture is labeled as Truncated. In addition, if the occlusion region of the target behavior is located in the labeling box, the occlusion region will not be labeled as Truncated.
d) And marking the obstruction attached to the target behavior together with the target by using the marking frame. For example, clothing and mud belong to obstructions attached to the target behavior, and these obstructions are labeled in the labeling box together with the target behavior.
e) And marking the target behavior visible through the glass by using the marking box. Specifically, the target behavior visible through the glass will be labeled with a labeling box.
f) The target behavior appearing in the mirror is marked with a marking box. In particular, objects presented in the mirror will be marked with a marking box by the present application.
g) If the target behaviors in pictures such as posters and sign diagrams are realistic, they may be labeled.
h) Cartoon pictures and hand-drawn pictures are not labeled.
Furthermore, the target picture in the markup file is completely the same as the target picture before the markup, and the markup information of the target picture is stored in the extensible markup language subfile, so that the markup condition of the target picture can be known through the markup information in the extensible markup language subfile, and the target picture cannot be damaged.
Further, the extensible markup language subfile includes the path of the target picture, the length and width of the target picture, the behavior category labeled by the labeling box, and the position of the labeling box in the target picture. Note that the position of the labeling box in the target picture is a coordinate position, namely (Xmin, Ymin, Xmax, Ymax). The target picture does not actually carry the labeling box; the box's position is expressed only through the information in the extensible markup language subfile. The behavior category labeled by the labeling box refers to the type of abnormal behavior, such as boxing or kicking.
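For illustration, the XML subfile described above can be sketched with Python's standard library. The element names used here (path, size, object, bndbox) follow the common PASCAL VOC convention and are assumptions, not the patent's exact schema:

```python
# Sketch: write a PASCAL VOC-style XML annotation subfile for one target picture.
# Element names follow the common VOC convention; the patent's exact schema may differ.
import xml.etree.ElementTree as ET

def write_voc_annotation(path, width, height, behavior, box):
    """box is (Xmin, Ymin, Xmax, Ymax) in pixel coordinates."""
    root = ET.Element("annotation")
    ET.SubElement(root, "path").text = path            # path of the target picture
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)     # width of the target picture
    ET.SubElement(size, "height").text = str(height)   # length (height) of the target picture
    obj = ET.SubElement(root, "object")
    ET.SubElement(obj, "name").text = behavior         # behavior category, e.g. "boxing"
    bndbox = ET.SubElement(obj, "bndbox")
    for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), box):
        ET.SubElement(bndbox, tag).text = str(value)   # position of the labeling box
    return ET.tostring(root, encoding="unicode")

xml_text = write_voc_annotation("imgs/0001.jpg", 1280, 720, "boxing", (100, 80, 400, 600))
```

Parsing the subfile back with the same library recovers the labeling situation without ever touching the target picture, as the paragraph above describes.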
The method for obtaining the target data set by labeling data in a COCO labeling mode comprises the following steps:
a) Each target picture in a first data set corresponding to a target detection task or a semantic segmentation task is labeled to obtain a first annotation file corresponding to the target picture, forming a first target data set. The first annotation file comprises the following data structure: a first annotation and a first category, wherein the first annotation comprises a target behavior ID, the picture to which the target behavior belongs, the category of the target behavior, and the position of the annotation box in the target picture, and the first category comprises all behavior categories targeted by the annotation and the IDs of all behavior categories.
Specifically, the labeling processes for the target detection task and the semantic segmentation task are the same. Each task has a corresponding deep learning algorithm that executes it, and the data set this algorithm learns from is the first data set corresponding to the target detection task or the semantic segmentation task in the present application. The first data set comprises at least one target picture, which may initially be unannotated. Each target picture in the first data set is then labeled; every labeled picture yields one first annotation file, and all first annotation files together form the first target data set.
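As an illustrative sketch of the first annotation file, the field names below follow the public COCO convention (annotations, categories, id, image_id, category_id, bbox) and the values are hypothetical:

```python
# Sketch: the "first annotation file" data structure in COCO style.
# Field names follow the public COCO convention; values are hypothetical.
first_annotation_file = {
    "annotations": [{
        "id": 1,                      # target behavior ID
        "image_id": 42,               # picture to which the target behavior belongs
        "category_id": 3,             # category of the target behavior
        "bbox": [100, 80, 300, 520],  # annotation-box position: [x, y, width, height]
    }],
    "categories": [
        {"id": 3, "name": "boxing"},  # all behavior categories targeted by labeling,
        {"id": 4, "name": "kicking"}, # each with its ID
    ],
}

def category_name(ann_file, annotation):
    """Resolve an annotation's behavior category name via the categories table."""
    names = {c["id"]: c["name"] for c in ann_file["categories"]}
    return names[annotation["category_id"]]
```

The second through fourth annotation files described below follow the same pattern, with keypoint, segment, or subtitle fields in place of the bounding box.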
b) Each target picture in a second data set corresponding to the key point detection task is labeled to obtain a second annotation file corresponding to the target picture, forming a second target data set. The second annotation file comprises the following data structure: a second annotation and a second category, wherein the second annotation comprises the key point positions and key point numbers of the target behavior, the category of the target behavior, and the position of the annotation box in the target picture, and the second category comprises all key points targeted by the annotation.
In particular, for the key point detection task, there is a corresponding deep learning algorithm for performing the key point detection task. And the data set used for learning by the algorithm is the second data set corresponding to the key point detection task in this application. The second data set comprises at least one target picture. The target picture may be a non-annotated picture. Then, labeling each target picture in the second data set respectively. After labeling, a target picture correspondingly generates a second label file, and finally, a second target data set is formed by all the second label files.
c) Each target picture in a third data set corresponding to the scene segmentation task is labeled to obtain a third annotation file corresponding to the target picture, forming a third target data set. The third annotation file comprises the following data structure: a third annotation, segment information, and a third category, wherein the third annotation comprises the ID of the target picture and the file name of the target picture, the segment information comprises a pixel segment ID, the ID of the target behavior, and the position of the annotation box in the target picture, and the third category comprises all pixel segment IDs targeted by the annotation, all behavior categories targeted by the annotation, and the pixel segment colors.
In particular, the scene segmentation task has a corresponding deep learning algorithm for performing the scene segmentation task. And the data set used for learning by the algorithm is the third data set corresponding to the scene segmentation task in this application. The third data set comprises at least one target picture. The target picture may be a non-annotated picture. Then, labeling each target picture in the third data set respectively. After labeling, a target picture correspondingly generates a third label file, and finally, a third target data set is formed by all the third label files.
d) Each target picture in a fourth data set corresponding to the picture subtitle task is labeled to obtain a fourth annotation file corresponding to the target picture, forming a fourth target data set. The fourth annotation file comprises the following data structure: a fourth annotation, which comprises a target behavior ID, the ID of the target picture, and a subtitle.
Specifically, for the picture subtitle task, there is a corresponding deep learning algorithm for performing the picture subtitle task. And the data set used for learning by the algorithm is the fourth data set corresponding to the picture subtitle task in the present application. The fourth data set comprises at least one target picture. The target picture may be a non-annotated picture. Then, labeling each target picture in the fourth data set respectively. After labeling, a target picture correspondingly generates a fourth label file, and finally, a fourth target data set is formed by all the fourth label files.
In the present application, "annotation" corresponds to the annotations field, "category" to the categories field, and "segment information" to the segment_info field.
Note that, in the present application, the markup file is saved in JSON format.
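A minimal sketch of this JSON storage using Python's standard json module; the field contents are hypothetical:

```python
# Sketch: serialize a COCO-style annotation file to JSON text and read it back,
# as the application stores its markup files in JSON format.
import json

annotation_file = {
    "annotations": [{"id": 1, "image_id": 42, "category_id": 3}],
    "categories": [{"id": 3, "name": "boxing"}],
}
text = json.dumps(annotation_file, indent=2)  # what would be written to disk
restored = json.loads(text)                   # what a training pipeline reads back
```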
Further, in step 104, after the target data set is obtained, the second neural network model with the added output layer is trained on the target data set until it converges, yielding the abnormal behavior labeling model. Specifically, since the output layer of the second neural network model is untrained, it must be trained from scratch, while the remaining parts of the model are pre-trained and only need to be updated by fine-tuning.
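A framework-free sketch of this transfer-learning setup: every pretrained layer except the output layer is copied, and a fresh output layer sized to the number of abnormal behavior categories is attached. The layer names and parameter shapes are hypothetical; a real implementation would use a deep learning framework:

```python
# Framework-free sketch of steps 101-104: copy every pretrained layer except
# the output layer, then attach a new, untrained output layer whose size
# matches the number of abnormal-behavior detection categories.
import copy

def build_second_model(first_model, num_abnormal_categories):
    """first_model: dict layer_name -> {"params": ..., "trainable": bool}."""
    second = {name: copy.deepcopy(layer)
              for name, layer in first_model.items() if name != "output"}
    for layer in second.values():
        # The patent fine-tunes the pretrained parts; they are frozen here
        # only to keep the sketch simple.
        layer["trainable"] = False
    second["output"] = {"params": [0.0] * num_abnormal_categories,
                        "trainable": True}   # trained from scratch
    return second

pretrained = {"conv1": {"params": [0.3, 0.7], "trainable": False},
              "output": {"params": [0.1] * 1000, "trainable": True}}
model2 = build_second_model(pretrained, num_abnormal_categories=13)
```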
In the present application, a process of transfer learning is realized through steps 101 to 104, as shown in fig. 2.
Further, after the abnormal behavior labeling model is obtained, step 105 is executed: the data set to be labeled is input into the abnormal behavior labeling model, which labels each piece of data in the data set to be labeled.
The method for acquiring the data to be annotated comprises the following steps:
The data set to be labeled is collected under different acquisition time periods, acquisition angles, acquisition distances, acquisition weather conditions, and acquisition place types.
Wherein the different acquisition time periods include:
T = {t1, t2, t3}
where T is the set of acquisition time periods: t1 is morning, t2 is afternoon, and t3 is night. Of course, other time periods may also be included.
Wherein the different acquisition angles include:
A = {a1, a2, a3, a4, a5}
where A is the set of acquisition angles: a1 is a top view, a2 is a level view, a3 is the front, a4 is the side, and a5 is the back.
Of course, other angles may also be included.
Wherein the different acquisition distances include:
F = {f1 = x, f2 = x + e, f3 = x + 2e, f4 = x + 3e}
where F is the set of acquisition distances, x is the initial distance from the target, and e is the distance increment. Of course, other distances may also be included.
Wherein the different acquisition weather conditions include:
W = {w1, w2, w3, w4}
where W is the set of acquisition weather conditions: w1 is sunny, w2 is rainy, w3 is snowy, and w4 is cloudy. Of course, other weather conditions may also be included.
Wherein the different acquisition place types include:
D = {d1, d2, d3, d4, d5}
where D is the set of acquisition place types: d1 is a park, d2 a school, d3 a residential area, d4 a traffic station, and d5 a shopping mall.
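The five acquisition condition sets above can be combined exhaustively to plan a collection campaign. A small sketch, in which the concrete values for x and e and the English set members are assumptions:

```python
# Sketch: enumerate every combination of the acquisition conditions
# T (time), A (angle), F (distance), W (weather), D (place type).
import itertools

x, e = 5.0, 2.0   # hypothetical initial distance and distance increment
T = ["morning", "afternoon", "night"]
A = ["top", "level", "front", "side", "back"]
F = [x, x + e, x + 2 * e, x + 3 * e]
W = ["sunny", "rainy", "snowy", "cloudy"]
D = ["park", "school", "residential", "traffic_station", "mall"]

conditions = list(itertools.product(T, A, F, W, D))  # 3*5*4*4*5 combinations
```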
The data source of the data set to be labeled comprises at least one of the following data sources:
taking a video stream in the monitoring video as a data source;
taking a video stream pre-recorded by a user as a data source;
taking pictures or videos in a real scene in a network as a data source;
taking pictures or videos in online videos, movies, television shows and news as data sources;
and integrating the current abnormal behavior data set to obtain a picture or a video as a data source.
Wherein, prior to step 103, the method further comprises:
classifying the abnormal behaviors;
wherein classifying the abnormal behavior comprises:
S = {s1, s2, s3};
where S is the set of abnormal behaviors: s1 is violent abnormal behavior, s2 is destructive behavior in public places, and s3 is crowd abnormal behavior;
wherein, as shown in FIG. 3, s1 = {Snormal, Sdead};
Snormal is general violent behavior, Snormal = {sn1, sn2, sn3, sn4, sn5, sn6}, where sn1 is boxing, sn2 is kicking, sn3 is neck grabbing, sn4 is hair pulling, sn5 is slapping, and sn6 is throwing objects;
Sdead is fatal violent behavior, Sdead = {sd1, sd2, sd3}, where sd1 is stabbing, sd2 is burning, and sd3 is shooting;
wherein, as shown in FIG. 4, s2 = {sp1, sp2, sp3, sp4};
sp1 is smoking, sp2 is spitting, sp3 is littering, and sp4 is trampling on greenery;
wherein, as shown in FIG. 5, s3 = {Sviolence, Snon-violence};
Sviolence is violent crowd abnormal behavior, Sviolence = {sv1, sv2, sv3};
sv1 is brawling, sv2 is trampling, and sv3 is rioting;
Snon-violence is non-violent crowd abnormal behavior, Snon-violence = {snv1, snv2, snv3};
snv1 is one-directional crowd running, i.e., multiple people gather and run in the same direction without abnormal behavior among them; snv2 is crowd gathering, i.e., multiple people converge on the same center without abnormal behavior among them; snv3 is crowd scattering, i.e., people run from the same central point in multiple directions without abnormal behavior.
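For illustration, the taxonomy above can be encoded as a nested structure; the English leaf names are this sketch's own translations:

```python
# Sketch: the abnormal-behavior taxonomy S = {s1, s2, s3} as a nested dict.
# Leaf names are illustrative English translations of the patent's categories.
taxonomy = {
    "s1_violent": {
        "S_normal": ["boxing", "kicking", "neck_grabbing", "hair_pulling",
                     "slapping", "throwing_objects"],
        "S_dead": ["stabbing", "burning", "shooting"],
    },
    "s2_public_place_destructive": ["smoking", "spitting", "littering",
                                    "trampling_greenery"],
    "s3_crowd": {
        "S_violence": ["brawling", "trampling", "rioting"],
        "S_non_violence": ["one_way_running", "gathering", "scattering"],
    },
}

def all_behaviors(node):
    """Flatten the taxonomy into a single list of leaf behavior names."""
    if isinstance(node, list):
        return node
    return [b for child in node.values() for b in all_behaviors(child)]
```

The flattened leaf count gives the number of abnormal behavior detection categories used to size the new output layer in step 103.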
Further, in step 105 the position of the abnormal behavior and the category of the abnormal behavior can be marked, and together they form the labeling result. After the labeled data set is obtained, it is stored in a memcached cache, with the absolute path of the picture as the key and the labeling result as the value. After storage, the labeling result is read from the memcached cache and drawn onto the corresponding data.
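A minimal sketch of this caching step; a plain dictionary stands in for the memcached client here, so the key/value scheme is the focus rather than the cache backend:

```python
# Sketch: store labeling results keyed by the picture's absolute path, as the
# application does with memcached. A plain dict stands in for the cache;
# a real deployment would use a memcached client instead.
cache = {}

def store_result(abs_path, position, category):
    # key = absolute picture path; value = labeling result
    cache[abs_path] = {"position": position, "category": category}

def read_result(abs_path):
    return cache.get(abs_path)  # None on a cache miss

store_result("/data/imgs/0001.jpg", (100, 80, 400, 600), "boxing")
result = read_result("/data/imgs/0001.jpg")
```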
Further, in step 106, it is determined whether the labeled data set is correctly labeled. If all the labeled data are correct, labeling terminates. If any data in the labeled data set are labeled incorrectly, the erroneous data are input into the abnormal behavior labeling model again for re-labeling.
The semi-automatic iterative labeling is realized through the process.
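The semi-automatic iteration of steps 105 and 106 can be sketched as a loop; the mock model and correctness check below are hypothetical:

```python
# Sketch of semi-automatic iterative labeling: data whose labels fail the
# correctness check are fed back into the model until all labels pass.
def iterative_label(model, dataset, is_correct, max_rounds=10):
    labels = {item: model(item) for item in dataset}       # step 105
    for _ in range(max_rounds):
        wrong = [i for i in dataset if not is_correct(i, labels[i])]  # step 106
        if not wrong:
            break                    # all labels correct: labeling terminates
        for item in wrong:           # re-label only the erroneous data
            labels[item] = model(item)
    return labels

# Hypothetical model that mislabels "img2" on its first attempt only.
attempts = {}
def mock_model(item):
    attempts[item] = attempts.get(item, 0) + 1
    return "wrong" if (item == "img2" and attempts[item] == 1) else "ok"

labels = iterative_label(mock_model, ["img1", "img2"], lambda i, l: l == "ok")
```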
After step 105, the application also evaluates the annotated data set. Specifically, the evaluation process includes performing benchmark testing on the labeled data set, and performing data set cross validation on the labeled data set based on the current abnormal behavior data set.
For the benchmark test, the method comprises the following steps:
The labeled data set is applied to a target deep learning algorithm to obtain the average recall ratio, the average precision ratio, and the intersection-over-union, and the labeled data set is evaluated based on these three metrics. It should be noted that evaluating the labeled data set here means evaluating the accuracy of the target deep learning algorithm on the labeled data set: if that accuracy is poor, the target deep learning algorithm needs to be adjusted.
Specifically, Average Recall (AR) is the average recall ratio, calculated as follows:
AR = (1/n) × Σ(i=1..n) Recall_i
where n is the number of abnormal behavior categories to be detected and Recall is the recall ratio, calculated as follows:
Recall = TP / (TP + FN)
TP (true positives) refers to positive samples predicted as positive by the model, and FN (false negatives) refers to positive samples predicted as negative by the model.
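A short sketch computing Recall per category and AR over all categories from TP/FN counts, following the two formulas above; the counts are hypothetical:

```python
# Sketch: recall per category from TP/FN counts, and the average recall AR
# over the n abnormal-behavior categories.
def recall(tp, fn):
    return tp / (tp + fn)

def average_recall(per_category_counts):
    """per_category_counts: list of (TP, FN) pairs, one per category."""
    n = len(per_category_counts)
    return sum(recall(tp, fn) for tp, fn in per_category_counts) / n

ar = average_recall([(8, 2), (6, 4)])   # per-category recalls 0.8 and 0.6
```

Precision and AP follow the same pattern with (TP, FP) pairs in place of (TP, FN).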
Further, Average Precision (AP) refers to the average precision ratio, calculated as follows:
AP = (1/n) × Σ(i=1..n) Precision_i
Precision refers to the precision ratio, calculated as follows:
Precision = TP / (TP + FP)
FP (false positives) refers to negative samples predicted as positive by the model.
Further, IoU is the intersection-over-union, calculated as follows:
IoU = Overlap / Union
where Overlap is the intersection of the prediction box and the labeling box, and Union is their union. Through this index, the average precision values at IoU = 0.5 and IoU = 0.7, namely AP0.5 and AP0.7, can be calculated.
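A sketch of the IoU computation for axis-aligned boxes given as (Xmin, Ymin, Xmax, Ymax), matching the coordinate convention used for the labeling box:

```python
# Sketch: intersection-over-union of a prediction box and a labeling box,
# both given as (xmin, ymin, xmax, ymax).
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    ih = max(0, min(ay2, by2) - max(ay1, by1))   # overlap height
    overlap = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - overlap
    return overlap / union if union else 0.0

score = iou((0, 0, 10, 10), (5, 0, 15, 10))  # two half-overlapping boxes
```

A prediction counts as TP at threshold 0.5 (for AP0.5) when its IoU with a labeling box of the same category reaches 0.5, and likewise at 0.7 for AP0.7.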
For data set cross validation, comprising:
s11: and randomly dividing the labeled data set into a first training set and a first testing set, and randomly dividing the current abnormal behavior data set into a second training set and a second testing set.
S12: and training on the first training set by using a target deep learning algorithm to obtain a first model, and training on the second training set by using the target deep learning algorithm to obtain a second model.
S13: and respectively testing the first model and the second model on the first test set, and calculating the average precision ratio of each abnormal behavior category to obtain a first calculation result.
S14: and respectively testing the first model and the second model on a second test set, and calculating the average precision ratio of each abnormal behavior category to obtain a second calculation result.
S15: and evaluating the labeled data set and the current abnormal behavior data set based on the first calculation result and the second calculation result.
Specifically, in the cross-validation process, the labeled data set is Dataset A and the current abnormal behavior data set is Dataset B. A target deep learning algorithm F(x) is selected, such as the YOLOv2 algorithm. F(x) is trained on Dataset A to obtain model MA, and on Dataset B to obtain model MB. MA and MB are each tested on Dataset A to obtain the first calculation result, and each tested on Dataset B to obtain the second calculation result, as shown in fig. 6. Finally, the labeled data set and the current abnormal behavior data set are evaluated according to the first and second calculation results. If the AP of YOLOv2-A tested on Dataset A is higher than on Dataset B, and the AP of YOLOv2-B tested on Dataset B is lower than on Dataset A, the labeled data set is more challenging than the current abnormal behavior data set; that is, the target deep learning algorithm is less accurate on the labeled data set and needs to be adjusted.
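Step S11's random split can be sketched as follows; the 80/20 ratio and the seed are assumptions, since the patent does not specify a split ratio:

```python
# Sketch of step S11: randomly divide a data set into a training portion and
# a test portion. The 80/20 split and fixed seed are assumptions.
import random

def random_split(items, train_fraction=0.8, seed=0):
    items = list(items)
    random.Random(seed).shuffle(items)   # random but reproducible division
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]

dataset_a = [f"a_{i}" for i in range(100)]   # stand-in for the labeled data set
train_a, test_a = random_split(dataset_a)    # first training set, first test set
```

Dataset B is divided the same way; steps S12 through S15 then train one model per training set and compare per-category AP on both test sets.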
In the present application, the data set production procedure is clear, concise, and complete, and can generate a large number of abnormal behavior detection data sets. The rich collection scheme addresses the problems that existing abnormal behavior data sets cover too few research scenes, have insufficiently varied backgrounds, and contain too little data. Summarizing, dividing, and defining the abnormal behaviors addresses the lack of behavior categories in existing data sets, and the standard labeling criteria make the generated abnormal behavior data sets for different purposes mutually compatible.
Based on the same inventive concept, the second embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method steps described in the first embodiment.
Based on the same inventive concept, a computer apparatus is further provided in the third embodiment of the present invention, as shown in fig. 7, for convenience of description, only the parts related to the embodiment of the present invention are shown, and details of the specific technology are not disclosed, please refer to the method part of the embodiment of the present invention. The computer device may be any terminal device including a mobile phone, a tablet computer, a PDA (Personal digital assistant), a POS (Point of Sales), a vehicle-mounted computer, and the like, taking the computer device as the mobile phone as an example:
fig. 7 is a block diagram showing a partial structure related to a computer device provided by an embodiment of the present invention. Referring to fig. 7, the computer apparatus includes: a memory 701 and a processor 702. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 7 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components.
The following describes the components of the computer device in detail with reference to fig. 7:
the memory 701 may be used to store software programs and modules, and the processor 702 executes various functional applications and data processing by operating the software programs and modules stored in the memory 701. The memory 701 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.), and the like. Further, the memory 701 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The processor 702 is a control center of the computer apparatus, and performs various functions and processes data by operating or executing software programs and/or modules stored in the memory 701 and calling data stored in the memory 701. Alternatively, processor 702 may include one or more processing units; preferably, the processor 702 may integrate an application processor, which primarily handles operating systems, user interfaces, application programs, etc., and a modem processor, which primarily handles wireless communications.
In this embodiment of the present invention, the processor 702 included in the computer device may have the functions corresponding to any of the method steps in the foregoing first embodiment.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (10)

1. A method of labeling anomalous behavior, the method comprising:
pre-training a neural network based on a current abnormal behavior data set to obtain a first neural network model;
copying all network architectures and model parameters except an output layer in the first neural network model to create a second neural network model;
adding an output layer of tensor size corresponding to the number of abnormal behavior detection categories in the second neural network model based on the number of abnormal behavior detection categories;
training the second neural network model added with the output layer according to a target data set obtained by labeling in a PASCAL VOC labeling mode or a COCO labeling mode until convergence, and obtaining an abnormal behavior labeling model;
inputting a data set to be labeled into the abnormal behavior labeling model, so that the abnormal behavior labeling model labels each data to be labeled in the data set to be labeled, and a labeled data set is obtained;
judging whether the labeled data set is correctly labeled;
and if the labeling is wrong, inputting the data with the wrong labeling into the abnormal behavior labeling model for re-labeling.
2. The method of claim 1, wherein deriving the target data set by PASCAL VOC tagging comprises:
labeling at least one target picture to obtain a labeled file corresponding to the target picture to form the target data set;
wherein, the labeling of the at least one target picture includes at least one of the following labeling conditions: the behavior type is not marked, the case is marked as bad, and the case is marked by using a marking frame;
wherein the condition of not marking the behavior type comprises the following steps: uncertain target behaviors, target behaviors with the size smaller than a preset size and target behaviors with the shielding range larger than a preset shielding range;
wherein the cases labeled as bad include: target pictures in which the number of target behaviors is larger than a preset number, target pictures whose image quality is too low to identify the target behaviors, and target pictures containing a plurality of sub-images;
wherein, the condition of marking by using the marking box comprises the following steps: marking a visible area of the target behavior with a marking frame; marking all visible pixels by using a marking frame; more than 15% of the target behavior is occluded and outside the label box, labeled "Truncated"; marking the shelters attached to the target behaviors and the target behaviors together by using a marking frame; marking the target behavior visible through the glass by using a marking frame; marking the target behavior appearing in the mirror with a marking frame;
wherein the markup file comprises the target picture and an extensible markup language subfile;
the extensible markup language subfile comprises a path of the target picture, the length and the width of the target picture, a behavior category marked by the marking frame and the position of the marking frame in the target picture.
3. The method of claim 1, wherein the obtaining the target dataset by COCO tagging comprises:
labeling each target picture in a first data set corresponding to a target detection task or a semantic segmentation task to obtain a first labeling file corresponding to the target picture to form the first target data set; wherein the first annotation file comprises the following data structure: the method comprises the steps of obtaining a first annotation and a first category, wherein the first annotation comprises a target behavior ID, a picture to which the target behavior belongs, a category of the target behavior and a position of an annotation box in the target picture, and the first category comprises all behavior categories for which annotation is performed and IDs of all behavior categories;
labeling each target picture in a second data set corresponding to the key point detection task to obtain a second labeling file corresponding to the target picture to form a second target data set; wherein the second annotation file comprises the following data structure: the second annotation comprises key point positions and key point numbers of target behaviors, categories of the target behaviors and positions of the annotation frames in the target pictures, and the second category comprises all key points for which annotations are aimed;
labeling each target picture in a third data set corresponding to the scene segmentation task to obtain a third labeling file corresponding to the target picture to form a third target data set; wherein the third markup file comprises the following data structure: the third annotation comprises an ID of the target picture and a file name of the target picture, the segment information comprises pixel segment IDs, IDs of target behaviors and positions of the labeling boxes in the target picture, and the third category comprises all pixel segment IDs of labeling targets, all behavior categories of labeling targets and pixel segment colors;
labeling each target picture in a fourth data set corresponding to the picture subtitle task to obtain a fourth labeling file corresponding to the target picture to form a fourth target data set; wherein the fourth markup file comprises the following data structure: a fourth annotation, the fourth annotation comprising a target behavior ID, an ID of the target picture, and a subtitle.
4. The method of claim 1, wherein prior to said inputting the data set to be annotated into the abnormal behavior annotation model, the method further comprises:
acquiring and obtaining the data set to be labeled based on different acquisition time periods, different acquisition angles, different acquisition distances, different acquisition weathers and different acquisition place types;
wherein the data source of the data set to be labeled comprises at least one of the following data sources:
taking a video stream in the monitoring video as the data source;
taking a video stream pre-recorded by a user as the data source;
taking pictures or videos in a real scene in a network as the data source;
using pictures or videos in online videos, movies, television shows and news as the data source;
and integrating the current abnormal behavior data set to obtain a picture or a video as the data source.
5. The method of claim 1, wherein prior to adding an output layer in the second neural network model having a tensor size corresponding to the number of abnormal-behavior detection categories based on the number of abnormal-behavior detection categories, the method further comprises:
classifying the abnormal behaviors;
wherein the classifying the abnormal behavior comprises:
S = {s1, s2, s3};
wherein S is the abnormal behavior: s1 is violent abnormal behavior, s2 is destructive behavior in public places, and s3 is crowd abnormal behavior;
wherein s1 = {Snormal, Sdead};
Snormal is general violent behavior, Snormal = {sn1, sn2, sn3, sn4, sn5, sn6}, where sn1 is boxing, sn2 is kicking, sn3 is neck grabbing, sn4 is hair pulling, sn5 is slapping, and sn6 is throwing objects;
Sdead is fatal violent behavior, Sdead = {sd1, sd2, sd3}, where sd1 is stabbing, sd2 is burning, and sd3 is shooting;
wherein s2 = {sp1, sp2, sp3, sp4};
sp1 is smoking, sp2 is spitting, sp3 is littering, and sp4 is trampling on greenery;
wherein s3 = {Sviolence, Snon-violence};
Sviolence is violent crowd abnormal behavior, Sviolence = {sv1, sv2, sv3}, where sv1 is brawling, sv2 is trampling, and sv3 is rioting;
Snon-violence is non-violent crowd abnormal behavior, Snon-violence = {snv1, snv2, snv3}, where snv1 is multiple people gathering and running in the same direction without abnormal behavior, snv2 is multiple people converging on the same center point without abnormal behavior, and snv3 is multiple people running from the same central point in multiple directions without abnormal behavior.
6. The method of claim 1, wherein after said obtaining the annotated data set, the method further comprises:
and performing benchmark test and data set cross validation on the labeled data set.
7. The method of claim 6, wherein performing the benchmark test on the annotated data set comprises:
applying the labeled data set to a target deep learning algorithm to obtain an average recall ratio, an average precision ratio and an intersection ratio;
and evaluating the labeled data set based on the average recall ratio, the average precision ratio and the intersection ratio.
8. The method of claim 6, wherein performing the dataset cross-validation on the annotated dataset comprises:
randomly dividing the labeled data set into a first training set and a first testing set, and randomly dividing the current abnormal behavior data set into a second training set and a second testing set;
training on a first training set by using a target deep learning algorithm to obtain a first model, and training on a second training set by using the target deep learning algorithm to obtain a second model;
respectively testing the first model and the second model on the first test set, and calculating the average precision ratio of each abnormal behavior category to obtain a first calculation result;
respectively testing the first model and the second model on the second test set, and calculating the average precision ratio of each abnormal behavior category to obtain a second calculation result;
and evaluating the labeled data set and the current abnormal behavior data set based on the first calculation result and the second calculation result.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method steps of any one of claims 1 to 8.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method steps of any of claims 1-8 when executing the program.
CN202010427324.9A 2020-05-19 2020-05-19 Method and device for marking abnormal behaviors Pending CN111666983A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010427324.9A CN111666983A (en) 2020-05-19 2020-05-19 Method and device for marking abnormal behaviors


Publications (1)

Publication Number Publication Date
CN111666983A true CN111666983A (en) 2020-09-15

Family

ID=72383986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010427324.9A Pending CN111666983A (en) 2020-05-19 2020-05-19 Method and device for marking abnormal behaviors

Country Status (1)

Country Link
CN (1) CN111666983A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408633A (en) * 2021-06-29 2021-09-17 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for outputting information

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919135A (en) * 2019-03-27 2019-06-21 华瑞新智科技(北京)有限公司 Behavioral value method, apparatus based on deep learning
CN110135522A (en) * 2019-05-28 2019-08-16 金陵科技学院 It is a kind of to detect and the mark integrated intelligent method of remote sensing images Small object
CN110693486A (en) * 2019-09-27 2020-01-17 武汉中旗生物医疗电子有限公司 Electrocardiogram abnormity labeling method and device


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GEMFIELD: "Annotation format of the COCO dataset", Zhihu *
GEMFIELD: "Annotation format of the Pascal VOC dataset", Zhihu *
WU, XINXIAO et al.: "Analysis and Recognition of Human Actions in Video", 30 September 2019 *

Similar Documents

Publication Publication Date Title
CN110119711B (en) Method and device for acquiring character segments of video data and electronic equipment
CN107943837B Keyframe-based video summary generation method for foreground targets
Boom et al. A research tool for long-term and continuous analysis of fish assemblage in coral-reefs using underwater camera footage
CN105336077B (en) Data processing equipment and its method of operation
CN110210276A Motion track acquisition method and device, storage medium, and terminal
KR102035592B1 Support system and method for assisting partial inspection of suspicious objects in CCTV video streams using multi-level object recognition technology to reduce the workload of human inspectors
JP2019512827A (en) System and method for training an object classifier by machine learning
CN106791710A (en) Object detection method, device and electronic equipment
JP2018147478A (en) Decomposition of video stream into salient fragments
CN109063667B (en) Scene-based video identification mode optimization and pushing method
CN112100438A (en) Label extraction method and device and computer readable storage medium
Höferlin et al. Uncertainty-aware video visual analytics of tracked moving objects
Dubuisson et al. A survey of datasets for visual tracking
EP3623998A1 (en) Character recognition
WO2020263712A1 (en) Accelerated training of an image classifier
CN113792606B Low-cost self-supervised pedestrian re-identification model construction method based on multi-target tracking
CN113160283B (en) Target tracking method under multi-camera scene based on SIFT
CN111652035B (en) Pedestrian re-identification method and system based on ST-SSCA-Net
WO2021007846A1 (en) Method, apparatus and device for video similarity detection
CN110856039A (en) Video processing method and device and storage medium
CN113111838A (en) Behavior recognition method and device, equipment and storage medium
Sonnleitner et al. Traffic measurement and congestion detection based on real-time highway video data
CN109002808B (en) Human behavior recognition method and system
CN111666983A (en) Method and device for marking abnormal behaviors
CN114677627A (en) Target clue finding method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200915