WO2023029397A1

WO2023029397A1 - Training data acquisition method, abnormal behavior recognition network training method and apparatus, computer device, storage medium, computer program and computer program product

Info

Publication number: WO2023029397A1
Application number: PCT/CN2022/077716
Authority: WO
Inventors: 苏婧; 苏海昇; 王栋梁; 甘伟豪
Original assignee: 上海商汤智能科技有限公司
Priority date: 2021-08-30
Filing date: 2022-02-24
Publication date: 2023-03-09
Also published as: CN113705689A

Abstract

The present disclosure provides a training data acquisition method, an abnormal behavior recognition network training method and apparatus, a computer device, a storage medium, a computer program and a computer program product, the method comprising: acquiring network data, and collected data containing a specific abnormal behavior; acquiring action features of each piece of network data and collected data; and according to the similarity between the action features of each piece of network data and the action features of each piece of collected data, selecting, from the network data, similar network data that matches with the collected data, and using the collected data and the similar network data as positive sample training data for the specific abnormal behavior. According to the similarity between action features, training data that may be used as positive samples are determined from multiple pieces of network data, which improves the acquisition efficiency of training data and thereby accelerates the acquisition of abnormal behavior recognition networks.

Description

Training data acquisition method, abnormal behavior recognition network training method and device, computer equipment, storage medium, computer program, computer program product

Cross References to Related Applications

This disclosure is based on, and claims priority to, Chinese patent application No. 202111006832.0, filed on August 30, 2021 and entitled "Training Data Acquisition Method and Abnormal Behavior Identification Network Training Method". The entire content of that Chinese patent application is hereby incorporated by reference into this disclosure.

Technical Field

The present disclosure relates to the field of computer technology, and in particular to a training data acquisition method, an abnormal behavior recognition network training method and device, computer equipment, storage media, computer programs, and computer program products.

Background

In smart city management, it is often necessary to automatically identify abnormal behaviors (such as fighting, climbing, etc.) from video data, so as to prevent them from endangering urban safety and harmony. Automatic recognition of abnormal behavior usually requires training an abnormal behavior recognition network on labeled abnormal behavior data. However, collecting and labeling data often takes a long time, so an abnormal behavior recognition network cannot be trained and obtained within a short period.

Summary

The disclosure provides a training data acquisition method, an abnormal behavior recognition network training method and device, computer equipment, a storage medium, a computer program, and a computer program product.

An embodiment of the present disclosure provides a training data acquisition method, including:

Acquiring network data and collected data containing a specific abnormal behavior;

Acquiring an action feature of each of the network data and an action feature of each of the collected data;

According to the similarity between the action features of each of the network data and the action features of each of the collected data, selecting, from the network data, similar network data that matches the collected data, and using the collected data and the similar network data as positive sample training data for the specific abnormal behavior.

An embodiment of the present disclosure provides a network training method for abnormal behavior recognition, the method comprising:

Acquiring training data, the training data includes positive sample training data and negative sample training data, and the positive sample training data is obtained based on the above-mentioned training data acquisition method;

The abnormal behavior recognition network is iteratively trained through the positive sample training data and the negative sample training data until the loss output by the abnormal behavior recognition network is less than a preset first loss threshold, or the number of iterations is greater than a preset threshold.

An embodiment of the present disclosure provides a training data acquisition device, including:

A data acquisition module configured to acquire network data and collected data containing specific abnormal behaviors;

An action feature acquisition module configured to acquire an action feature of each of the network data and an action feature of each of the collected data;

A training data selection module configured to select, from the network data, similar network data matching the collected data according to the similarity between the action features of each of the network data and the action features of each of the collected data, and to use the collected data and the similar network data as positive sample training data for the specific abnormal behavior.

An embodiment of the present disclosure provides an abnormal behavior recognition network training device, the device includes:

A training data acquisition module configured to acquire training data, where the training data includes positive sample training data and negative sample training data, and the positive sample training data is obtained based on the above-mentioned training data acquisition method;

A network training module configured to iteratively train the abnormal behavior recognition network through the positive sample training data and the negative sample training data until the loss output by the abnormal behavior recognition network is less than a preset first loss threshold, or the number of iterations is greater than a preset count threshold.

An embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored. When the program is executed by a processor, the above training data acquisition method or abnormal behavior identification network training method is implemented.

An embodiment of the present disclosure provides a computer device, and the computer device includes:

one or more processors;

memory for storing one or more programs;

When the one or more programs are executed by the one or more processors, the one or more processors are made to implement the above training data acquisition method or abnormal behavior identification network training method.

An embodiment of the present disclosure provides a computer program, the computer program includes computer-readable code, and when the computer-readable code is read and executed by a computer, some or all of the steps of the method in any embodiment of the present disclosure are implemented.

An embodiment of the present disclosure provides a computer program product, the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and when the computer program is read and executed by a computer, some or all of the steps of the method in any embodiment of the present disclosure are implemented.

The present disclosure provides a training data acquisition method, an abnormal behavior identification network training method and device, computer equipment, a storage medium, a computer program, and a computer program product. For a specific abnormal behavior, collected data containing the specific abnormal behavior and network data are acquired, and the action features of the collected data and of the network data are obtained. According to the similarity between the action features of each network data and the action features of the collected data, several similar network data of the collected data are determined, and the similar network data and the collected data are used as positive sample training data for the specific abnormal behavior. By using the similarity of action features, training data that can serve as positive samples is identified from a pool of cheap network data, which improves the efficiency of training data acquisition and thus speeds up the acquisition of abnormal behavior recognition networks.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments consistent with the disclosure and together with the disclosure serve to explain the principles of the disclosure.

FIG. 1 is a flow chart of a training data acquisition method provided by an embodiment of the present disclosure.

FIG. 2 is a flowchart of a method for obtaining similar network data provided by an embodiment of the present disclosure.

Fig. 3 is a schematic diagram of similar network data provided by an embodiment of the present disclosure.

FIG. 4 is a flowchart of a method for training an abnormal behavior recognition network provided by an embodiment of the present disclosure.

Fig. 5a is a flowchart of an iterative method provided by an embodiment of the present disclosure.

Fig. 5b is a schematic diagram of an iterative method provided by an embodiment of the present disclosure.

Fig. 6 is a block diagram of an apparatus for obtaining training data provided by an embodiment of the present disclosure.

Fig. 7 is a block diagram of an abnormal behavior recognition network training device provided by an embodiment of the present disclosure.

FIG. 8 is a hardware structural diagram of a computer device where a training data acquisition device or an abnormal behavior recognition network training device provided by an embodiment of the present disclosure is located.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with aspects of the present disclosure as recited in the appended claims.

The terminology used in the present disclosure is for the purpose of describing particular embodiments only, and is not intended to limit the present disclosure. As used in this disclosure and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".

In the management of smart cities, in order to maintain the safety and harmony of the city, it is necessary to identify abnormal behaviors in the city (such as fighting, climbing, etc.) through video data. In order to identify abnormal behaviors, usually for each abnormal behavior, a large amount of labeled data is used to train the abnormal behavior recognition network, so as to identify the abnormal behavior through the trained abnormal behavior recognition network.

However, this process has some problems. Training the abnormal behavior recognition network requires a large amount of labeled data, yet acquiring data usually takes a long time and labeling it requires a lot of manpower. As a result, when a training task for an abnormal behavior recognition network is received, only a small amount of labeled data is available, so the network cannot be obtained quickly; and if only that small amount of data is used for training, the accuracy of the abnormal behavior recognition network will hardly meet the requirements for practical use.

To solve the above problems, the present disclosure provides a training data acquisition method. For a specific abnormal behavior, collected data containing the specific abnormal behavior and network data are acquired, and the action features of the collected data and of the network data are obtained. According to the similarity between the action features of each network data and the action features of the collected data, several similar network data of the collected data are determined, and the similar network data and the collected data are used as positive sample training data for the specific abnormal behavior. By using the similarity of action features, training data that can serve as positive samples is identified from a pool of cheap network data, which improves the efficiency of training data acquisition and thus speeds up the acquisition of abnormal behavior recognition networks.

Next, the embodiments of the present disclosure will be described in detail.

In some embodiments, the present disclosure provides a training data acquisition method for acquiring positive sample training data of a specific abnormal behavior. The method may be performed by a terminal device or other processing device, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc. In some possible implementations, the training data acquisition method may be implemented by a processor invoking computer-readable instructions stored in a memory.

The method for obtaining training data will be described in detail below with reference to FIG. 1 .

As shown in Figure 1, the training data acquisition method provided by the present disclosure may include:

Step 101, acquiring network data and collected data including specific abnormal behaviors.

First of all, it should be noted that the training data acquisition method in the present disclosure targets one specific abnormal behavior. To obtain training data for multiple specific abnormal behaviors, the method can be executed multiple times. Unless otherwise stated, collected data below refers to data collected for one specific abnormal behavior.

The collected data can be video or dynamic image data captured by the smart city management system, and every piece of collected data contains the specific abnormal behavior. In addition, the collected data is labeled data and carries the first label for the specific abnormal behavior.

The network data may include at least one of the following: data sets published on the Internet; web crawler data; data generated based on a virtual game engine. Data sets published on the Internet are training data sets that other Internet users have already compiled for certain tasks, such as the K400 data set. Web crawler data refers to data sets searched and sorted from the Internet: unlike published data sets, which have already been organized by others for training, web crawler data is obtained by sorting video data retrieved by multiple crawlers. Drawing on these sources makes the network data more comprehensive and the training data more diverse. Data generated based on a virtual game engine is produced by the engine itself; the engine can generate data that does not include scenes, the generation is flexible, and large amounts of cheap data can be produced in this way. In addition, the network data acquired in step 101 is all unlabeled and cannot be used directly for training.

Since a specific abnormal behavior is generally performed by people and involves at least one person, each piece of collected data or network data contains at least one person. Content in the collected data and network data that is irrelevant to the specific abnormal behavior, such as large background areas, can be cropped out through portrait recognition to reduce the noise input to the abnormal behavior recognition network.

In addition, training the abnormal behavior recognition network requires not only positive samples (data containing the specific abnormal behavior) but also negative samples (data not containing the specific abnormal behavior), whereas the method provided in this disclosure only acquires positive sample training data. The negative sample training data also consists of two parts: data collected in the city that contains at least one person but does not contain the specific abnormal behavior, and data obtained from the Internet that does not contain the specific abnormal behavior. The method for obtaining the negative sample training data in the embodiments of the present disclosure is described below and is not detailed here.

Since the method provided in the present disclosure is used to obtain positive sample data, and the collected data is positive sample data containing labels, then the first label indicates that the data carrying the label is data containing specific abnormal behaviors. Correspondingly, negative samples have a second label.

Step 103, acquiring the action features of each of the network data and the action features of each of the collected data.

Step 105, according to the similarity between the action features of each of the network data and the action features of each of the collected data, select similar network data that matches the collected data from the network data.

Next, step 103 and step 105 are described together. In the embodiments of the present disclosure, different abnormal behaviors involve different actions, while different data showing the same abnormal behavior involve similar actions. Therefore, to find network data similar to the collected data, the action features of the collected data and of the network data must first be obtained.

The network data can be collected in advance and its action features extracted after collection, so that the action features of the network data are obtained and stored before a training task for the abnormal behavior recognition network arrives. In that case, acquiring the action features of the network data means reading the pre-stored action features.

For the collected data, action features need to be extracted through a backbone network (backbone). The backbone network is a network for extracting action features that is trained on a specific data set. Such a data set is generally one published on the Internet; for example, the K400 data set can be used.

A piece of collected data or network data generally corresponds to one action feature, which is generally a vector. In other words, the input of the backbone network is video data or dynamic image data, and the output is a feature vector.

When the action features of the collected data are extracted through the backbone network, obtaining the action features of each network data and of each collected data includes: obtaining the backbone network; extracting the action features of each collected data through the backbone network; and obtaining the pre-stored action features of each network data, which were extracted through the backbone network.

The action feature of the collected data used for similarity calculation can be the action feature of any single collected data; alternatively, the action features of several collected data can be synthesized into one action feature and the synthesized feature used for similarity calculation. The user may also designate the most representative action feature of the collected data as the action feature for similarity calculation, which is not limited in this disclosure. The method for obtaining similar network data in the embodiments of the present disclosure is described in detail below and is not repeated here.

Step 107, adding a first label to the similar network data, and using the collected data and the similar network data as positive sample training data for specific abnormal behaviors.

In some embodiments, in order to make the similar network data available for training the abnormal behavior recognition network, it is necessary to add the first label to the similar network data, so that the similar network data can be used for supervised learning. Through the above method, the positive sample training data used to train the abnormal behavior recognition network for specific abnormal behaviors is obtained.

Next, the method for obtaining similar network data will be described in detail in conjunction with FIG. 2 . The method for obtaining similar network data in an embodiment of the present disclosure may include the following steps:

Step 201, obtain a backbone network.

As mentioned above, the backbone network is trained on a data set published on the Internet. The backbone network may be pre-trained, or may be trained during the execution of step 201. The backbone network is used to extract action features from video or dynamic image data.

Step 203, input each collected data into the backbone network, and obtain the action feature of each collected data.

Among them, the input of the backbone network is video or dynamic image data, and the output is a feature vector.

Step 205, acquiring the characteristics of the pre-stored network data.

The features of the network data are extracted through the pre-trained backbone network. The same network data can be used to train abnormal behavior recognition networks for various abnormal behaviors. Therefore, when abnormal behavior recognition networks need to be trained for various abnormal behaviors, pre-extracting the action features of the network data avoids repeatedly extracting the action features of the same network data through the backbone network every time training samples for an abnormal behavior are obtained, which reduces the amount of calculation.

Step 207, according to the action features of all the collected data, synthesize a center feature of the collected data.

In some embodiments, the action features of all the collected data are normalized and then averaged to synthesize the center feature of the collected data. Because the center feature combines the action features of every piece of collected data, it reflects the characteristics of the specific abnormal behavior more accurately than any single action feature, and the similar network data selected with it is more accurate.
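As a non-limiting illustration, the reuse of pre-extracted features can be sketched as a small cache. The class name `FeatureCache`, the function `toy_backbone`, and the clip identifiers are hypothetical and are not part of the disclosure; `toy_backbone` merely stands in for a real backbone network.

```python
from typing import Callable, Dict, List

Feature = List[float]


class FeatureCache:
    """Stores action features of network data so each item is extracted once."""

    def __init__(self, extract: Callable[[str], Feature]) -> None:
        self._extract = extract            # stand-in for the backbone network
        self._store: Dict[str, Feature] = {}
        self.extract_calls = 0             # counts actual backbone invocations

    def get(self, data_id: str) -> Feature:
        # Run the (expensive) extraction only if the feature is not stored yet.
        if data_id not in self._store:
            self._store[data_id] = self._extract(data_id)
            self.extract_calls += 1
        return self._store[data_id]


def toy_backbone(data_id: str) -> Feature:
    # Hypothetical placeholder: a real backbone would decode the video and
    # return a learned feature vector.
    return [float(len(data_id)), 1.0]


cache = FeatureCache(toy_backbone)
for behavior in ("fighting", "climbing"):   # two separate training tasks
    features = [cache.get(clip) for clip in ("web_clip_a", "web_clip_b")]
```

Although the two training tasks each request both clips, the stand-in backbone runs only once per clip.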
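As a non-limiting sketch of step 207, the following code normalizes each action feature and averages the results. The disclosure only states that features are "normalized"; the use of L2 normalization here, and the function names, are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    # Scale the vector to unit length (assumed normalization scheme).
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def center_feature(features: Sequence[Sequence[float]]) -> List[float]:
    # Normalize every action feature, then average them component-wise.
    if not features:
        raise ValueError("at least one action feature is required")
    normalized = [l2_normalize(f) for f in features]
    dim = len(normalized[0])
    return [sum(f[i] for f in normalized) / len(normalized) for i in range(dim)]


collected_features = [[3.0, 4.0], [0.0, 2.0]]   # toy action features
center = center_feature(collected_features)      # approximately [0.3, 0.9]
```

Normalizing before averaging keeps a single feature with a large magnitude from dominating the center feature.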

Step 209: Select similar network data matching the collected data according to the similarity between the action feature of each of the network data and the center feature of the collected data.

In some embodiments, for each network data, the feature similarity between its action feature and the center feature of the collected data is calculated. The feature similarity can be the cosine similarity, the Euclidean distance, or any other value that represents the similarity between two vectors, which is not limited in this disclosure.
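As a non-limiting illustration, the two measures named above can be computed as follows. The function names and the toy vectors are illustrative assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Larger values mean the two vectors point in more similar directions.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Smaller values mean the two vectors are closer, i.e. more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


center = [1.0, 0.0]            # toy center feature of the collected data
network_feature = [1.0, 1.0]   # toy action feature of one network data
cos = cosine_similarity(center, network_feature)
dist = euclidean_distance(center, network_feature)
```

Note that the two measures rank in opposite directions: a higher cosine similarity and a lower Euclidean distance both indicate a closer match, so a selection step has to sort accordingly.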

The selected similar network data may be network data whose similarity with the collected data is greater than a preset threshold. It is also possible to sort the network data according to the similarity, and select the top N network data, where N may be a value preset by the user.
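As a non-limiting sketch, the two selection rules just described (a similarity threshold, or the top N after sorting) can be written as follows. The identifiers and similarity values are made up for illustration, and a larger value is assumed to mean greater similarity.

```python
from typing import Dict, List


def select_by_threshold(similarities: Dict[str, float], threshold: float) -> List[str]:
    # Keep network data whose similarity is greater than the preset threshold.
    return [data_id for data_id, sim in similarities.items() if sim > threshold]


def select_top_n(similarities: Dict[str, float], n: int) -> List[str]:
    # Sort by similarity, highest first, and keep the first n entries.
    ranked = sorted(similarities, key=lambda data_id: similarities[data_id], reverse=True)
    return ranked[:n]


sims = {"clip_a": 0.91, "clip_b": 0.40, "clip_c": 0.75, "clip_d": 0.88}
above_threshold = select_by_threshold(sims, threshold=0.8)
top_two = select_top_n(sims, n=2)
```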

In addition, the number N of selected similar network data can also be determined in other ways, for example according to a preset quantity ratio between the similar network data and the collected data. In other words, selecting similar network data according to the similarity between the action features of each network data and the action features of each collected data includes: determining the number N of similar network data to be selected according to the preset quantity ratio and the number of collected data; and selecting N similar network data from the network data, where the similarity between the action features of any selected similar network data and the action features of the collected data is not less than the similarity between the action features of any unselected network data and the action features of the collected data. The preset quantity ratio is the ratio of the number of similar network data to be selected to the number of collected data. In this way, the proportion between the selected similar network data and the collected data satisfies a given condition, which reduces cases in which an abnormal behavior recognition network trained on similar network data fails to meet the preset requirements, and thus yields a better training effect.

In some embodiments, when there are multiple types of network data (for example, different types can correspond to data crawled from different video websites, or to different network data sets), the similar network data should be selected from as many types as possible. In some embodiments, after the number N of similar network data to be selected is determined, the network data whose similarity is greater than a preset similarity threshold can be screened out, and the similar network data can then be selected by uniform sampling across the types of the screened-out network data.

In some embodiments, the trained abnormal behavior recognition network needs to recognize the collected data well, whereas its ability to recognize similar network data is not a concern. Therefore, so that the abnormal behavior recognition network recognizes the collected data better, the number of similar network data in the training data should not exceed the number of collected data. If the similar network data is selected according to a certain ratio, sampling can be performed at a 1:2 ratio between the amount of similar network data and the amount of collected data to obtain the positive sample training data. To increase the diversity of the collected data, each type of collected data can be uniformly sampled; for example, the collected data can include 100 pieces of data of stepping on the lawn, 100 pieces of data of climbing over a wall, and so on.

A schematic diagram of the selected similar network data is shown in Figure 3. When there are multiple types of network data and similar network data is retrieved separately for each piece of collected data, similar data can be retrieved for each piece of collected data from each type of network data. In addition, some actions, such as climbing, take place in special scenes and require interaction between a person and the object being climbed. Because the extracted features are action features, network data that does not contain the special scene is often retrieved as well (for example, network data 3 in Figure 3). Although such network data does not contain the specific abnormal behavior, it represents the action more clearly because it contains no scene, which reduces noise. In other words, this kind of network data makes it easier for the abnormal behavior recognition network to learn the action features, which benefits training, so it should be retained.
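As a non-limiting sketch, N can be derived from the preset quantity ratio and the number of collected data, after which the N most similar network data are kept. Rounding down is an assumption (the text does not specify a rounding rule), and all names and values are illustrative.

```python
from typing import Dict, List


def similar_count(num_collected: int, ratio: float) -> int:
    # ratio = (similar network data to select) / (collected data).
    # Rounding down is an assumption made for this sketch.
    return int(num_collected * ratio)


def select_n_most_similar(similarities: Dict[str, float], n: int) -> List[str]:
    # Every selected item is at least as similar as every unselected item.
    ranked = sorted(similarities, key=lambda data_id: similarities[data_id], reverse=True)
    return ranked[:n]


num_collected = 4
n = similar_count(num_collected, ratio=0.5)   # one similar item per two collected items
sims = {"clip_a": 0.62, "clip_b": 0.93, "clip_c": 0.15, "clip_d": 0.81, "clip_e": 0.40}
selected = select_n_most_similar(sims, n)
unselected = [data_id for data_id in sims if data_id not in selected]
```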
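As a non-limiting sketch, screening by a similarity threshold and then drawing evenly from each type can be approximated with a round-robin over the types. The text speaks only of "uniform sampling"; the deterministic round-robin that takes the most similar remaining item of each type is an assumption, as are the type names and values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def sample_across_types(
    items: List[Tuple[str, str, float]], threshold: float, n: int
) -> List[str]:
    """items holds (data_id, type_name, similarity) triples."""
    buckets: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for data_id, type_name, sim in items:
        if sim > threshold:                      # screening step
            buckets[type_name].append((sim, data_id))
    for bucket in buckets.values():
        bucket.sort(reverse=True)                # most similar first

    selected: List[str] = []
    # Take one item per type in turn until n items are chosen or none remain.
    while len(selected) < n and any(buckets.values()):
        for type_name in sorted(buckets):
            if buckets[type_name] and len(selected) < n:
                selected.append(buckets[type_name].pop(0)[1])
    return selected


items = [
    ("a1", "site_a", 0.95), ("a2", "site_a", 0.90), ("a3", "site_a", 0.85),
    ("b1", "site_b", 0.82), ("c1", "site_c", 0.60),
]
chosen = sample_across_types(items, threshold=0.8, n=3)
```

A plain top-3 selection would return only items from `site_a`; the round-robin also includes `site_b`, while `c1` is excluded because it falls below the threshold.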

After the positive sample training data is obtained, negative sample training data also needs to be obtained. In the embodiments of the present disclosure, one way to obtain negative sample training data is to acquire data that is actually captured by the smart city management system, carries the second label, and does not contain the specific abnormal behavior. Such data is obtained in the same way as the collected data and usually contains human actions, so action features can be extracted from it. The second label is a label added to data that does not contain the specific abnormal behavior. It is also possible to take, from the network data, several pieces of network data whose action features have the lowest similarity to those of the collected data as negative samples.

In some embodiments, after the positive sample training data and the negative sample training data are obtained, the abnormal behavior recognition network needs to be trained with them. Therefore, the present disclosure also provides an abnormal behavior recognition network training method, which is described in detail below with reference to FIG. 4. The method can be executed by a terminal device or other processing device, where the terminal device can be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc. In some possible implementations, the abnormal behavior recognition network training method may be implemented by a processor invoking computer-readable instructions stored in a memory.

Fig. 4 is a flow chart of an abnormal behavior recognition network training method provided by an embodiment of the present disclosure. The method trains an abnormal behavior recognition network for a specific abnormal behavior based on the positive sample training data obtained by the training data acquisition method in the embodiments of the present disclosure. The method includes:
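As a non-limiting sketch, taking the least similar network data as negative samples mirrors the positive selection with the sort order reversed. Encoding the second label as `0` is an assumption, and the identifiers and values are illustrative.

```python
from typing import Dict, List, Tuple

SECOND_LABEL = 0  # assumed encoding: data does not contain the specific abnormal behavior


def lowest_similarity_negatives(
    similarities: Dict[str, float], k: int
) -> List[Tuple[str, int]]:
    # Sort by similarity, lowest first, and attach the second label to the first k.
    ranked = sorted(similarities, key=lambda data_id: similarities[data_id])
    return [(data_id, SECOND_LABEL) for data_id in ranked[:k]]


sims = {"clip_a": 0.91, "clip_b": 0.10, "clip_c": 0.75, "clip_d": 0.05}
negatives = lowest_similarity_negatives(sims, k=2)
```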

Step 401, acquire training data, the training data includes positive sample training data and negative sample training data.

The positive sample training data is obtained based on the above training data acquisition method, and the negative sample data acquisition method is as described above, and will not be repeated here. In addition, in order to distinguish the positive sample training data from the negative sample training data, and to perform supervised learning, the positive sample training data includes a first label, and the negative sample training data includes a second label.

Step 403, iteratively train the abnormal behavior recognition network through the positive sample training data and the negative sample training data until the loss output by the abnormal behavior recognition network is less than the preset first loss threshold, or the number of iterations is greater than the preset number-of-times threshold.

The first loss threshold and the number-of-times threshold can be selected by the user according to the actual situation. The smaller the first loss threshold and the larger the number-of-times threshold, the better the network training effect, but the amount of calculation also increases. The accuracy requirements of the network and the calculation cost therefore need to be weighed against each other when selecting an appropriate first loss threshold and number-of-times threshold.
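
The stopping rule of step 403 can be sketched as follows. This is only a minimal illustration: the names `train_until_converged`, `run_iteration`, `first_loss_threshold` and `times_threshold` are hypothetical and do not come from the disclosure, and `run_iteration` stands in for whatever performs one training iteration and returns the output loss.

```python
from typing import Callable, Tuple


def train_until_converged(
    run_iteration: Callable[[], float],
    first_loss_threshold: float,
    times_threshold: int,
) -> Tuple[int, float]:
    """Repeat training iterations until one of the two stop conditions holds."""
    iterations = 0
    while True:
        # One iteration of training; returns the loss output by the network.
        loss = run_iteration()
        iterations += 1
        # Stop when the loss is below the first loss threshold,
        # or when the iteration count exceeds the number-of-times threshold.
        if loss < first_loss_threshold or iterations > times_threshold:
            return iterations, loss
```

A smaller `first_loss_threshold` or a larger `times_threshold` lets the loop run longer, which reflects the accuracy versus computation trade-off described above.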

Next, step 403 will be described in detail. As shown in Figure 5a, Figure 5a is a flowchart of an iterative method provided by an embodiment of the present disclosure, including:

Step 511, acquiring a determination result of each of the training data; the determination result is used to represent whether the training data includes the specific abnormal behavior.

Step 513, for each similar network data, if the determination result of the similar network data does not match the label, discard the similar network data.

Step 515, obtain the loss output by the abnormal behavior recognition network according to the judgment results and labels of the training data that has not been discarded.

Step 517: Update the weight of the abnormal behavior recognition network according to the output loss of the abnormal behavior recognition network.

Next, step 511-step 517 will be collectively described.

In step 511, the judgment result of each training data is obtained from the input action feature of that training data. The input action feature can be the action feature used when selecting the training data, or it can be obtained again in each iteration according to the updated backbone network. The backbone network can be updated by updating its weights according to the output loss, so that the vector output by the backbone network represents the specific abnormal behavior better and the fully connected layer can better distinguish training data that contains the specific abnormal behavior from training data that does not, which improves the training effect. By discarding similar network data whose judgment result does not match the label, the bias in training the abnormal behavior recognition network can be reduced.

In other words, in each iteration, before the features of each training data are input into the fully connected layer network, the method further includes: obtaining the action features of each training data according to the backbone network, where the action features are used to determine whether the training data includes the specific abnormal behavior. In each iteration, after the loss output by the abnormal behavior recognition network is obtained according to the judgment results and labels of the undiscarded training data, the method further includes: updating the weights of the backbone network according to the loss. In the embodiments of the present disclosure, the backbone network can be updated so that it extracts action features more accurately, thereby improving the accuracy of the abnormal behavior recognition network.

In step 511, the judgment result can be obtained through a fully connected layer network: the classifier composed of the fully connected layer network classifies the input action feature vector and outputs the judgment result. What needs to be trained in the present disclosure is a binary classifier used to obtain judgment results. With cheap data, a binary classifier of sufficient precision can be obtained, so that the binary classifier, that is, the abnormal behavior recognition network, can be obtained quickly.

In step 513, the similar network data whose judgment result does not match the label is discarded, so that such data does not reduce the accuracy of the finally trained binary classifier in subsequent training. When the backbone network weights also need to be updated, discarding this similar network data additionally prevents it from biasing the representation space of the backbone network, and thus prevents the final binary classifier from being affected.

Since the abnormal behavior recognition network is a binary classifier, the corresponding labels are only the first label and the second label. Any two numbers can be used to represent the judgment result, and any two numbers can be used to represent the label value. If the judgment result and the corresponding label both indicate that the specific abnormal behavior is included, or both indicate that it is not included, the two match; in all other cases, the judgment result and the label do not match. For example, for the judgment result, 0 indicates that the training data contains the specific abnormal behavior and 1 indicates that it does not; for the label, 0 indicates the first label and 1 indicates the second label. If the judgment result is 0 and the label is 0, then according to the meaning of the judgment result and of the first label 0, the two match. If the judgment result is 1 and the label is 0, the two do not match.

In addition, because only similar network data is to be discarded and the collected data must not be discarded, it is necessary to determine whether a training data is similar network data before discarding it. This can be determined from a pre-stored record of whether each training data is similar network data, or by training a discriminator in a generative adversarial manner and using the discriminator to judge whether the training data is network data. In other words, in each iteration, before the similar network data whose judgment result does not match the label is discarded, the method further includes: inputting each training data into a discriminator, where the discriminator is used to judge whether the training data is collected data. Discarding, for each similar network data, the similar network data whose judgment result does not match the label then includes: for each training data, in response to the discriminator outputting that the training data is not collected data and the judgment result of the training data not matching the label of the training data, discarding that training data. In the embodiments of the present disclosure, the input training data does not need to include features identifying whether it is collected data, which can reduce the complexity of the abnormal behavior recognition network.
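
The discard rule described above can be sketched as a simple filter. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the `Sample` structure, its field names and the function `keep_for_loss` are hypothetical, and the discriminator output is represented by a precomputed boolean rather than by a trained network. The numeric encoding follows the example in the text (0 means the behavior is present or the first label, 1 means absent or the second label).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    label: int          # 0 = first label, 1 = second label (encoding from the text's example)
    judgment: int       # 0 = contains the behavior, 1 = does not (classifier output)
    is_collected: bool  # hypothetical stand-in for the discriminator's output


def keep_for_loss(samples: List[Sample]) -> List[Sample]:
    """Drop only non-collected samples whose judgment disagrees with the label."""
    kept = []
    for s in samples:
        mismatch = s.judgment != s.label
        if mismatch and not s.is_collected:
            continue  # mismatching network data is discarded for this iteration
        kept.append(s)  # collected data is always kept, even on a mismatch
    return kept
```

Only the samples returned by `keep_for_loss` would then contribute to the loss in step 515.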

In step 515, the loss of the abnormal behavior recognition network can be obtained according to a loss function. The loss function is preset by the user; its input is the judgment result and label of each training data, and its output is the loss of the network, which is used to evaluate the quality of the abnormal behavior recognition network. A loss function for binary classifiers can be chosen.

Since the abnormal behavior recognition network in the present disclosure pays more attention to the recognition results on the collected data, the loss can be computed with weighting, increasing the weight of collected data and reducing the weight of similar network data. If the training data was obtained with less similar network data than collected data, the training data already follows a certain proportion and the contribution of collected data to the loss is already greater than that of similar network data, so the loss can be calculated without weighting.
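
One possible form of such a weighted loss is a weighted binary cross-entropy, sketched below. The disclosure does not name a specific loss function or weight values, so the choice of cross-entropy, the function name `weighted_bce` and the default weights are assumptions made for illustration. For convenience this sketch uses target 1 for "contains the behavior" and treats the classifier output as a probability of that class.

```python
import math
from typing import Sequence


def weighted_bce(
    probs: Sequence[float],
    targets: Sequence[int],
    is_collected: Sequence[bool],
    collected_weight: float = 2.0,  # assumed value, not from the disclosure
    network_weight: float = 1.0,    # assumed value, not from the disclosure
) -> float:
    """Weighted mean binary cross-entropy; collected data counts more."""
    if len(probs) == 0:
        raise ValueError("at least one training sample is required")
    eps = 1e-12
    total = 0.0
    weight_sum = 0.0
    for p, y, collected in zip(probs, targets, is_collected):
        w = collected_weight if collected else network_weight
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum
```

Setting both weights to the same value reduces this to the ordinary unweighted mean, which corresponds to the case where the data proportion already favors collected data.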

The weight updated in step 517 is the weight of the abnormal behavior recognition network; in other words, what is updated is the weight of the network (which can be a fully connected layer) used in step 511. The weight can be updated by gradient descent or by other weight update methods, which are not limited in the present disclosure.

In some embodiments, the abnormal behavior recognition network in the embodiments of the present disclosure may include at least a backbone network feature extractor, a discriminator, and a classifier. As shown in FIG. 5b, FIG. 5b is a schematic diagram of an iterative method provided by an embodiment of the present disclosure. The data in the data set (Dataset) and the auxiliary data set (Auxiliary Dataset) can be input into the backbone network feature extractor (Backbone feature extract) 501, which performs feature extraction to obtain the action feature sequence (Motion sequence) corresponding to the data set and the auxiliary feature sequence corresponding to the auxiliary data set. The data set can be understood as a collection of positive sample training data, and the auxiliary data set can be understood as a collection of network data. The action features or auxiliary features output by the backbone network feature extractor 501 can include at least spatial features (Scenario dimension) and time features (Time dimension). According to the similarity between each auxiliary feature in the auxiliary feature sequence and each action feature in the action feature sequence, the set of negative sample training data is determined from the auxiliary data set.
The terminal device can use the positive sample training data and the negative sample training data as training data, and can input the action features corresponding to the positive sample training data and the auxiliary features corresponding to the negative sample training data into the discriminator (Discriminator) 502 to perform feature discrimination (Feature Discriminator). The discriminator 502 is used to judge whether the current training data is collected data (for example, it outputs Real if the training data is collected data and Fake if it is non-collected data). The terminal device can determine the current reward (Reward) according to the scene (Scene) and the type of action (action) represented by the action feature or the auxiliary feature, and then calculate the feature loss (Feature Loss) to update the weights of the discriminator 502. After the terminal device screens the training data through the discriminator 502, the action features corresponding to the positive sample training data and the auxiliary features corresponding to the negative sample training data can be input into the classifier 503 to determine the judgment result of the training data. The judgment result can indicate whether the training data matches the label of the training data, and the classifier 503 may be a fully connected layer (Fully connected layers, FC). For each training data, the terminal device discards the training data in response to the discriminator 502 outputting that the training data is non-collected data and the classifier 503 determining that the judgment result of the non-collected data does not match the label of the training data. The terminal device can determine the classification loss (Classification Loss) according to the judgment results and labels of the training data that has not been discarded, and then update the weights of the abnormal behavior recognition network.

In the embodiments of the present disclosure, by obtaining existing network data (labeled or unlabeled), for example by referring to other large video databases, a more accurate abnormal behavior recognition network can be trained without increasing the size or complexity of the network, and the robustness of the abnormal behavior recognition network is improved. For example, the abnormal behavior recognition network can refer to external unlabeled network data or labeled public datasets and retrieve the network data with high similarity to the collected data, which can improve the training efficiency of the abnormal behavior recognition network. This enables the abnormal behavior recognition network to achieve good training results from a small amount of collected data without using billions of parameters, and to start the abnormal behavior recognition task in a short time. In the embodiments of the present disclosure, the connection between data can be represented by the similarity between different data; similar network data associated with the collected data can be retrieved from a large amount of public data or network data and used together with the collected data to train the abnormal behavior recognition network.

In addition, the above embodiments describe only an abnormal behavior recognition network for one specific abnormal behavior. When multiple abnormal behavior recognition networks for different abnormal behaviors need to be trained, the networks can be trained at the same time while sharing the same backbone network: the weights of each abnormal behavior recognition network are updated according to the loss of that network, and the weights of the backbone network are updated according to the losses of all abnormal behavior recognition networks. In this way, the training efficiency is improved.
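
The update rule for a shared backbone can be illustrated with a deliberately tiny toy model. Everything in this sketch is an assumption made for illustration: the backbone is reduced to a single scalar weight `w`, each recognition network to a single scalar weight, the loss to a squared error, and the update to plain gradient descent. The point is only that each head receives the gradient of its own loss, while the backbone receives the sum of the gradients from all heads.

```python
from typing import List, Tuple


def shared_backbone_step(
    w: float,
    heads: List[float],
    x: float,
    targets: List[float],
    lr: float = 0.1,
) -> Tuple[float, List[float]]:
    """One gradient step for a scalar backbone shared by several scalar heads."""
    feature = w * x  # shared "action feature" produced by the backbone
    head_grads = []
    backbone_grad = 0.0
    for a, y in zip(heads, targets):
        err = a * feature - y                    # head k: loss_k = err ** 2
        head_grads.append(2.0 * err * feature)   # d loss_k / d a_k (own loss only)
        backbone_grad += 2.0 * err * a * x       # d loss_k / d w, summed over heads
    new_heads = [a - lr * g for a, g in zip(heads, head_grads)]
    new_w = w - lr * backbone_grad
    return new_w, new_heads
```

In a real network the same pattern would typically be obtained by summing the per-network losses before backpropagation, so that the shared parameters accumulate gradients from every recognition network.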

The above descriptions of the various embodiments tend to emphasize the differences between the various embodiments, the same or similar points can be referred to each other, and for the sake of brevity, the present disclosure will not repeat them.

Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the execution order of each step should be determined by its function and possible internal logic.

Corresponding to the aforementioned embodiments of the training data acquisition method and the abnormal behavior recognition network training method, the present disclosure also provides embodiments of the training data acquisition device, the abnormal behavior recognition network training device and the terminal to which they are applied.

As shown in FIG. 6, FIG. 6 is a block diagram of a training data acquisition device provided by an embodiment of the present disclosure, and the device includes:

The data acquisition module 610 is configured to acquire network data and collected data including specific abnormal behaviors.

The action feature acquisition module 620 is configured to acquire the action features of each of the network data and the action features of each of the collected data.

The training data selection module 630 is configured to select similar network data matching the collected data from the network data according to the similarity between the action features of each of the network data and the action features of each of the collected data, and to use the collected data and the similar network data as positive sample training data for the specific abnormal behavior.

In some embodiments, the network data includes at least one of the following: data sets published on the Internet; web crawler data; data generated based on a virtual game engine. Such network data is more comprehensive, making the training data more diverse.

In some implementations, the data acquisition module 610 includes: a first acquisition submodule configured to acquire a backbone network; an extraction submodule configured to extract the action features of each of the collected data through the backbone network; and a second acquisition submodule configured to acquire the pre-stored action features of each of the network data extracted through the backbone network. Through the backbone network, the action features of the collected data can be obtained, while the action features of the network data are extracted in advance. When multiple abnormal behaviors need to be trained for, there is no need to extract the action features of the network data multiple times, which improves the training efficiency.

In some implementations, the training data selection module 630 includes: a synthesis submodule configured to synthesize a collected data center feature according to the action features of all the collected data; and a first selection submodule configured to select similar network data that matches the collected data according to the similarity between the action features of each of the network data and the collected data center feature. In this way, the action features of all collected data are synthesized into one center feature, which better reflects the characteristics of the abnormal behavior, so the selected similar network data is more accurate.

In some embodiments, the training data selection module 630 includes: a determination submodule configured to determine the number N of similar network data to be collected according to the preset quantity ratio and the quantity of collected data; and a second selection submodule configured to select N similar network data from the network data, wherein the similarity between the action features of any of the similar network data and the action features of the collected data is not less than the similarity between the action features of any unselected network data and the action features of the collected data. In this way, the proportion of selected similar network data to collected data satisfies a certain condition, so that the abnormal behavior recognition network is not biased by similar network data and performs better on collected data.
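
The two selection strategies above (a synthesized center feature, and the top N items by similarity) can be combined in a short sketch. The text here does not fix how the center feature is synthesized, which similarity measure is used, or how the ratio is defined, so this sketch assumes an element-wise mean, cosine similarity, and a ratio interpreted as "similar network data per collected data". The function names are hypothetical.

```python
import math
from typing import List, Sequence


def center_feature(features: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the collected-data action features (assumed synthesis)."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[i] for f in features) / n for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed similarity measure); 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(
    network_features: Sequence[Sequence[float]],
    collected_features: Sequence[Sequence[float]],
    ratio: float,
) -> List[int]:
    """Return indices of the N network items most similar to the center feature."""
    n = int(ratio * len(collected_features))  # N from the preset quantity ratio
    center = center_feature(collected_features)
    ranked = sorted(
        range(len(network_features)),
        key=lambda i: cosine(network_features[i], center),
        reverse=True,
    )
    return ranked[:n]
```

Because the indices are sorted by descending similarity before truncation, every selected item is at least as similar to the center feature as any unselected item, which is the condition stated for the second selection submodule.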

As shown in FIG. 7, FIG. 7 is a block diagram of an abnormal behavior recognition network training device provided by an embodiment of the present disclosure, and the device includes:

The training data obtaining module 710 is configured to obtain training data, the training data includes positive sample training data and negative sample training data, and the positive sample training data is obtained based on the above-mentioned training data acquisition method.

The network training module 720 is configured to iteratively train the abnormal behavior recognition network through the positive sample training data and the negative sample training data until the loss output by the abnormal behavior recognition network is less than the preset first loss threshold, or the number of iterations is greater than the preset times threshold.

In some implementations, the positive sample training data includes a first label, and the negative sample training data includes a second label. The network training module 720 includes: a third acquisition submodule configured to acquire a determination result of each of the training data during each iteration, where the determination result is used to represent whether the training data includes the specific abnormal behavior; a discarding submodule configured to, for each similar network data, discard the similar network data if the determination result of the similar network data does not match the label; a fourth acquisition submodule configured to obtain the loss output by the abnormal behavior recognition network according to the judgment results and labels of the training data that has not been discarded; and a first update submodule configured to update the weight of the abnormal behavior recognition network according to the loss output by the abnormal behavior recognition network. In this way, similar network data that are not similar to the collected data are discarded, reducing the possibility that the abnormal behavior recognition network is biased by these data.

In some embodiments, the device further includes: a fifth acquisition submodule configured to acquire the action features of each of the training data according to the backbone network, where the action features are used to determine whether the training data includes the specific abnormal behavior; a sixth acquisition submodule configured to obtain the loss output by the abnormal behavior recognition network according to the judgment results and labels of the training data that has not been discarded; and a second update submodule configured to update the weights of the backbone network according to the loss. In this way, by updating the backbone network, the backbone network can better extract the characteristics of abnormal behavior, which improves the accuracy of the abnormal behavior recognition network.

In some embodiments, the device further includes: an input submodule configured to input each of the training data into a discriminator, where the discriminator is used to judge whether the training data is collected data. The discarding submodule includes: a response unit configured to, for each training data, in response to the discriminator outputting that the training data is not collected data and the judgment result of the training data not matching the label of the training data, discard the training data whose judgment result does not match the label. In this way, the input training data does not need to include features identifying whether it is collected data, which reduces the complexity of the model.

For the implementation process of the functions and effects of each module in the above-mentioned device, please refer to the implementation process of the corresponding steps in the above-mentioned method for details, and details will not be repeated here.

As for the device embodiments, since they basically correspond to the method embodiments, please refer to the description of the method embodiments for the related parts. The device embodiments described above are only illustrative. The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Part or all of the modules can be selected according to actual needs to achieve the purpose of the disclosed solution, which can be understood and implemented by those skilled in the art without creative effort.

As shown in Figure 8, Figure 8 shows a hardware structure diagram of the computer equipment where the above-mentioned device is located. The computer equipment may include only the training data acquisition device, only the abnormal behavior recognition network training device, or both the training data acquisition device and the abnormal behavior recognition network training device. The device may include: a processor 810, a memory 820, an input/output interface 830, a communication interface 840 and a bus 850. The processor 810, the memory 820, the input/output interface 830 and the communication interface 840 are connected to each other within the device through the bus 850.

The processor 810 may be implemented by a general-purpose CPU (Central Processing Unit, central processing unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is used to execute related programs to realize part or all of the technical solutions provided by the embodiments of the present disclosure.

The memory 820 can be implemented in the form of ROM (Read Only Memory, read-only memory), RAM (Random Access Memory, random access memory), static storage device, dynamic storage device, and the like. The memory 820 can store an operating system and other application programs. When implementing the technical solutions provided by the embodiments of the present disclosure through software or firmware, the relevant program codes are stored in the memory 820 and invoked by the processor 810 for execution.

The input/output interface 830 is used to connect an input/output module to realize information input and output. The input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions. The input device may include a keyboard, mouse, touch screen, microphone, various sensors, etc., and the output device may include a display, a speaker, a vibrator, an indicator light, and the like.

The communication interface 840 is used to connect a communication module (not shown in the figure), so as to realize communication interaction between the device and other devices. The communication module can realize communication through wired methods (such as USB, network cable, etc.), and can also realize communication through wireless methods (such as mobile network, WIFI, Bluetooth, etc.).

Bus 850 includes a path for carrying information between the various components of the device (e.g., processor 810, memory 820, input/output interface 830, and communication interface 840).

It should be noted that although the above device only shows the processor 810, the memory 820, the input/output interface 830, the communication interface 840 and the bus 850, in the specific implementation process, the device may also include other components. In addition, those skilled in the art can understand that the above-mentioned device may only include components necessary to realize the solutions of the embodiments of the present disclosure, and does not necessarily include all the components shown in the figure.

An embodiment of the present disclosure also provides a computer-readable storage medium, on which a computer program is stored. When the program is executed by a processor, the above-mentioned training data acquisition method or abnormal behavior identification network training method is implemented. Wherein, the computer-readable storage medium may only store the computer program corresponding to the training data set acquisition method, or may only store the computer program corresponding to the abnormal behavior recognition network training method.

A computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device, and may be a volatile storage medium or a nonvolatile storage medium. A computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read only memory (CD-ROM), digital versatile disc (DVD), memory stick, floppy disk, mechanically encoded devices such as punch cards or raised structures in a groove with instructions stored thereon, and any suitable combination of the above. As used herein, computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or electrical signals transmitted through wires.

An embodiment of the present disclosure also proposes a computer program. The computer program includes computer readable code, and when the computer readable code is read and executed by a computer, some or all steps of the method in any embodiment of the present disclosure are implemented.

An embodiment of the present disclosure also provides a computer program product, including computer-readable codes, or a non-volatile computer-readable storage medium carrying computer-readable codes. When the computer-readable codes run in a processor of an electronic device, the processor in the electronic device executes some or all steps of the above method.

Other embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the disclosure and practice of the invention disclosed herein. The present disclosure is intended to cover any modification, use or adaptation of the present disclosure. These modifications, uses or adaptations follow the general principles of the present disclosure and include common knowledge or conventional technical means in the technical field not disclosed in the present disclosure. The disclosure and examples are to be considered exemplary only, with the true scope and spirit of the disclosure indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise constructions which have been described above and shown in the drawings, and various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

The above descriptions are only preferred embodiments of the present disclosure, and are not intended to limit the present disclosure. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present disclosure shall be included within the scope of protection of the present disclosure.

Claims

A training data acquisition method, the method comprising:

Obtain network data and collected data containing specific abnormal behaviors;

Obtaining an action feature of each of the network data and an action feature of each of the collected data;

According to the similarity between the action features of each of the network data and the action features of each of the collected data, select similar network data that matches the collected data from the network data, and use the collected data and the similar network data as positive sample training data for the specific abnormal behavior.
The method according to claim 1, wherein the network data includes at least one of the following:

Data sets published on the Internet; web crawler data; data generated based on a virtual game engine.
The method according to any one of claims 1-2, wherein said obtaining the action features of each of said network data and the action features of each of said collected data comprises:

Acquiring a backbone network;

extracting action features of each of the collected data through the backbone network;

Acquiring pre-stored action features of each of the network data extracted through the backbone network.
The method according to any one of claims 1-3, wherein, according to the similarity between the action features of each of the network data and the action features of each of the collected data, from the network data Select similar network data that matches the collected data, including:

Synthesizing a collected data center feature according to the action features of all the collected data;

According to the similarity between the action features of each of the network data and the collected data center feature, selecting similar network data that matches the collected data.
The method according to any one of claims 1-4, wherein selecting, from the network data, similar network data that matches the collected data according to the similarity between the action features of each of the network data and the action features of each of the collected data comprises:

Determining the quantity N of similar network data to be selected according to a preset quantity ratio and the quantity of the collected data;

Selecting N pieces of similar network data from the network data, wherein the similarity between the action features of any of the similar network data and the action features of the collected data is not less than the similarity between the action features of any unselected network data and the action features of the collected data.
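
As a non-limiting illustration of the quantity-based selection recited above, the sketch below assumes that the preset quantity ratio is the number of similar network data per piece of collected data and that a similarity score has already been computed for each piece of network data. The function name `select_similar` and both assumptions are hypothetical.

```python
def select_similar(scores, num_collected, ratio):
    """Return the names of the N highest-scoring network data.

    scores: mapping from network-data name to similarity score.
    N is taken as int(num_collected * ratio), capped at the number of
    available network data (assumed interpretation of the preset ratio).
    """
    n = min(len(scores), int(num_collected * ratio))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


# Hypothetical scores for four pieces of network data.
example_scores = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.5}
selected = select_similar(example_scores, num_collected=2, ratio=1.0)  # ["a", "c"]
```

Because the list is sorted in descending order before truncation, every selected item scores at least as high as every unselected item.
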
An abnormal behavior recognition network training method, the method comprising:

Acquiring training data, the training data comprising positive sample training data and negative sample training data, the positive sample training data being obtained based on the method of any one of claims 1-5;

Iteratively training the abnormal behavior recognition network through the positive sample training data and the negative sample training data until the loss output by the abnormal behavior recognition network is less than a preset first loss threshold, or the number of iterations is greater than a preset count threshold.
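
As a non-limiting illustration of the two stopping conditions recited above, the sketch below assumes a caller-supplied `train_step` callable that performs one training iteration and returns the loss. The names `train_until` and `train_step` are hypothetical, and the network itself is not modeled.

```python
def train_until(train_step, loss_threshold, count_threshold):
    """Iterate until the loss is below loss_threshold or the iteration
    count is greater than count_threshold (a literal reading of the claim)."""
    iterations = 0
    while True:
        loss = train_step()
        iterations += 1
        if loss < loss_threshold or iterations > count_threshold:
            return iterations, loss


# Hypothetical loss sequence standing in for real training iterations.
losses = iter([1.0, 0.5, 0.05, 0.01])
result = train_until(lambda: next(losses), loss_threshold=0.1, count_threshold=100)
```
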
The method of claim 6, wherein the positive sample training data includes a first label, and the negative sample training data includes a second label;

The iterative training of the abnormal behavior recognition network through the positive sample training data and the negative sample training data includes:

During each iteration, a determination result of each of the training data is obtained; the determination result is used to characterize whether the training data includes the specific abnormal behavior;

For each of the similar network data, if the determination result of the similar network data does not match the label, discarding the similar network data;

Obtaining the loss output by the abnormal behavior recognition network according to the determination results and labels of the training data that has not been discarded;

The weights of the abnormal behavior recognition network are updated according to the output loss of the abnormal behavior recognition network.
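
As a non-limiting illustration of the discarding step recited above, the sketch below assumes that the network outputs a probability per sample, that the determination result is obtained by thresholding that probability at 0.5, and that the loss is binary cross-entropy. None of these choices is fixed by the claim, and the field names `source` and `label` are hypothetical. The weight update is omitted.

```python
import math


def filtered_loss(samples, probs, threshold=0.5):
    """Mean binary cross-entropy over the training data that is not discarded.

    Similar network data whose determination result disagrees with its
    label is dropped before the loss is computed.
    """
    kept = []
    for sample, p in zip(samples, probs):
        determination = 1 if p >= threshold else 0
        if sample["source"] == "similar_network" and determination != sample["label"]:
            continue  # discard: determination result does not match the label
        kept.append((sample["label"], p))
    if not kept:
        return 0.0, 0
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for label, p in kept:
        p = min(max(p, eps), 1.0 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(kept), len(kept)


# Hypothetical batch: label 1 = contains the specific abnormal behavior.
batch = [
    {"source": "collected", "label": 1},
    {"source": "similar_network", "label": 1},
    {"source": "similar_network", "label": 1},
    {"source": "negative", "label": 0},
]
loss, kept_count = filtered_loss(batch, [0.9, 0.2, 0.8, 0.1])
```

Here the second sample is similar network data that the network judges as not containing the behavior, so it is excluded and only three samples contribute to the loss.
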
The method according to claim 6 or 7, wherein, in the process of each iteration, before obtaining the determination result of each of the training data, the method further comprises:

Acquiring action features of each of the training data according to the backbone network; the action features are used to determine whether the training data includes the specific abnormal behavior;

After obtaining the loss output by the abnormal behavior recognition network according to the determination results and labels of the training data that has not been discarded, the method further comprises:

Updating the weights of the backbone network according to the loss.
The method according to any one of claims 6-8, wherein, before discarding, for each of the similar network data, the similar network data whose determination result does not match the label, the method further comprises:

Inputting each of the training data into a discriminator, the discriminator being used to judge whether the training data is the collected data;

Discarding, for each of the similar network data, the similar network data if its determination result does not match the label comprises:

For each of the training data, in response to the discriminator outputting that the training data is not collected data and the determination result of the training data not matching the label of the training data, discarding the training data whose determination result does not match the label.
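
As a non-limiting illustration of the discriminator-gated discarding recited above, the sketch below assumes that the discriminator output and the determination result are already available for each sample. The function name `keep_after_discriminator` and the field names are hypothetical.

```python
def keep_after_discriminator(samples):
    """Keep a sample unless the discriminator says it is not collected data
    AND its determination result does not match its label."""
    return [
        s for s in samples
        if s["discriminator_says_collected"] or s["determination"] == s["label"]
    ]


# Hypothetical per-sample outputs for one iteration.
candidates = [
    {"id": 1, "discriminator_says_collected": True, "determination": 0, "label": 1},
    {"id": 2, "discriminator_says_collected": False, "determination": 0, "label": 1},
    {"id": 3, "discriminator_says_collected": False, "determination": 1, "label": 1},
]
kept_ids = [s["id"] for s in keep_after_discriminator(candidates)]
```

Only sample 2 meets both conditions and is discarded. Sample 1 is retained despite the mismatch because the discriminator treats it as collected data.
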
A training data acquisition device, said device comprising:

A data acquisition module configured to acquire network data and collected data containing specific abnormal behaviors;

An action feature acquisition module configured to acquire an action feature of each of the network data and an action feature of each of the collected data;

The training data selection module is configured to select, from the network data, similar network data matching the collected data according to the similarity between the action features of each of the network data and the action features of each of the collected data, and to use the collected data and the similar network data as positive sample training data for the specific abnormal behavior.
The device according to claim 10, wherein the network data includes at least one of the following:

Publicly available data sets on the Internet; web crawler data; data generated by a virtual game engine.
The device according to any one of claims 10-11, wherein the action feature acquisition module includes:

The first acquisition sub-module is configured to acquire the backbone network;

The extraction submodule is configured to extract the action feature of each of the collected data through the backbone network;

The second obtaining submodule is configured to obtain the pre-stored action feature of each of the network data extracted through the backbone network.
The device according to any one of claims 10-12, wherein the training data selection module includes:

The synthesis sub-module is configured to synthesize a collected-data center feature from the action features of all the collected data;

The first selection submodule is configured to select similar network data matching the collected data according to the similarity between the action features of each of the network data and the collected-data center feature.
The device according to any one of claims 10-13, wherein the training data selection module includes:

The determining submodule is configured to determine the quantity N of similar network data to be selected according to a preset quantity ratio and the quantity of the collected data;

The second selection submodule is configured to select N pieces of similar network data from the network data, wherein the similarity between the action features of any of the similar network data and the action features of the collected data is not less than the similarity between the action features of any unselected network data and the action features of the collected data.
A network training device for abnormal behavior recognition, said device comprising:

A training data acquisition module configured to acquire training data, the training data comprising positive sample training data and negative sample training data, the positive sample training data being obtained based on the method of any one of claims 1-5;

The network training module is configured to iteratively train the abnormal behavior recognition network through the positive sample training data and the negative sample training data until the loss output by the abnormal behavior recognition network is less than a preset first loss threshold, or the number of iterations is greater than a preset count threshold.
The apparatus of claim 15, wherein the positive sample training data includes a first label, and the negative sample training data includes a second label;

The network training module includes:

The third acquisition submodule is configured to acquire a determination result of each of the training data during each iteration; the determination result is used to represent whether the training data includes the specific abnormal behavior;

The discarding submodule is configured to, for each of the similar network data, discard the similar network data if the determination result of the similar network data does not match the label;

The fourth acquisition sub-module is configured to obtain the loss output by the abnormal behavior recognition network according to the determination results and labels of the training data that has not been discarded;

The first update submodule is configured to update the weight of the abnormal behavior recognition network according to the output loss of the abnormal behavior recognition network.
The device according to claim 15 or 16, wherein, before the third acquiring submodule, the device further comprises:

The fifth acquisition submodule is configured to acquire the action feature of each of the training data according to the backbone network; the action feature is used to determine whether the training data includes the specific abnormal behavior;

After the loss output by the abnormal behavior recognition network is obtained according to the determination results and labels of the training data that has not been discarded, the device further comprises:

The second update submodule is configured to update the weight of the backbone network according to the loss.
The device according to any one of claims 15-17, wherein, before the discarding submodule, the device further comprises:

The input sub-module is configured to input each of the training data into a discriminator; the discriminator is used to judge whether the training data is the collected data;

The discarding submodule includes:

The response unit is configured to, for each of the training data, in response to the discriminator outputting that the training data is not collected data and the determination result of the training data not matching the label of the training data, discard the training data whose determination result does not match the label.
A computer-readable storage medium storing a computer program, wherein the method according to any one of claims 1 to 9 is implemented when the computer program is executed by a processor.
A computer device comprising:

one or more processors;

memory for storing one or more programs;

When the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-9.
A computer program comprising computer-readable code, wherein, when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the method according to any one of claims 1 to 9.
A computer program product configured to store computer-readable instructions that, when executed, cause a computer to perform the method of any one of claims 1-9.