CN113469265A - Data category attribute determining method and device, storage medium and electronic device - Google Patents


Info

Publication number
CN113469265A
CN113469265A (application CN202110796429.6A)
Authority
CN
China
Prior art keywords
target data
feature
data
target
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110796429.6A
Other languages
Chinese (zh)
Inventor
郭思郁
王宁波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110796429.6A priority Critical patent/CN113469265A/en
Publication of CN113469265A publication Critical patent/CN113469265A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a method and a device for determining a data category attribute, a storage medium and an electronic device. The method comprises: inputting target data into a target network model to obtain a first feature and a second feature of the target data output by the target network model, wherein the first feature is used for representing a feature mean of the target data and the second feature is used for representing a feature variance of the target data; determining, by using the first feature and the second feature, a target data category to which the target data belongs and the probability that the target data belongs to the target data category; and determining the category attribute of the target data based on the target data category to which the target data belongs and the probability that the target data belongs to the target data category. By the method and the device, the problem of inaccurate classification of external data in the related art can be solved, and the effect of accurately classifying data is achieved.

Description

Data category attribute determining method and device, storage medium and electronic device
Technical Field
The embodiment of the invention relates to the field of data processing, in particular to a method and a device for determining data category attributes, a storage medium and an electronic device.
Background
In the real world, almost all classification models face a common problem in practical use: external data that does not belong to any known class must be processed, yet a general classification method forcibly assigns such data to one of the known classes.
For example, in a digit-recognition task the open-set classification problem (open-set problem) involves not only the digit categories 0-9 but also other unknown categories such as the letters A-Z. The unknown categories carry no labels, so the classifier cannot know the specific category of an image from an unknown category (for example, whether it is an "A"). These images of many different classes together form a single class, the unknown class, which in detection is called the background class. The open-set classification problem is therefore to distinguish among the 10 known classes while rejecting the other unknown classes.
The open-set classification methods disclosed in the prior art add many extra classes during training, but they still face the problem of external data at prediction time.
In view of the above technical problems, no effective solution has been proposed in the related art.
Disclosure of Invention
The embodiment of the invention provides a method and a device for determining a data category attribute, a storage medium and an electronic device, which are used for at least solving the problem of inaccurate classification of external data in the related art.
According to an embodiment of the present invention, there is provided a method for determining a data category attribute, including: inputting target data into a target network model to obtain a first feature and a second feature of the target data output by the target network model, wherein the first feature is used for representing a feature mean value of the target data, and the second feature is used for representing a feature variance of the target data; determining a target data category to which the target data belongs and the probability of the target data belonging to the target data category by using the first characteristic and the second characteristic; and determining the class attribute of the target data based on the target data class to which the target data belongs and the probability of the target data belonging to the target data class.
According to another embodiment of the present invention, there is provided an apparatus for determining a data category attribute, including: a first input module, configured to input target data into a target network model, and obtain a first feature and a second feature of the target data output by the target network model, where the first feature is used to represent a feature mean of the target data, and the second feature is used to represent a feature variance of the target data; the first determining module is used for determining a target data category to which the target data belongs and the probability of the target data belonging to the target data category by using the first characteristic and the second characteristic; and the second determining module is used for determining the class attribute of the target data based on the target data class to which the target data belongs and the probability of the target data belonging to the target data class.
In an exemplary embodiment, the first determining module includes: a first input unit configured to input the first feature to a classifier to obtain a class of the target data to which the target data calculated by the classifier belongs, wherein the second feature is also used to indicate an output feature of the classifier; the first determining unit is used for determining the probability that the target data belongs to the target data category based on the feature mean of the first feature, the feature variance of the second feature, the first parameter and the preset data category.
In an exemplary embodiment, the second determining module includes one of: a second determining unit, configured to determine that the target data belongs to the target data category when the probability is greater than or equal to a preset threshold; and a third determining unit, configured to determine that the target data belongs to an abnormal data category when the probability is smaller than the preset threshold.
In an exemplary embodiment, the apparatus further includes: a second input module, configured to input target data into a target network model, and before obtaining a first feature and a second feature of the target data output by the target network model, input internal data and external data in determined sample data into an original network model according to a first preset ratio, so as to obtain a third feature and a fourth feature output by the original network model, where the third feature is used to represent a feature mean of the internal data and the external data, the fourth feature is used to represent a feature variance of the internal data and the external data, the external data is used to represent data of a non-attribution type, and the internal data is used to represent data of an attribution type; a first separation module for separating the third feature and the fourth feature according to a second preset ratio; and the first training module is used for training the original network model according to the third characteristic and the fourth characteristic to obtain the target network model.
In one exemplary embodiment, a first training module includes: a fourth determining unit, configured to determine a feature memory library, where the feature memory library is used to store feature distribution of the sample data; a fifth determining unit configured to determine a first distance between the third feature and the feature memory library to determine a first probability of a data type to which the internal data belongs; a sixth determining unit configured to determine a second distance between the fourth feature and the feature storage library to determine a second probability of the data type to which the external data belongs; and the first training unit is used for training the original network model based on the first probability and the second probability to obtain the target network model.
According to a further embodiment of the present invention, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to the method, the target data is input into the target network model to obtain a first feature and a second feature of the target data output by the target network model, wherein the first feature is used for representing a feature mean of the target data and the second feature is used for representing a feature variance of the target data; a target data category to which the target data belongs and the probability that the target data belongs to the target data category are determined by using the first feature and the second feature; and the category attribute of the target data is determined based on the target data category to which the target data belongs and the probability that the target data belongs to the target data category. On the basis of a fixed number of classes, whether the target data belongs to internal data or external data can be effectively determined, the classification accuracy is guaranteed, and external data is prevented from being misclassified into a known class. Therefore, the problem of inaccurate classification of external data in the related art can be solved, and the effect of accurately classifying data is achieved.
Drawings
Fig. 1 is a block diagram of a hardware structure of a mobile terminal of a method for determining a data category attribute according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of determining attributes of data categories according to an embodiment of the invention;
FIG. 3 is a training flow diagram according to an embodiment of the invention;
FIG. 4 is a flow diagram of internal data processing in the uncertainty estimation module according to an embodiment of the present invention;
FIG. 5 is a flow diagram of the processing of external data in the uncertainty estimation module according to an embodiment of the present invention;
FIG. 6 is a test flow diagram according to an embodiment of the invention;
fig. 7 is a block diagram of the structure of a data category attribute determination apparatus according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided in the embodiments of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking the operation on the mobile terminal as an example, fig. 1 is a hardware structure block diagram of the mobile terminal of the method for determining the data category attribute according to the embodiment of the present invention. As shown in fig. 1, the mobile terminal may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), and a memory 104 for storing data, wherein the mobile terminal may further include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration, and does not limit the structure of the mobile terminal. For example, the mobile terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program and a module of application software, such as a computer program corresponding to the method for determining the data category attribute in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a Network adapter (NIC), which can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In this embodiment, a method for determining a data category attribute is provided, and fig. 2 is a flowchart of a method for determining a data category attribute according to an embodiment of the present invention, as shown in fig. 2, the flowchart includes the following steps:
step S202, target data are input into a target network model, and a first feature and a second feature of the target data output by the target network model are obtained, wherein the first feature is used for representing a feature mean value of the target data, and the second feature is used for representing a feature variance of the target data;
step S204, determining a target data category to which the target data belongs and the probability of the target data belonging to the target data category by using the first characteristic and the second characteristic;
step S206, the category attribute of the target data is determined based on the target data category to which the target data belongs and the probability of the target data belonging to the target data category.
The execution subject of the above steps may be a terminal, but is not limited thereto.
The present embodiment includes, but is not limited to, application in scenarios of classifying data, for example, the classification of data of a known class or of an unknown class.
In the present embodiment, the target data includes, but is not limited to, internal data (data of a known category) and external data (data of an unknown category). The target network model includes, but is not limited to, a Convolutional Neural Network (CNN).
Through the above steps, the target data is input into the target network model to obtain a first feature and a second feature of the target data output by the target network model, wherein the first feature is used for representing a feature mean of the target data and the second feature is used for representing a feature variance of the target data; a target data category to which the target data belongs and the probability that the target data belongs to the target data category are determined by using the first feature and the second feature; and the category attribute of the target data is determined based on the target data category to which the target data belongs and the probability that the target data belongs to the target data category. On the basis of a fixed number of classes, whether the target data belongs to internal data or external data can be effectively determined, the classification accuracy is guaranteed, and external data is prevented from being misclassified into a known class. Therefore, the problem of inaccurate classification of external data in the related art can be solved, and the effect of accurately classifying data is achieved.
In one exemplary embodiment, determining a probability that the target data belongs to the target data class using the first feature and the second feature comprises:
s1, inputting the first features into the classifier to obtain the target data category to which the target data calculated by the classifier belongs, wherein the second features are also used for representing the output features of the classifier;
and S2, determining the probability that the target data belongs to the target data category based on the feature mean of the first feature and the feature variance of the second feature, the first parameter and the preset data category.
In this embodiment, the probability that the target data belongs to the target data category is determined by the following formula:
[Formula, published as an image in the original document]
wherein P is used to represent the probability of the target data category, μ is used to represent the feature mean of the first feature, σ is used to represent the feature variance of the second feature, m is used to represent the first parameter, c is used to represent the preset data category, and i is used to represent the index over the feature means of the first feature.
In one exemplary embodiment, determining the class attribute of the target data based on the target data class to which the target data belongs and the probability that the target data belongs to the target data class comprises one of:
s1, determining that the target data belong to the target data category when the probability is greater than or equal to a preset threshold;
and S2, determining that the target data belong to the abnormal data category under the condition that the probability is smaller than the preset threshold value.
In this embodiment, for example, if the probability is greater than the threshold T, the classification is considered correct; if the probability is less than T, the sample is considered to belong to uncertain (abnormal) data.
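The formula referenced above is published only as an image, so the following is a hedged Python sketch of one plausible reading that is consistent with the 3σ reasoning used during training: the probability is taken as the fraction of feature dimensions whose mean lies within m standard deviations of the class centre. The class-centre argument, the default values and all names are illustrative assumptions, not the patented formula.

```python
import numpy as np

def class_probability(mu, sigma, class_mean, m=3.0):
    """Assumed form: fraction of dimensions i with |mu_i - class_mean_i| <= m * sigma_i."""
    return float((np.abs(mu - class_mean) <= m * sigma).mean())

def category_attribute(mu, sigma, class_mean, threshold_t=0.9, m=3.0):
    p = class_probability(mu, sigma, class_mean, m)
    # probability >= T: keep the predicted (known) category; otherwise: abnormal category
    return ("target data category", p) if p >= threshold_t else ("abnormal data category", p)
```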
In an exemplary embodiment, before inputting the target data into the target network model and obtaining the first feature and the second feature of the target data output by the target network model, the method further includes:
s1, inputting internal data and external data in the determined sample data into an original network model according to a first preset proportion to obtain a third feature and a fourth feature output by the original network model, wherein the third feature is used for representing a feature mean value of the internal data and the external data, the fourth feature is used for representing a feature variance of the internal data and the external data, the external data is used for representing data of a non-attribution type, and the internal data is used for representing data of an attribution type;
s2, separating the third feature and the fourth feature according to a second preset proportion;
and S3, training the original network model according to the third characteristic and the fourth characteristic to obtain a target network model.
In the present embodiment, the original network model includes, but is not limited to, a CNN network model. Separating the third feature and the fourth feature according to the second preset ratio means splitting ("tearing") the third feature and the fourth feature apart.
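A minimal sketch, under assumed tensor shapes and helper names, of feeding internal and external samples to the model at the first preset ratio (1:1 is recommended later in the description) and then separating ("tearing") the resulting features into the two parts:

```python
import torch

def build_batch(internal_x, internal_y, external_x):
    """Concatenate labelled internal data and unlabelled external data (here 1:1)."""
    x = torch.cat([internal_x, external_x], dim=0)
    return x, internal_y, internal_x.shape[0]

def tear_features(mu, sigma, n_internal):
    """Split the model outputs: the internal part keeps its labels for the classification
    branch, while the external part is used only by the uncertainty-estimation branch."""
    return (mu[:n_internal], sigma[:n_internal]), (mu[n_internal:], sigma[n_internal:])
```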
In an exemplary embodiment, training the original network model according to the third feature and the fourth feature to obtain the target network model includes:
s1, determining a characteristic memory library, wherein the characteristic memory library is used for storing the characteristic distribution of the sample data;
s2, determining a first distance between the third feature and the feature memory library to determine a first probability of the data category to which the internal data belongs;
s3, determining a second distance between the fourth feature and the feature memory base to determine a second probability of the data category to which the external data belongs;
and S4, training the original network model based on the first probability and the second probability to obtain a target network model.
In this embodiment, the feature memory library is a library that memorizes the in-class feature distributions of the samples and is updated slowly. Each class maintains its own entry in the feature memory bank, which is updated with a momentum update method.
The invention is illustrated below with reference to specific examples:
the problem that this embodiment will solve is on the basis of fixed classification number, when can guaranteeing the classification accuracy, effectively discerns the external data, avoids classifying the external data mistake into known classification.
Fig. 3 is the training flowchart in this embodiment, which includes the following steps:
s301, inputting the data in the class and the external data according to a certain proportion. The intra-class data is used for representing data in a class with supervised classification, and the external data is used for representing diverse external data which does not belong to any one of the classes. In the aspect of data input, different from a general supervised classification task, external data is added, the introduction of the external data has little influence on the classification probability, but the uncertainty estimation of the external data is obviously improved, that is, the external data is not classified into a certain known class with great probability. The internal data and external data of the input data are sent to the network with a certain ratio (recommendation 1:1) through preprocessing.
S302, after the data is processed by the forward pass of the CNN network, a fixed N-dimensional feature would normally be obtained; in this embodiment the feature comprises 2N dimensions, one part representing the feature mean of the sample, denoted μ, and the other part representing the feature variance of the sample, denoted σ. The feature is thus expressed by a normal distribution N(μ, σ²), which expands the representation range of the feature.
[Formula (1), published as an image in the original document]
wherein f(x) is used to represent the representation range of the feature;
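A minimal sketch, assuming a PyTorch-style head whose layer sizes and names are illustrative, of producing the 2N-dimensional feature of step S302 and splitting it into the sample mean μ and variance σ, so that each sample is represented by the distribution N(μ, σ²):

```python
import torch
import torch.nn as nn

class DistributionHead(nn.Module):
    """Maps a backbone feature to a 2N-dimensional output: N dimensions for the
    feature mean mu and N dimensions for the feature variance sigma."""
    def __init__(self, in_dim: int, feat_dim: int):
        super().__init__()
        self.fc = nn.Linear(in_dim, 2 * feat_dim)

    def forward(self, backbone_feat):
        mu, log_var = self.fc(backbone_feat).chunk(2, dim=-1)
        sigma = torch.exp(0.5 * log_var)  # keep the variance term positive
        return mu, sigma

head = DistributionHead(in_dim=512, feat_dim=128)   # sizes are illustrative
mu, sigma = head(torch.randn(8, 512))               # each sample -> N(mu, sigma^2)
```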
s303, after the characteristics are obtained, the materials are marked as fea (mu, sigma) and torn according to the proportion, and 2 parts of reasons exist in tearing. First, the extrinsic data part does not need to compute the cross-entropy loss because this part does not have the correct class label and cannot contribute the correct loss. Second, the internal data and the external data do not operate the same in the uncertainty estimation section, requiring separate processing.
S304-S306, the classification branch: when the internal data is used for classification it needs to be instantiated as a concrete feature, which is done with formula (2), where ε ~ N(0, 1); introducing ε makes the model more robust. After the feature has been instantiated, cross-entropy loss is used for classification; the cross-entropy loss is denoted L_softmax.
fea = μ + ε×σ (2);
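A minimal sketch of the classification branch of steps S304-S306, assuming a PyTorch linear classifier: the feature is instantiated as fea = μ + ε×σ with ε ~ N(0, 1) and scored with cross-entropy (L_softmax); only the internal, labelled part of the batch is passed in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def classification_loss(mu, sigma, labels, classifier: nn.Linear):
    eps = torch.randn_like(sigma)           # eps ~ N(0, 1)
    fea = mu + eps * sigma                  # formula (2): instantiate the feature
    logits = classifier(fea)
    return F.cross_entropy(logits, labels)  # L_softmax over the internal data
```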
S307, the uncertainty estimation branch, which is described in detail below in two parts, processes the internal data and the external data separately.
First, a feature memory library is introduced; the feature memory library is a library that memorizes the in-class feature distribution of the samples and is updated slowly. Each class maintains a distribution N(μ_m, σ_m²), which is updated with a momentum update method, as shown in formulas (3) and (4). μ_m0 and σ_m0 denote the parameters before updating, i.e. the distribution parameters obtained from all historical data; after each update, new information is added slowly. μ_c and σ_c denote the means of fea(μ, σ) over all samples of this batch.
μ_m = k×μ_m0 + (1−k)×μ_c (3);
σ_m = k×σ_m0 + (1−k)×σ_c (4);
Wherein k is a preset parameter less than 1.
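A minimal sketch of the momentum update of formulas (3) and (4); storing the per-class parameters (μ_m, σ_m) in a plain dict is an illustrative choice.

```python
import torch

def update_memory(memory, cls, mu_c, sigma_c, k=0.99):
    """memory[cls] holds (mu_m, sigma_m); mu_c and sigma_c are the means of fea(mu, sigma)
    over the samples of class cls in the current batch; k < 1 is the momentum."""
    mu_m0, sigma_m0 = memory[cls]
    mu_m = k * mu_m0 + (1 - k) * mu_c           # formula (3)
    sigma_m = k * sigma_m0 + (1 - k) * sigma_c  # formula (4)
    memory[cls] = (mu_m, sigma_m)
    return memory
```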
For internal data, as shown in FIG. 4, a loss L_in is used to estimate the distance between the individual sample distribution and the feature memory library. The specific implementation is shown in formulas (5)-(7): formula (5) defines the normalized distance, denoted DM; formula (6) gives the distance measure with respect to the same class in the memory library, whose threshold is determined by reference to the 3σ principle of the normal distribution; formula (7) gives the distance measure with respect to the other classes in the memory library.
[Formulas (5)-(7), published as images in the original document]
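Because formulas (5)-(7) are only published as images, the sketch below is a hedged, assumed hinge-style implementation of the behaviour described above: a normalized distance DM between the sample distribution and the memory entries, pulled inside the 3σ range for the sample's own class and pushed away from the other classes.

```python
import torch

def l_in(mu, memory, cls, margin=3.0):
    """Uncertainty loss for one internal sample of class `cls` (assumed form, not formulas 5-7)."""
    mu_m, sigma_m = memory[cls]
    dm_same = torch.abs(mu - mu_m) / (sigma_m + 1e-8)        # normalized distance to own class
    loss_same = torch.clamp(dm_same - margin, min=0).mean()  # stay within 3*sigma

    loss_other = mu.new_zeros(())
    for other_cls, (mu_o, sigma_o) in memory.items():
        if other_cls == cls:
            continue
        dm_other = torch.abs(mu - mu_o) / (sigma_o + 1e-8)   # distance to the other classes
        loss_other = loss_other + torch.clamp(margin - dm_other, min=0).mean()
    return loss_same + loss_other
```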
For external data, as shown in FIG. 5, a loss L_out is used to estimate the distance between the individual sample distribution and the feature memory library. The specific implementation is shown in formulas (8)-(9): formula (8) defines the normalized distance DM, and formula (9) mainly measures the distance between the external data and the class centers of the feature memory library to ensure that the external data lies outside the 3σ range.
[Formulas (8)-(9), published as images in the original document]
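Formulas (8)-(9) are likewise only published as images; this hedged sketch assumes the analogous form, pushing an external sample outside the 3σ range of every class centre stored in the feature memory library.

```python
import torch

def l_out(mu, memory, margin=3.0):
    """Uncertainty loss for one external sample (assumed form, not formulas 8-9)."""
    loss = mu.new_zeros(())
    for cls, (mu_m, sigma_m) in memory.items():
        dm = torch.abs(mu - mu_m) / (sigma_m + 1e-8)          # normalized distance to class centre
        loss = loss + torch.clamp(margin - dm, min=0).mean()  # keep outside 3*sigma
    return loss
```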
The overall loss consists of three parts:
L = a×L_softmax + b×L_in + c×L_out (10).
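Formula (10) itself translates directly into code; the default weights below are illustrative hyper-parameters.

```python
def total_loss(loss_softmax, loss_in, loss_out, a=1.0, b=1.0, c=1.0):
    # L = a*L_softmax + b*L_in + c*L_out, formula (10)
    return a * loss_softmax + b * loss_in + c * loss_out
```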
Fig. 6 is the test flowchart in this embodiment, which includes the following steps:
s601, inputting test data into a CNN network model;
S602-S603, extracting features from the test data through the CNN network to obtain the classification feature fea(μ, σ);
S604-S607, for classification, μ is taken directly as the output feature and the category it belongs to is calculated. Assuming the sample belongs to category c, the probability of belonging to the current category is calculated with formula (11). If the probability is greater than a threshold T, the classification is considered correct; if the probability is less than T, the sample is considered to belong to uncertain data.
[Formula (11), published as an image in the original document]
In summary, this embodiment solves the prediction problem for out-of-distribution data. The method correctly predicts the class of in-class data, correctly predicts out-of-class data as external data with a low probability of belonging to any known class, and clearly reduces misclassification. This is achieved by predicting feature distributions rather than concrete point features, which increases the model's tolerance, and by the uncertainty estimation module together with the introduction of external data, which enhances the model's ability to distinguish external data.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a device for determining a data type attribute is further provided, where the device is used to implement the foregoing embodiments and preferred embodiments, and details are not repeated for what has been described. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 7 is a block diagram of a structure of an apparatus for determining a data category attribute according to an embodiment of the present invention, as shown in fig. 7, the apparatus includes:
the first input module 72 is configured to input the target data into the target network model, so as to obtain a first feature and a second feature of the target data output by the target network model, where the first feature is used to represent a feature mean of the target data, and the second feature is used to represent a feature variance of the target data;
a first determining module 74, configured to determine, by using the first feature and the second feature, a target data category to which the target data belongs and a probability that the target data belongs to the target data category;
a second determining module 76, configured to determine a class attribute of the target data based on the target data class to which the target data belongs and the probability that the target data belongs to the target data class.
In an exemplary embodiment, the first determining module includes:
a first input unit configured to input the first feature to a classifier to obtain a class of the target data to which the target data calculated by the classifier belongs, wherein the second feature is also used to indicate an output feature of the classifier;
the first determining unit is used for determining the probability that the target data belongs to the target data category based on the feature mean of the first feature, the feature variance of the second feature, the first parameter and the preset data category.
In an exemplary embodiment, the second determining module includes one of:
a second determining unit, configured to determine that the target data belongs to the target data category when the probability is greater than or equal to a preset threshold;
and a third determining unit, configured to determine that the target data belongs to an abnormal data category when the probability is smaller than the preset threshold.
In an exemplary embodiment, the apparatus further includes:
a second input module, configured to input target data into a target network model, and before obtaining a first feature and a second feature of the target data output by the target network model, input internal data and external data in determined sample data into an original network model according to a first preset ratio, so as to obtain a third feature and a fourth feature output by the original network model, where the third feature is used to represent a feature mean of the internal data and the external data, the fourth feature is used to represent a feature variance of the internal data and the external data, the external data is used to represent data of a non-attribution type, and the internal data is used to represent data of an attribution type;
a first separation module for separating the third feature and the fourth feature according to a second preset ratio;
and the first training module is used for training the original network model according to the third characteristic and the fourth characteristic to obtain the target network model.
In one exemplary embodiment, a first training module includes:
a fourth determining unit, configured to determine a feature memory library, where the feature memory library is used to store feature distribution of the sample data;
a fifth determining unit configured to determine a first distance between the third feature and the feature memory library to determine a first probability of a data type to which the internal data belongs;
a sixth determining unit configured to determine a second distance between the fourth feature and the feature storage library to determine a second probability of the data type to which the external data belongs;
and the first training unit is used for training the original network model based on the first probability and the second probability to obtain the target network model.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above-mentioned method embodiments when executed.
In the present embodiment, the above-described computer-readable storage medium may be configured to store a computer program for executing the above steps.
In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
In an exemplary embodiment, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
In an exemplary embodiment, the processor may be configured to execute the above steps by a computer program.
For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary embodiments, and details of this embodiment are not repeated herein.
It will be apparent to those skilled in the art that the various modules or steps of the invention described above may be implemented using a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and they may be implemented using program code executable by the computing devices, such that they may be stored in a memory device and executed by the computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into various integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for determining attributes of data categories, comprising:
inputting target data into a target network model to obtain a first feature and a second feature of the target data output by the target network model, wherein the first feature is used for representing a feature mean of the target data, and the second feature is used for representing a feature variance of the target data;
determining a target data category to which the target data belongs and a probability that the target data belongs to the target data category by using the first feature and the second feature;
and determining the class attribute of the target data based on the target data class to which the target data belongs and the probability of the target data belonging to the target data class.
2. The method of claim 1, wherein determining a target data class to which the target data belongs and a probability that the target data belongs to the target data class using the first feature and the second feature comprises:
inputting the first features into a classifier to obtain the target data category to which the target data calculated by the classifier belongs, wherein the second features are also used for representing output features of the classifier;
determining the probability that the target data belongs to the target data category based on the feature mean of the first feature and the feature variance of the second feature, the first parameter and a preset data category.
3. The method of claim 1, wherein determining the class attribute of the target data based on a target data class to which the target data belongs and a probability that the target data belongs to the target data class comprises one of:
determining that the target data belongs to the target data category when the probability is greater than or equal to a preset threshold;
and under the condition that the probability is smaller than the preset threshold value, determining that the target data belongs to an abnormal data category.
4. The method of claim 1, wherein before inputting the target data into the target network model and obtaining the first and second characteristics of the target data output by the target network model, the method further comprises:
inputting internal data and external data in the determined sample data into an original network model according to a first preset proportion to obtain a third feature and a fourth feature output by the original network model, wherein the third feature is used for representing a feature mean value of the internal data and the external data, the fourth feature is used for representing a feature variance of the internal data and the external data, the external data is used for representing data of a non-attribution type, and the internal data is used for representing data of an attribution type;
separating the third feature from the fourth feature according to a second preset ratio;
and training the original network model according to the third characteristic and the fourth characteristic to obtain the target network model.
5. The method of claim 4, wherein training the original network model according to the third feature and the fourth feature to obtain the target network model comprises:
determining a characteristic memory library, wherein the characteristic memory library is used for storing the characteristic distribution of the sample data;
determining a first distance between the third feature and the feature memory to determine a first probability of a data category to which the internal data belongs;
determining a second distance between the fourth feature and the feature memory to determine a second probability of the data category to which the external data belongs;
and training the original network model based on the first probability and the second probability to obtain the target network model.
6. An apparatus for determining a data category attribute, comprising:
the device comprises a first input module, a second input module and a third input module, wherein the first input module is used for inputting target data into a target network model to obtain a first characteristic and a second characteristic of the target data output by the target network model, the first characteristic is used for representing a characteristic mean value of the target data, and the second characteristic is used for representing a characteristic variance of the target data;
a first determining module, configured to determine, by using the first feature and the second feature, a target data category to which the target data belongs and a probability that the target data belongs to the target data category;
and the second determination module is used for determining the class attribute of the target data based on the target data class to which the target data belongs and the probability of the target data belonging to the target data class.
7. The apparatus of claim 6, wherein the first determining module comprises:
a first input unit, configured to input the first feature to a classifier, so as to obtain the target data category to which the target data calculated by the classifier belongs, where the second feature is further used to represent an output feature of the classifier;
a first determining unit, configured to determine a probability that the target data belongs to the target data category based on the feature mean of the first feature and the feature variance of the second feature, a first parameter, and a preset data category.
8. The apparatus of claim 6, wherein the second determining module comprises one of:
a second determining unit, configured to determine that the target data belongs to the target data category when the probability is greater than or equal to a preset threshold;
and the third determining unit is used for determining that the target data belongs to the abnormal data category under the condition that the probability is smaller than the preset threshold.
9. A computer-readable storage medium, in which a computer program is stored, which computer program, when being executed by a processor, carries out the method of any one of claims 1 to 5.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 5.
CN202110796429.6A 2021-07-14 2021-07-14 Data category attribute determining method and device, storage medium and electronic device Pending CN113469265A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110796429.6A CN113469265A (en) 2021-07-14 2021-07-14 Data category attribute determining method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110796429.6A CN113469265A (en) 2021-07-14 2021-07-14 Data category attribute determining method and device, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN113469265A true CN113469265A (en) 2021-10-01

Family

ID=77878459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110796429.6A Pending CN113469265A (en) 2021-07-14 2021-07-14 Data category attribute determining method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN113469265A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110313856A1 (en) * 2008-08-18 2011-12-22 Ipharro Media Gmbh Supplemental information delivery
CN110750665A (en) * 2019-10-12 2020-02-04 南京邮电大学 Open set domain adaptation method and system based on entropy minimization
CN110837889A (en) * 2018-08-15 2020-02-25 新智数字科技有限公司 Neural network training method and device, storage medium and electronic device
CN111046933A (en) * 2019-12-03 2020-04-21 东软集团股份有限公司 Image classification method and device, storage medium and electronic equipment
CN111144482A (en) * 2019-12-26 2020-05-12 惠州市锦好医疗科技股份有限公司 Scene matching method and device for digital hearing aid and computer equipment
CN111368893A (en) * 2020-02-27 2020-07-03 Oppo广东移动通信有限公司 Image recognition method and device, electronic equipment and storage medium
CN111523621A (en) * 2020-07-03 2020-08-11 腾讯科技(深圳)有限公司 Image recognition method and device, computer equipment and storage medium
CN112348203A (en) * 2020-11-05 2021-02-09 中国平安人寿保险股份有限公司 Model training method and device, terminal device and storage medium
CN112446428A (en) * 2020-11-27 2021-03-05 杭州海康威视数字技术股份有限公司 Image data processing method and device
CN112579819A (en) * 2020-12-25 2021-03-30 天津车之家数据信息技术有限公司 Data classification method and computing equipment
CN113076994A (en) * 2021-03-31 2021-07-06 南京邮电大学 Open-set domain self-adaptive image classification method and system

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110313856A1 (en) * 2008-08-18 2011-12-22 Ipharro Media Gmbh Supplemental information delivery
CN110837889A (en) * 2018-08-15 2020-02-25 新智数字科技有限公司 Neural network training method and device, storage medium and electronic device
CN110750665A (en) * 2019-10-12 2020-02-04 南京邮电大学 Open set domain adaptation method and system based on entropy minimization
CN111046933A (en) * 2019-12-03 2020-04-21 东软集团股份有限公司 Image classification method and device, storage medium and electronic equipment
CN111144482A (en) * 2019-12-26 2020-05-12 惠州市锦好医疗科技股份有限公司 Scene matching method and device for digital hearing aid and computer equipment
CN111368893A (en) * 2020-02-27 2020-07-03 Oppo广东移动通信有限公司 Image recognition method and device, electronic equipment and storage medium
CN111523621A (en) * 2020-07-03 2020-08-11 腾讯科技(深圳)有限公司 Image recognition method and device, computer equipment and storage medium
CN112348203A (en) * 2020-11-05 2021-02-09 中国平安人寿保险股份有限公司 Model training method and device, terminal device and storage medium
CN112446428A (en) * 2020-11-27 2021-03-05 杭州海康威视数字技术股份有限公司 Image data processing method and device
CN112579819A (en) * 2020-12-25 2021-03-30 天津车之家数据信息技术有限公司 Data classification method and computing equipment
CN113076994A (en) * 2021-03-31 2021-07-06 南京邮电大学 Open-set domain self-adaptive image classification method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAIGUANG LI et al.: "Learning the information diffusion probabilities by using variance regularized EM algorithm", ASONAM '14: Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 31 August 2018 (2018-08-31), pages 273 *
TIAN Yao et al. (田垚 等): "基于深度神经网络和Bottleneck特征的说话人识别系统" [Speaker recognition system based on deep neural networks and Bottleneck features], Journal of Tsinghua University (Science and Technology) (清华大学学报(自然科学版)), 31 December 2012 (2012-12-31), pages 1 - 6 *

Similar Documents

Publication Publication Date Title
CN111444966B (en) Media information classification method and device
CN110472675B (en) Image classification method, image classification device, storage medium and electronic equipment
CN107040397B (en) Service parameter acquisition method and device
CN112989035B (en) Method, device and storage medium for identifying user intention based on text classification
CN109561322A (en) A kind of method, apparatus, equipment and the storage medium of video audit
CN111797320B (en) Data processing method, device, equipment and storage medium
CN113438114B (en) Method, device, equipment and storage medium for monitoring running state of Internet system
CN113822366A (en) Service index abnormality detection method and device, electronic equipment and storage medium
CN111159481B (en) Edge prediction method and device for graph data and terminal equipment
CN112702339A (en) Abnormal traffic monitoring and analyzing method and device based on deep migration learning
CN115860836A (en) E-commerce service pushing method and system based on user behavior big data analysis
CN111488939A (en) Model training method, classification method, device and equipment
CN112269937B (en) Method, system and device for calculating user similarity
CN113282920B (en) Log abnormality detection method, device, computer equipment and storage medium
CN112115996B (en) Image data processing method, device, equipment and storage medium
CN113010785A (en) User recommendation method and device
CN111507850A (en) Authority guaranteeing method and related device and equipment
CN114170484B (en) Picture attribute prediction method and device, electronic equipment and storage medium
CN113469265A (en) Data category attribute determining method and device, storage medium and electronic device
CN115099934A (en) High-latency customer identification method, electronic equipment and storage medium
CN111723872B (en) Pedestrian attribute identification method and device, storage medium and electronic device
CN114610590A (en) Method, device and equipment for determining operation time length and storage medium
CN113936157A (en) Abnormal information processing method and device, storage medium and electronic device
CN113535458A (en) Abnormal false alarm processing method and device, storage medium and terminal
CN112214675A (en) Method, device and equipment for determining user machine purchasing and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination