CN113469265A - Data category attribute determining method and device, storage medium and electronic device - Google Patents


Info

Publication number
CN113469265A
CN113469265A (application CN202110796429.6A)
Authority
CN
China
Prior art keywords
target data
feature
data
target
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110796429.6A
Other languages
Chinese (zh)
Inventor
郭思郁
王宁波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110796429.6A priority Critical patent/CN113469265A/en
Publication of CN113469265A publication Critical patent/CN113469265A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a method and a device for determining a data category attribute, a storage medium and an electronic device. The method comprises: inputting target data into a target network model to obtain a first feature and a second feature of the target data output by the target network model, wherein the first feature is used for representing a feature mean of the target data and the second feature is used for representing a feature variance of the target data; determining, by using the first feature and the second feature, a target data category to which the target data belongs and the probability that the target data belongs to the target data category; and determining the category attribute of the target data based on the target data category to which the target data belongs and the probability that the target data belongs to the target data category. By the method and the device, the problem of inaccurate classification of external data in the related art can be solved, and the effect of accurately classifying data is achieved.

Description

Data category attribute determining method and device, storage medium and electronic device
Technical Field
The embodiment of the invention relates to the field of data processing, in particular to a method and a device for determining data category attributes, a storage medium and an electronic device.
Background
In the real world, almost all classification models face a common problem in practical use: external data that does not belong to any known class must be processed, yet a general classification method forcibly assigns such data to one of the known classes.
For example, in a digit-recognition task the open-set classification problem (open-set problem) involves not only the digit categories 0-9 but also other unknown categories such as the letters A-Z. The unknown categories carry no labels, so the classifier cannot know the specific category of an image from an unknown category (for example, whether it is an "A"). These images of many different classes together form a single class, the unknown class, which in detection is called the background class. The open-set classification problem is therefore to distinguish among the 10 known classes while rejecting the other unknown classes.
The open-set classification methods disclosed in the prior art add many extra classes during training, but they still face the problem of external data at prediction time.
In view of the above technical problems, no effective solution has been proposed in the related art.
Disclosure of Invention
The embodiment of the invention provides a method and a device for determining a data category attribute, a storage medium and an electronic device, which are used for at least solving the problem of inaccurate classification of external data in the related art.
According to an embodiment of the present invention, there is provided a method for determining a data category attribute, including: inputting target data into a target network model to obtain a first feature and a second feature of the target data output by the target network model, wherein the first feature is used for representing a feature mean value of the target data, and the second feature is used for representing a feature variance of the target data; determining a target data category to which the target data belongs and the probability of the target data belonging to the target data category by using the first characteristic and the second characteristic; and determining the class attribute of the target data based on the target data class to which the target data belongs and the probability of the target data belonging to the target data class.
According to another embodiment of the present invention, there is provided an apparatus for determining a data category attribute, including: a first input module, configured to input target data into a target network model, and obtain a first feature and a second feature of the target data output by the target network model, where the first feature is used to represent a feature mean of the target data, and the second feature is used to represent a feature variance of the target data; the first determining module is used for determining a target data category to which the target data belongs and the probability of the target data belonging to the target data category by using the first characteristic and the second characteristic; and the second determining module is used for determining the class attribute of the target data based on the target data class to which the target data belongs and the probability of the target data belonging to the target data class.
In an exemplary embodiment, the first determining module includes: a first input unit configured to input the first feature to a classifier to obtain a class of the target data to which the target data calculated by the classifier belongs, wherein the second feature is also used to indicate an output feature of the classifier; the first determining unit is used for determining the probability that the target data belongs to the target data category based on the feature mean of the first feature, the feature variance of the second feature, the first parameter and the preset data category.
In an exemplary embodiment, the second determining module includes one of: a second determining unit, configured to determine that the target data belongs to the target data category when the probability is greater than or equal to a preset threshold; and a third determining unit, configured to determine that the target data belongs to an abnormal data category when the probability is smaller than the preset threshold.
In an exemplary embodiment, the apparatus further includes: a second input module, configured to input target data into a target network model, and before obtaining a first feature and a second feature of the target data output by the target network model, input internal data and external data in determined sample data into an original network model according to a first preset ratio, so as to obtain a third feature and a fourth feature output by the original network model, where the third feature is used to represent a feature mean of the internal data and the external data, the fourth feature is used to represent a feature variance of the internal data and the external data, the external data is used to represent data of a non-attribution type, and the internal data is used to represent data of an attribution type; a first separation module for separating the third feature and the fourth feature according to a second preset ratio; and the first training module is used for training the original network model according to the third characteristic and the fourth characteristic to obtain the target network model.
In one exemplary embodiment, a first training module includes: a fourth determining unit, configured to determine a feature memory library, where the feature memory library is used to store feature distribution of the sample data; a fifth determining unit configured to determine a first distance between the third feature and the feature memory library to determine a first probability of a data type to which the internal data belongs; a sixth determining unit configured to determine a second distance between the fourth feature and the feature storage library to determine a second probability of the data type to which the external data belongs; and the first training unit is used for training the original network model based on the first probability and the second probability to obtain the target network model.
According to a further embodiment of the present invention, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to the method, the target data is input into the target network model to obtain a first feature and a second feature of the target data output by the target network model, wherein the first feature is used for representing a feature mean of the target data and the second feature is used for representing a feature variance of the target data; a target data category to which the target data belongs and the probability that the target data belongs to the target data category are determined by using the first feature and the second feature; and the category attribute of the target data is determined based on the target data category to which the target data belongs and the probability that the target data belongs to the target data category. On the basis of a fixed number of classes, whether the target data belongs to internal data or external data can be effectively determined, the classification accuracy is guaranteed, and external data is prevented from being misclassified into a known class. Therefore, the problem of inaccurate classification of external data in the related art can be solved, and the effect of accurately classifying data is achieved.
Drawings
Fig. 1 is a block diagram of a hardware structure of a mobile terminal of a method for determining a data category attribute according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of determining attributes of data categories according to an embodiment of the invention;
FIG. 3 is a training flow diagram according to an embodiment of the invention;
FIG. 4 is a flow diagram of internal data processing in the uncertainty estimation module according to an embodiment of the present invention;
FIG. 5 is a flow diagram of the processing of external data in the uncertainty estimation module according to an embodiment of the present invention;
FIG. 6 is a test flow diagram according to an embodiment of the invention;
fig. 7 is a block diagram of the structure of a data category attribute determination apparatus according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided in the embodiments of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking the operation on the mobile terminal as an example, fig. 1 is a hardware structure block diagram of the mobile terminal of the method for determining the data category attribute according to the embodiment of the present invention. As shown in fig. 1, the mobile terminal may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), and a memory 104 for storing data, wherein the mobile terminal may further include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration, and does not limit the structure of the mobile terminal. For example, the mobile terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program and a module of application software, such as a computer program corresponding to the method for determining the data category attribute in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a Network adapter (NIC), which can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In this embodiment, a method for determining a data category attribute is provided, and fig. 2 is a flowchart of a method for determining a data category attribute according to an embodiment of the present invention, as shown in fig. 2, the flowchart includes the following steps:
step S202, target data are input into a target network model, and a first feature and a second feature of the target data output by the target network model are obtained, wherein the first feature is used for representing a feature mean value of the target data, and the second feature is used for representing a feature variance of the target data;
step S204, determining a target data category to which the target data belongs and the probability of the target data belonging to the target data category by using the first characteristic and the second characteristic;
step S206, the category attribute of the target data is determined based on the target data category to which the target data belongs and the probability of the target data belonging to the target data category.
The execution subject of the above steps may be a terminal, but is not limited thereto.
The present embodiment includes, but is not limited to, application in scenarios of classifying data, for example, the classification of data of a known class or of an unknown class.
In the present embodiment, the target data includes, but is not limited to, internal data (data of a known category) and external data (data of an unknown category). The target network model includes, but is not limited to, a Convolutional Neural Network (CNN).
Through the above steps, the target data is input into the target network model to obtain a first feature and a second feature of the target data output by the target network model, wherein the first feature is used for representing a feature mean of the target data and the second feature is used for representing a feature variance of the target data; a target data category to which the target data belongs and the probability that the target data belongs to the target data category are determined by using the first feature and the second feature; and the category attribute of the target data is determined based on the target data category to which the target data belongs and the probability that the target data belongs to the target data category. On the basis of a fixed number of classes, whether the target data belongs to internal data or external data can be effectively determined, the classification accuracy is guaranteed, and external data is prevented from being misclassified into a known class. Therefore, the problem of inaccurate classification of external data in the related art can be solved, and the effect of accurately classifying data is achieved.
In one exemplary embodiment, determining a probability that the target data belongs to the target data class using the first feature and the second feature comprises:
s1, inputting the first features into the classifier to obtain the target data category to which the target data calculated by the classifier belongs, wherein the second features are also used for representing the output features of the classifier;
and S2, determining the probability that the target data belongs to the target data category based on the feature mean of the first feature and the feature variance of the second feature, the first parameter and the preset data category.
In this embodiment, the probability that the target data belongs to the target data category is determined by the following formula:
[Formula, published as an image in the original document]
wherein P is used to represent the probability of the target data category, μ is used to represent the feature mean of the first feature, σ is used to represent the feature variance of the second feature, m is used to represent the first parameter, c is used to represent the preset data category, and i is used to represent the index over the feature means of the first feature.
In one exemplary embodiment, determining the class attribute of the target data based on the target data class to which the target data belongs and the probability that the target data belongs to the target data class comprises one of:
s1, determining that the target data belong to the target data category when the probability is greater than or equal to a preset threshold;
and S2, determining that the target data belong to the abnormal data category under the condition that the probability is smaller than the preset threshold value.
In this embodiment, for example, if the probability is greater than the threshold T, the classification is considered correct; if the probability is less than T, the sample is considered to belong to uncertain (abnormal) data.
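The formula referenced above is published only as an image, so the following is a hedged Python sketch of one plausible reading that is consistent with the 3σ reasoning used during training: the probability is taken as the fraction of feature dimensions whose mean lies within m standard deviations of the class centre. The class-centre argument, the default values and all names are illustrative assumptions, not the patented formula.

```python
import numpy as np

def class_probability(mu, sigma, class_mean, m=3.0):
    """Assumed form: fraction of dimensions i with |mu_i - class_mean_i| <= m * sigma_i."""
    return float((np.abs(mu - class_mean) <= m * sigma).mean())

def category_attribute(mu, sigma, class_mean, threshold_t=0.9, m=3.0):
    p = class_probability(mu, sigma, class_mean, m)
    # probability >= T: keep the predicted (known) category; otherwise: abnormal category
    return ("target data category", p) if p >= threshold_t else ("abnormal data category", p)
```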
In an exemplary embodiment, before inputting the target data into the target network model and obtaining the first feature and the second feature of the target data output by the target network model, the method further includes:
s1, inputting internal data and external data in the determined sample data into an original network model according to a first preset proportion to obtain a third feature and a fourth feature output by the original network model, wherein the third feature is used for representing a feature mean value of the internal data and the external data, the fourth feature is used for representing a feature variance of the internal data and the external data, the external data is used for representing data of a non-attribution type, and the internal data is used for representing data of an attribution type;
s2, separating the third feature and the fourth feature according to a second preset proportion;
and S3, training the original network model according to the third characteristic and the fourth characteristic to obtain a target network model.
In the present embodiment, the original network model includes, but is not limited to, a CNN network model. Separating the third feature and the fourth feature according to the second preset ratio means splitting ("tearing") the third feature and the fourth feature apart.
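A minimal sketch, under assumed tensor shapes and helper names, of feeding internal and external samples to the model at the first preset ratio (1:1 is recommended later in the description) and then separating ("tearing") the resulting features into the two parts:

```python
import torch

def build_batch(internal_x, internal_y, external_x):
    """Concatenate labelled internal data and unlabelled external data (here 1:1)."""
    x = torch.cat([internal_x, external_x], dim=0)
    return x, internal_y, internal_x.shape[0]

def tear_features(mu, sigma, n_internal):
    """Split the model outputs: the internal part keeps its labels for the classification
    branch, while the external part is used only by the uncertainty-estimation branch."""
    return (mu[:n_internal], sigma[:n_internal]), (mu[n_internal:], sigma[n_internal:])
```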
In an exemplary embodiment, training the original network model according to the third feature and the fourth feature to obtain the target network model includes:
s1, determining a characteristic memory library, wherein the characteristic memory library is used for storing the characteristic distribution of the sample data;
s2, determining a first distance between the third feature and the feature memory library to determine a first probability of the data category to which the internal data belongs;
s3, determining a second distance between the fourth feature and the feature memory base to determine a second probability of the data category to which the external data belongs;
and S4, training the original network model based on the first probability and the second probability to obtain a target network model.
In this embodiment, the feature memory library is a library that memorizes the in-class feature distributions of the samples and is updated slowly. Each class maintains its own entry in the feature memory bank, which is updated with a momentum update method.
The invention is illustrated below with reference to specific examples:
the problem that this embodiment will solve is on the basis of fixed classification number, when can guaranteeing the classification accuracy, effectively discerns the external data, avoids classifying the external data mistake into known classification.
Fig. 3 is the training flowchart in this embodiment, which includes the following steps:
s301, inputting the data in the class and the external data according to a certain proportion. The intra-class data is used for representing data in a class with supervised classification, and the external data is used for representing diverse external data which does not belong to any one of the classes. In the aspect of data input, different from a general supervised classification task, external data is added, the introduction of the external data has little influence on the classification probability, but the uncertainty estimation of the external data is obviously improved, that is, the external data is not classified into a certain known class with great probability. The internal data and external data of the input data are sent to the network with a certain ratio (recommendation 1:1) through preprocessing.
S302, after the data is processed by the forward pass of the CNN network, a fixed N-dimensional feature would normally be obtained; in this embodiment the feature comprises 2N dimensions, one part representing the feature mean of the sample, denoted μ, and the other part representing the feature variance of the sample, denoted σ. The feature is thus expressed by a normal distribution N(μ, σ²), which expands the representation range of the feature.
[Formula (1), published as an image in the original document]
wherein f(x) is used to represent the representation range of the feature;
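A minimal sketch, assuming a PyTorch-style head whose layer sizes and names are illustrative, of producing the 2N-dimensional feature of step S302 and splitting it into the sample mean μ and variance σ, so that each sample is represented by the distribution N(μ, σ²):

```python
import torch
import torch.nn as nn

class DistributionHead(nn.Module):
    """Maps a backbone feature to a 2N-dimensional output: N dimensions for the
    feature mean mu and N dimensions for the feature variance sigma."""
    def __init__(self, in_dim: int, feat_dim: int):
        super().__init__()
        self.fc = nn.Linear(in_dim, 2 * feat_dim)

    def forward(self, backbone_feat):
        mu, log_var = self.fc(backbone_feat).chunk(2, dim=-1)
        sigma = torch.exp(0.5 * log_var)  # keep the variance term positive
        return mu, sigma

head = DistributionHead(in_dim=512, feat_dim=128)   # sizes are illustrative
mu, sigma = head(torch.randn(8, 512))               # each sample -> N(mu, sigma^2)
```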
s303, after the characteristics are obtained, the materials are marked as fea (mu, sigma) and torn according to the proportion, and 2 parts of reasons exist in tearing. First, the extrinsic data part does not need to compute the cross-entropy loss because this part does not have the correct class label and cannot contribute the correct loss. Second, the internal data and the external data do not operate the same in the uncertainty estimation section, requiring separate processing.
S304-S306, the classification branch: when the internal data is used for classification it needs to be instantiated as a concrete feature, which is done with formula (2), where ε ~ N(0, 1); introducing ε makes the model more robust. After the feature has been instantiated, cross-entropy loss is used for classification; the cross-entropy loss is denoted L_softmax.
fea = μ + ε×σ (2);
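A minimal sketch of the classification branch of steps S304-S306, assuming a PyTorch linear classifier: the feature is instantiated as fea = μ + ε×σ with ε ~ N(0, 1) and scored with cross-entropy (L_softmax); only the internal, labelled part of the batch is passed in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def classification_loss(mu, sigma, labels, classifier: nn.Linear):
    eps = torch.randn_like(sigma)           # eps ~ N(0, 1)
    fea = mu + eps * sigma                  # formula (2): instantiate the feature
    logits = classifier(fea)
    return F.cross_entropy(logits, labels)  # L_softmax over the internal data
```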
S307, the uncertainty estimation branch, which is described in detail below in two parts, processes the internal data and the external data separately.
First, a feature memory library is introduced; the feature memory library is a library that memorizes the in-class feature distribution of the samples and is updated slowly. Each class maintains a distribution N(μ_m, σ_m²), which is updated with a momentum update method, as shown in formulas (3) and (4). μ_m0 and σ_m0 denote the parameters before updating, i.e. the distribution parameters obtained from all historical data; after each update, new information is added slowly. μ_c and σ_c denote the means of fea(μ, σ) over all samples of this batch.
μ_m = k×μ_m0 + (1−k)×μ_c (3);
σ_m = k×σ_m0 + (1−k)×σ_c (4);
Wherein k is a preset parameter less than 1.
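A minimal sketch of the momentum update of formulas (3) and (4); storing the per-class parameters (μ_m, σ_m) in a plain dict is an illustrative choice.

```python
import torch

def update_memory(memory, cls, mu_c, sigma_c, k=0.99):
    """memory[cls] holds (mu_m, sigma_m); mu_c and sigma_c are the means of fea(mu, sigma)
    over the samples of class cls in the current batch; k < 1 is the momentum."""
    mu_m0, sigma_m0 = memory[cls]
    mu_m = k * mu_m0 + (1 - k) * mu_c           # formula (3)
    sigma_m = k * sigma_m0 + (1 - k) * sigma_c  # formula (4)
    memory[cls] = (mu_m, sigma_m)
    return memory
```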
For internal data, as shown in FIG. 4, a loss L_in is used to estimate the distance between the individual sample distribution and the feature memory library. The specific implementation is shown in formulas (5)-(7): formula (5) defines the normalized distance, denoted DM; formula (6) gives the distance measure with respect to the same class in the memory library, whose threshold is determined by reference to the 3σ principle of the normal distribution; formula (7) gives the distance measure with respect to the other classes in the memory library.
[Formulas (5)-(7), published as images in the original document]
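Because formulas (5)-(7) are only published as images, the sketch below is a hedged, assumed hinge-style implementation of the behaviour described above: a normalized distance DM between the sample distribution and the memory entries, pulled inside the 3σ range for the sample's own class and pushed away from the other classes.

```python
import torch

def l_in(mu, memory, cls, margin=3.0):
    """Uncertainty loss for one internal sample of class `cls` (assumed form, not formulas 5-7)."""
    mu_m, sigma_m = memory[cls]
    dm_same = torch.abs(mu - mu_m) / (sigma_m + 1e-8)        # normalized distance to own class
    loss_same = torch.clamp(dm_same - margin, min=0).mean()  # stay within 3*sigma

    loss_other = mu.new_zeros(())
    for other_cls, (mu_o, sigma_o) in memory.items():
        if other_cls == cls:
            continue
        dm_other = torch.abs(mu - mu_o) / (sigma_o + 1e-8)   # distance to the other classes
        loss_other = loss_other + torch.clamp(margin - dm_other, min=0).mean()
    return loss_same + loss_other
```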
For external data, as shown in FIG. 5, a loss L_out is used to estimate the distance between the individual sample distribution and the feature memory library. The specific implementation is shown in formulas (8)-(9): formula (8) defines the normalized distance DM, and formula (9) mainly measures the distance between the external data and the class centers of the feature memory library to ensure that the external data lies outside the 3σ range.
[Formulas (8)-(9), published as images in the original document]
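Formulas (8)-(9) are likewise only published as images; this hedged sketch assumes the analogous form, pushing an external sample outside the 3σ range of every class centre stored in the feature memory library.

```python
import torch

def l_out(mu, memory, margin=3.0):
    """Uncertainty loss for one external sample (assumed form, not formulas 8-9)."""
    loss = mu.new_zeros(())
    for cls, (mu_m, sigma_m) in memory.items():
        dm = torch.abs(mu - mu_m) / (sigma_m + 1e-8)          # normalized distance to class centre
        loss = loss + torch.clamp(margin - dm, min=0).mean()  # keep outside 3*sigma
    return loss
```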
The overall loss consists of three parts:
L = a×L_softmax + b×L_in + c×L_out (10).
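Formula (10) itself translates directly into code; the default weights below are illustrative hyper-parameters.

```python
def total_loss(loss_softmax, loss_in, loss_out, a=1.0, b=1.0, c=1.0):
    # L = a*L_softmax + b*L_in + c*L_out, formula (10)
    return a * loss_softmax + b * loss_in + c * loss_out
```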
Fig. 6 is the test flowchart in this embodiment, which includes the following steps:
s601, inputting test data into a CNN network model;
S602-S603, extracting features from the test data through the CNN network to obtain the classification feature fea(μ, σ);
S604-S607, for classification, μ is taken directly as the output feature and the category it belongs to is calculated. Assuming the sample belongs to category c, the probability of belonging to the current category is calculated with formula (11). If the probability is greater than a threshold T, the classification is considered correct; if the probability is less than T, the sample is considered to belong to uncertain data.
[Formula (11), published as an image in the original document]
In summary, this embodiment solves the prediction problem for out-of-distribution data. The method correctly predicts the class of in-class data, correctly predicts out-of-class data as external data with a low probability of belonging to any known class, and clearly reduces misclassification. This is achieved by predicting feature distributions rather than concrete point features, which increases the model's tolerance, and by the uncertainty estimation module together with the introduction of external data, which enhances the model's ability to distinguish external data.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a device for determining a data type attribute is further provided, where the device is used to implement the foregoing embodiments and preferred embodiments, and details are not repeated for what has been described. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 7 is a block diagram of a structure of an apparatus for determining a data category attribute according to an embodiment of the present invention, as shown in fig. 7, the apparatus includes:
the first input module 72 is configured to input the target data into the target network model, so as to obtain a first feature and a second feature of the target data output by the target network model, where the first feature is used to represent a feature mean of the target data, and the second feature is used to represent a feature variance of the target data;
a first determining module 74, configured to determine, by using the first feature and the second feature, a target data category to which the target data belongs and a probability that the target data belongs to the target data category;
a second determining module 76, configured to determine a class attribute of the target data based on the target data class to which the target data belongs and the probability that the target data belongs to the target data class.
In an exemplary embodiment, the first determining module includes:
a first input unit configured to input the first feature to a classifier to obtain a class of the target data to which the target data calculated by the classifier belongs, wherein the second feature is also used to indicate an output feature of the classifier;
the first determining unit is used for determining the probability that the target data belongs to the target data category based on the feature mean of the first feature, the feature variance of the second feature, the first parameter and the preset data category.
In an exemplary embodiment, the second determining module includes one of:
a second determining unit, configured to determine that the target data belongs to the target data category when the probability is greater than or equal to a preset threshold;
and a third determining unit, configured to determine that the target data belongs to an abnormal data category when the probability is smaller than the preset threshold.
In an exemplary embodiment, the apparatus further includes:
a second input module, configured to input target data into a target network model, and before obtaining a first feature and a second feature of the target data output by the target network model, input internal data and external data in determined sample data into an original network model according to a first preset ratio, so as to obtain a third feature and a fourth feature output by the original network model, where the third feature is used to represent a feature mean of the internal data and the external data, the fourth feature is used to represent a feature variance of the internal data and the external data, the external data is used to represent data of a non-attribution type, and the internal data is used to represent data of an attribution type;
a first separation module for separating the third feature and the fourth feature according to a second preset ratio;
and the first training module is used for training the original network model according to the third characteristic and the fourth characteristic to obtain the target network model.
In one exemplary embodiment, a first training module includes:
a fourth determining unit, configured to determine a feature memory library, where the feature memory library is used to store feature distribution of the sample data;
a fifth determining unit configured to determine a first distance between the third feature and the feature memory library to determine a first probability of a data type to which the internal data belongs;
a sixth determining unit configured to determine a second distance between the fourth feature and the feature storage library to determine a second probability of the data type to which the external data belongs;
and the first training unit is used for training the original network model based on the first probability and the second probability to obtain the target network model.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above-mentioned method embodiments when executed.
In the present embodiment, the above-described computer-readable storage medium may be configured to store a computer program for executing the above steps.
In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
In an exemplary embodiment, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
In an exemplary embodiment, the processor may be configured to execute the above steps by a computer program.
For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary embodiments, and details of this embodiment are not repeated herein.
It will be apparent to those skilled in the art that the various modules or steps of the invention described above may be implemented using a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and they may be implemented using program code executable by the computing devices, such that they may be stored in a memory device and executed by the computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into various integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for determining attributes of data categories, comprising:
inputting target data into a target network model to obtain a first feature and a second feature of the target data output by the target network model, wherein the first feature is used for representing a feature mean of the target data, and the second feature is used for representing a feature variance of the target data;
determining a target data category to which the target data belongs and a probability that the target data belongs to the target data category by using the first feature and the second feature;
and determining the class attribute of the target data based on the target data class to which the target data belongs and the probability of the target data belonging to the target data class.
2. The method of claim 1, wherein determining a target data class to which the target data belongs and a probability that the target data belongs to the target data class using the first feature and the second feature comprises:
inputting the first features into a classifier to obtain the target data category to which the target data calculated by the classifier belongs, wherein the second features are also used for representing output features of the classifier;
determining the probability that the target data belongs to the target data category based on the feature mean of the first feature and the feature variance of the second feature, the first parameter and a preset data category.
3. The method of claim 1, wherein determining the class attribute of the target data based on a target data class to which the target data belongs and a probability that the target data belongs to the target data class comprises one of:
determining that the target data belongs to the target data category when the probability is greater than or equal to a preset threshold;
and under the condition that the probability is smaller than the preset threshold value, determining that the target data belongs to an abnormal data category.
4. The method of claim 1, wherein before inputting the target data into the target network model and obtaining the first and second characteristics of the target data output by the target network model, the method further comprises:
inputting internal data and external data in the determined sample data into an original network model according to a first preset proportion to obtain a third feature and a fourth feature output by the original network model, wherein the third feature is used for representing a feature mean value of the internal data and the external data, the fourth feature is used for representing a feature variance of the internal data and the external data, the external data is used for representing data of a non-attribution type, and the internal data is used for representing data of an attribution type;
separating the third feature from the fourth feature according to a second preset ratio;
and training the original network model according to the third characteristic and the fourth characteristic to obtain the target network model.
5. The method of claim 4, wherein training the original network model according to the third feature and the fourth feature to obtain the target network model comprises:
determining a characteristic memory library, wherein the characteristic memory library is used for storing the characteristic distribution of the sample data;
determining a first distance between the third feature and the feature memory to determine a first probability of a data category to which the internal data belongs;
determining a second distance between the fourth feature and the feature memory to determine a second probability of the data category to which the external data belongs;
and training the original network model based on the first probability and the second probability to obtain the target network model.
6. An apparatus for determining a data category attribute, comprising:
the device comprises a first input module, a second input module and a third input module, wherein the first input module is used for inputting target data into a target network model to obtain a first characteristic and a second characteristic of the target data output by the target network model, the first characteristic is used for representing a characteristic mean value of the target data, and the second characteristic is used for representing a characteristic variance of the target data;
a first determining module, configured to determine, by using the first feature and the second feature, a target data category to which the target data belongs and a probability that the target data belongs to the target data category;
and the second determination module is used for determining the class attribute of the target data based on the target data class to which the target data belongs and the probability of the target data belonging to the target data class.
7. The apparatus of claim 6, wherein the first determining module comprises:
a first input unit, configured to input the first feature to a classifier, so as to obtain the target data category to which the target data calculated by the classifier belongs, where the second feature is further used to represent an output feature of the classifier;
a first determining unit, configured to determine a probability that the target data belongs to the target data category based on the feature mean of the first feature and the feature variance of the second feature, a first parameter, and a preset data category.
8. The apparatus of claim 6, wherein the second determining module comprises one of:
a second determining unit, configured to determine that the target data belongs to the target data category when the probability is greater than or equal to a preset threshold;
and the third determining unit is used for determining that the target data belongs to the abnormal data category under the condition that the probability is smaller than the preset threshold.
9. A computer-readable storage medium, in which a computer program is stored, which computer program, when being executed by a processor, carries out the method of any one of claims 1 to 5.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 5.
CN202110796429.6A 2021-07-14 2021-07-14 Data category attribute determining method and device, storage medium and electronic device Pending CN113469265A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110796429.6A CN113469265A (en) 2021-07-14 2021-07-14 Data category attribute determining method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110796429.6A CN113469265A (en) 2021-07-14 2021-07-14 Data category attribute determining method and device, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN113469265A true CN113469265A (en) 2021-10-01

Family

ID=77878459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110796429.6A Pending CN113469265A (en) 2021-07-14 2021-07-14 Data category attribute determining method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN113469265A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110313856A1 (en) * 2008-08-18 2011-12-22 Ipharro Media Gmbh Supplemental information delivery
CN110750665A (en) * 2019-10-12 2020-02-04 南京邮电大学 Open set domain adaptation method and system based on entropy minimization
CN110837889A (en) * 2018-08-15 2020-02-25 新智数字科技有限公司 Neural network training method and device, storage medium and electronic device
CN111046933A (en) * 2019-12-03 2020-04-21 东软集团股份有限公司 Image classification method and device, storage medium and electronic equipment
CN111144482A (en) * 2019-12-26 2020-05-12 惠州市锦好医疗科技股份有限公司 Scene matching method and device for digital hearing aid and computer equipment
CN111368893A (en) * 2020-02-27 2020-07-03 Oppo广东移动通信有限公司 Image recognition method and device, electronic equipment and storage medium
CN111523621A (en) * 2020-07-03 2020-08-11 腾讯科技(深圳)有限公司 Image recognition method and device, computer equipment and storage medium
CN112348203A (en) * 2020-11-05 2021-02-09 中国平安人寿保险股份有限公司 Model training method and device, terminal device and storage medium
CN112446428A (en) * 2020-11-27 2021-03-05 杭州海康威视数字技术股份有限公司 Image data processing method and device
CN112579819A (en) * 2020-12-25 2021-03-30 天津车之家数据信息技术有限公司 Data classification method and computing equipment
CN113076994A (en) * 2021-03-31 2021-07-06 南京邮电大学 Open-set domain self-adaptive image classification method and system

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110313856A1 (en) * 2008-08-18 2011-12-22 Ipharro Media Gmbh Supplemental information delivery
CN110837889A (en) * 2018-08-15 2020-02-25 新智数字科技有限公司 Neural network training method and device, storage medium and electronic device
CN110750665A (en) * 2019-10-12 2020-02-04 南京邮电大学 Open set domain adaptation method and system based on entropy minimization
CN111046933A (en) * 2019-12-03 2020-04-21 东软集团股份有限公司 Image classification method and device, storage medium and electronic equipment
CN111144482A (en) * 2019-12-26 2020-05-12 惠州市锦好医疗科技股份有限公司 Scene matching method and device for digital hearing aid and computer equipment
CN111368893A (en) * 2020-02-27 2020-07-03 Oppo广东移动通信有限公司 Image recognition method and device, electronic equipment and storage medium
CN111523621A (en) * 2020-07-03 2020-08-11 腾讯科技(深圳)有限公司 Image recognition method and device, computer equipment and storage medium
CN112348203A (en) * 2020-11-05 2021-02-09 中国平安人寿保险股份有限公司 Model training method and device, terminal device and storage medium
CN112446428A (en) * 2020-11-27 2021-03-05 杭州海康威视数字技术股份有限公司 Image data processing method and device
CN112579819A (en) * 2020-12-25 2021-03-30 天津车之家数据信息技术有限公司 Data classification method and computing equipment
CN113076994A (en) * 2021-03-31 2021-07-06 南京邮电大学 Open-set domain self-adaptive image classification method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAIGUANG LI et al.: "Learning the information diffusion probabilities by using variance regularized EM algorithm", ASONAM '14: Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 31 August 2018 (2018-08-31), pages 273 *
TIAN Yao et al. (田垚 等): "基于深度神经网络和Bottleneck特征的说话人识别系统" [Speaker recognition system based on deep neural networks and Bottleneck features], Journal of Tsinghua University (Science and Technology) (清华大学学报(自然科学版)), 31 December 2012 (2012-12-31), pages 1 - 6 *

Similar Documents

Publication Publication Date Title
CN111444966B (en) Media information classification method and device
CN110472675B (en) Image classification method, image classification device, storage medium and electronic equipment
CN107040397B (en) Service parameter acquisition method and device
CN112989035B (en) Method, device and storage medium for identifying user intention based on text classification
CN109561322A (en) A kind of method, apparatus, equipment and the storage medium of video audit
CN111797320B (en) Data processing method, device, equipment and storage medium
CN113438114B (en) Method, device, equipment and storage medium for monitoring running state of Internet system
CN113822366A (en) Service index abnormality detection method and device, electronic equipment and storage medium
CN111159481B (en) Edge prediction method and device for graph data and terminal equipment
CN112702339A (en) Abnormal traffic monitoring and analyzing method and device based on deep migration learning
CN115860836A (en) E-commerce service pushing method and system based on user behavior big data analysis
CN111488939A (en) Model training method, classification method, device and equipment
CN112269937B (en) Method, system and device for calculating user similarity
CN113282920B (en) Log abnormality detection method, device, computer equipment and storage medium
CN112115996B (en) Image data processing method, device, equipment and storage medium
CN113010785A (en) User recommendation method and device
CN111507850A (en) Authority guaranteeing method and related device and equipment
CN114170484B (en) Picture attribute prediction method and device, electronic equipment and storage medium
CN113469265A (en) Data category attribute determining method and device, storage medium and electronic device
CN115099934A (en) High-latency customer identification method, electronic equipment and storage medium
CN111723872B (en) Pedestrian attribute identification method and device, storage medium and electronic device
CN114610590A (en) Method, device and equipment for determining operation time length and storage medium
CN113936157A (en) Abnormal information processing method and device, storage medium and electronic device
CN113535458A (en) Abnormal false alarm processing method and device, storage medium and terminal
CN112214675A (en) Method, device and equipment for determining user machine purchasing and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination