CN113378963A - Hybrid framework-based imbalance classification method, system, equipment and storage medium - Google Patents


Info

Publication number
CN113378963A
Authority
CN
China
Prior art keywords
data set
majority
minority
data
random
Legal status
Granted
Application number
CN202110708211.0A
Other languages
Chinese (zh)
Other versions
CN113378963B (en)
Inventor
郭得科
陈锐
罗来龙
陈颖文
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Application filed by National University of Defense Technology
Priority to CN202110708211.0A
Publication of CN113378963A
Application granted
Publication of CN113378963B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a hybrid framework-based method, system, device and storage medium for imbalanced classification. The hybrid-resampling ensemble model is validated on an imbalanced network anomaly detection dataset. A combination of resampling methods reduces the number of majority-class samples, which speeds up processing. The imbalanced data set is handled at the data level: resampling converts it into an approximately balanced distribution. An ensemble model of 12 different classifiers is then built, offering more choices than the 5 classifiers used in previous work, and the resampled data are classified with this ensemble. By combining undersampling and oversampling in a new way, the method balances the classes and speeds up processing with less memory overhead.

Description

Hybrid framework-based imbalance classification method, system, equipment and storage medium
Technical Field
The present application relates to the field of data processing, and in particular to a hybrid framework-based method, system, device and storage medium for imbalanced classification.
Background
In the current big data era, data mining and analysis have become increasingly important for effective decision making. Among data mining techniques, classification analysis is one of the most widely used and applies to many business and engineering problems, such as cancer prediction, runoff prediction, fraud detection and face detection. Classification analysis is a supervised learning problem of predicting a target variable that takes one of a finite number of classes. Classifier learning methods are typically designed for reasonably balanced data sets; in many practical cases, however, the data sets are imbalanced.
Currently, there are two main approaches to the imbalanced classification problem: oversampling, which randomly generates copies of existing minority-class samples to enlarge the minority class, and undersampling, which randomly selects a subset of the existing majority-class samples to shrink the majority class. However, we believe that using only an oversampling or only an undersampling strategy may not be sufficient to mitigate the imbalance of a data set. First, if only oversampling is used, expanding the minority class to the size of the majority class is impractical in terms of time consumption and training cost. Second, if only undersampling is used, the large reduction in the data set may lead to inadequate training, and the resulting model may be unable to distinguish the classes in the test data set. Finally, some existing work does mention hybrid sampling, but does not describe it explicitly. There is therefore a need for a hybrid sampling method that combines oversampling and undersampling strategies.
Data classification is a common data processing method in networks and distributed systems and has attracted much attention in recent years. However, existing classification algorithms mainly target relatively balanced data sets, whereas real-world data are often imbalanced.
Disclosure of Invention
In view of the above, it is necessary to provide a hybrid framework-based method, system, device and storage medium for imbalanced classification.
In a first aspect, an embodiment of the present invention provides a hybrid framework-based imbalanced classification method, comprising the following steps:
obtaining a majority-class training data set D_majority and a minority-class training data set D_minority, which together form the initial data set D;
eliminating majority-class data samples from the initial data set D by random undersampling to generate a new majority-class data set, the reduced subset being denoted D_majority_reduced;
adding minority-class data samples to the initial data set D by random oversampling to generate a new minority-class data set, the augmented subset being denoted D_minority_increased;
combining the D_majority_reduced and D_minority_increased data sets into a new mixed data set D', and training an ensemble model of 12 classifiers on D' to obtain the classification result for the initial data set.
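The data-level part of these four steps can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `hybrid_resample`, binary labels, absolute target sizes and the fixed seed are all hypothetical choices, and training the 12-classifier ensemble on D' is left out.

```python
import random
from collections import Counter

def hybrid_resample(samples, labels, majority_label, minority_label,
                    n_majority_keep, n_minority_target, seed=0):
    """Build a mixed data set D' from D = D_majority + D_minority.

    Hypothetical sketch: binary labels, absolute target sizes and a fixed
    seed are assumptions, not details taken from the patent.
    """
    rng = random.Random(seed)
    d_majority = [(x, y) for x, y in zip(samples, labels) if y == majority_label]
    d_minority = [(x, y) for x, y in zip(samples, labels) if y == minority_label]

    # Random undersampling: keep a random subset of the majority class
    # (drawn without replacement, so no sample appears twice).
    n_keep = min(n_majority_keep, len(d_majority))
    d_majority_reduced = rng.sample(d_majority, n_keep)

    # Random oversampling: keep every minority sample once, then add copies
    # drawn with replacement until the target size is reached.
    extra = max(0, n_minority_target - len(d_minority))
    d_minority_increased = d_minority + rng.choices(d_minority, k=extra)

    # D' = D_majority_reduced combined with D_minority_increased, shuffled
    # so that the classes are interleaved before training.
    d_mixed = d_majority_reduced + d_minority_increased
    rng.shuffle(d_mixed)
    return d_mixed

# 950 majority (label 0) vs. 50 minority (label 1) samples.
X = list(range(1000))
y = [0] * 950 + [1] * 50
d_prime = hybrid_resample(X, y, majority_label=0, minority_label=1,
                          n_majority_keep=200, n_minority_target=200)
counts = Counter(label for _, label in d_prime)
print(counts[0], counts[1])  # 200 200
```

The resulting list of (sample, label) pairs would then be handed to whatever ensemble is used in the last step.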
Further, eliminating majority-class data samples from the initial data set D by random undersampling to generate the new majority-class data set D_majority_reduced includes:
selecting samples from the majority-class data set by random undersampling, with the proportion selected from each class determined by a preset class-distribution threshold;
reducing the number of majority-class samples so that data classification runs relatively fast with less memory;
analyzing, through random undersampling at different ratios, how the imbalanced network anomaly detection data set affects ensemble classification performance.
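A minimal sketch of the undersampling step follows. It assumes the preset class-distribution threshold is expressed as a target majority-to-minority ratio; the patent does not say how the threshold is encoded, and `random_undersample` and `majority_per_minority` are hypothetical names.

```python
import random

def random_undersample(d_majority, n_minority, majority_per_minority, seed=0):
    """Return D_majority_reduced: a random subset of the majority class sized
    so that majority : minority is roughly majority_per_minority : 1.

    Assumption: the preset class-distribution threshold is modelled as this
    ratio; samples are drawn without replacement, so none are duplicated.
    """
    if majority_per_minority <= 0:
        raise ValueError("majority_per_minority must be positive")
    rng = random.Random(seed)
    n_keep = min(len(d_majority), round(majority_per_minority * n_minority))
    return rng.sample(d_majority, n_keep)

# 9,000 majority samples vs. 300 minority samples, reduced to a 4:1 ratio.
d_majority_reduced = random_undersample(list(range(9000)), n_minority=300,
                                        majority_per_minority=4)
print(len(d_majority_reduced))  # 1200
```

Sampling without replacement is what makes this undersampling rather than bootstrapping: the reduced set is a true subset, which is where the memory and speed savings come from.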
Further, adding minority-class data samples to the initial data set D by random oversampling to generate the new minority-class data set D_minority_increased includes:
randomly copying minority-class data samples by random oversampling, thereby increasing the number of minority-class samples;
balancing the difference in sample counts between the minority class and the other classes by controlling the random sampling ratio.
In another aspect, the present invention provides a hybrid framework-based imbalanced classification system, comprising:
an initial data set module, configured to obtain a majority-class training data set D_majority and a minority-class training data set D_minority, which together form the initial data set D;
an undersampling module, configured to eliminate majority-class data samples from the initial data set D by random undersampling and generate a new majority-class data set, the reduced subset being denoted D_majority_reduced;
an oversampling module, configured to add minority-class data samples to the initial data set D by random oversampling and generate a new minority-class data set, the augmented subset being denoted D_minority_increased;
a model blending module, configured to combine the D_majority_reduced and D_minority_increased data sets into a new mixed data set D' and to train an ensemble model of 12 classifiers on D' to obtain the classification result for the initial data set.
Further, the undersampling module comprises a sample reduction unit configured to:
select samples from the majority-class data set by random undersampling, with the proportion selected from each class determined by a preset class-distribution threshold;
reduce the number of majority-class samples so that data classification runs relatively fast with less memory;
analyze, through random undersampling at different ratios, how the imbalanced network anomaly detection data set affects ensemble classification performance.
Further, the oversampling module includes a sample addition unit configured to:
randomly copy minority-class data samples by random oversampling, thereby increasing the number of minority-class samples;
balance the difference in sample counts between the minority class and the other classes by controlling the random sampling ratio.
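The random copying performed by the sample addition unit can be sketched as below. `random_oversample` and `n_target` are hypothetical names; the absolute target size stands in for the "sampling ratio" mentioned in the text, whose exact form the patent does not give.

```python
import random

def random_oversample(d_minority, n_target, seed=0):
    """Return D_minority_increased: every original minority sample once,
    plus random copies (drawn with replacement) up to n_target samples.

    Passing the target as an absolute count is an assumption of this sketch.
    """
    if not d_minority:
        raise ValueError("the minority class is empty")
    rng = random.Random(seed)
    extra = max(0, n_target - len(d_minority))
    return list(d_minority) + rng.choices(d_minority, k=extra)

d_minority_increased = random_oversample(["a", "b", "c"], n_target=10)
print(len(d_minority_increased))  # 10
```

Because the added samples are exact duplicates, no synthetic points are created; the class grows only in weight, not in diversity.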
In another aspect, the present invention further provides a hybrid framework-based imbalanced classification system, in which:
the proposed model is used to solve a practical problem in network anomaly detection, and an imbalanced network anomaly detection dataset is used to validate our HRE model. Furthermore, we propose a combination of resampling methods that reduces the number of majority-class samples, thereby speeding up processing. The imbalanced data set is processed at the data level and converted to a balanced distribution using resampling.
Further, the HRE model framework for imbalanced classification is as follows. Because extending the minority class with an oversampling strategy increases training cost, we first specified an ensemble framework that uses only an undersampling strategy. We then proposed a hybrid ensemble framework that combines oversampling and undersampling to balance the classes in the data set.
Further, in this model, random undersampling is used to reduce the number of majority-class samples, which also allows faster processing with less memory overhead. Moreover, multiple classifiers have been shown to be more accurate than a single classifier, so 12 classifiers are used in the ensemble, providing more choices than the 5 classifiers in previous work.
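The patent does not state how the outputs of the 12 classifiers are combined. Assuming simple hard (majority) voting, which is one common choice, the combination step might look like this; `majority_vote` is a hypothetical helper, and the predictions below are made-up illustrative values, not results from the patent.

```python
from collections import Counter

def majority_vote(per_classifier_predictions):
    """Combine hard labels from several classifiers by majority vote.

    per_classifier_predictions[k][i] is classifier k's label for sample i.
    Ties go to the label that appears first in classifier order, because
    Counter.most_common keeps first-encountered order among equal counts.
    """
    n_samples = len(per_classifier_predictions[0])
    if any(len(p) != n_samples for p in per_classifier_predictions):
        raise ValueError("all classifiers must predict the same samples")
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in per_classifier_predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Hypothetical labels from 12 base classifiers for 3 samples.
predictions = [[1, 0, 0]] * 7 + [[0, 0, 1]] * 5
print(majority_vote(predictions))  # [1, 0, 0]
```

Soft voting over predicted probabilities, or weighting each classifier by its validation accuracy, would be equally plausible alternatives under the same framework.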
An embodiment of the present invention further provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the following steps:
obtaining a majority-class training data set D_majority and a minority-class training data set D_minority, which together form the initial data set D;
eliminating majority-class data samples from the initial data set D by random undersampling to generate a new majority-class data set, the reduced subset being denoted D_majority_reduced;
adding minority-class data samples to the initial data set D by random oversampling to generate a new minority-class data set, the augmented subset being denoted D_minority_increased;
combining the D_majority_reduced and D_minority_increased data sets into a new mixed data set D', and training an ensemble model of 12 classifiers on D' to obtain the classification result for the initial data set.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following steps:
obtaining a majority-class training data set D_majority and a minority-class training data set D_minority, which together form the initial data set D;
eliminating majority-class data samples from the initial data set D by random undersampling to generate a new majority-class data set, the reduced subset being denoted D_majority_reduced;
adding minority-class data samples to the initial data set D by random oversampling to generate a new minority-class data set, the augmented subset being denoted D_minority_increased;
combining the D_majority_reduced and D_minority_increased data sets into a new mixed data set D', and training an ensemble model of 12 classifiers on D' to obtain the classification result for the initial data set.
The beneficial effects of this application are as follows. The embodiments of the invention disclose a hybrid framework-based imbalanced classification method, system, device and storage medium. The hybrid-sampling model addresses a practical problem in network anomaly detection, and an imbalanced network anomaly detection dataset is used to validate the hybrid-resampling ensemble model. In addition, a combination of resampling methods reduces the number of majority-class samples, which increases processing speed. The imbalanced data set is processed at the data level and converted to a balanced distribution by resampling. Furthermore, the ensemble model of 12 different classifiers offers more choices than the 5 classifiers in previous work. The approximately balanced data obtained from this processing are then classified by the ensemble model. By combining undersampling and oversampling in a new way, the method balances the classes and speeds up processing with less memory overhead.
Drawings
FIG. 1 is a flow diagram of a hybrid framework-based imbalanced classification method according to one embodiment;
FIG. 2 is a flow diagram of reducing the majority-class data set by undersampling according to one embodiment;
FIG. 3 is a flow diagram of increasing the minority-class data set by oversampling according to one embodiment;
FIG. 4 is a block diagram of a hybrid framework-based imbalanced classification system according to one embodiment;
FIG. 5 is a diagram of the internal structure of a computer device according to one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in FIG. 1, a hybrid framework-based imbalanced classification method is provided, comprising the following steps:
Step 101: obtain a majority-class training data set D_majority and a minority-class training data set D_minority, which together form the initial data set D;
Step 102: eliminate majority-class data samples from the initial data set D by random undersampling to generate a new majority-class data set, the reduced subset being denoted D_majority_reduced;
Step 103: add minority-class data samples to the initial data set D by random oversampling to generate a new minority-class data set, the augmented subset being denoted D_minority_increased;
Step 104: combine the D_majority_reduced and D_minority_increased data sets into a new mixed data set D', and train an ensemble model of 12 classifiers on D' to obtain the classification result for the initial data set.
Specifically, this embodiment addresses the problem of imbalanced classification in practical applications. We propose a new hybrid resampling-based ensemble framework that takes full advantage of both the undersampling and oversampling strategies. The hybrid-sampling model is applied to a practical problem in network anomaly detection, and an imbalanced network anomaly detection dataset is used to validate the hybrid-resampling ensemble model. A combination of resampling methods reduces the number of majority-class samples, which increases processing speed, and the imbalanced data set is processed at the data level and converted to a balanced distribution by resampling. An ensemble model of 12 different classifiers is then built, offering more choices than the 5 classifiers in previous work, and the approximately balanced data obtained from this processing are classified by that ensemble. This new combination of undersampling and oversampling balances the classes and speeds up processing with less memory overhead. Experimental results of the embodiment show that the hybrid-resampling ensemble can significantly improve classification accuracy while reducing computational overhead.
In one embodiment, as shown in FIG. 2, reducing the majority-class data set by undersampling comprises:
Step 201: connect the sentences in the evidence set into a sequential evidence text, and then concatenate this text with the statement to form an input sequence;
Step 202: select samples from the majority-class data set by random undersampling, with the proportion selected from each class determined by a preset class-distribution threshold;
Step 203: reduce the number of majority-class samples so that data classification runs relatively fast with less memory;
Step 204: analyze, through random undersampling at different ratios, how the imbalanced network anomaly detection data set affects ensemble classification performance.
Specifically, by analyzing the influence of the sampling rate (imbalance ratio) on classification performance in this embodiment, we observed the following. First, with the number of minority-class samples held constant, the performance of the ensemble model first increases and then decreases as the number of majority-class samples grows. This shows that the data imbalance ratio affects classifier performance. When the imbalance ratio is small, its influence on classification performance is not significant; with an undersampling ratio of 1:4, the average accuracy is relatively high; when the imbalance ratio is large, classifier performance is negatively affected.
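One way to set up such a sweep is sketched below, with the minority class held fixed and the majority class undersampled at several 1:r ratios. The function `undersampling_sweep`, the ratio list and the sample counts are hypothetical; reading "1:4" as minority:majority is an assumption, and each subset would still have to be combined with the minority set, trained and evaluated.

```python
import random

def undersampling_sweep(d_majority, n_minority, ratios, seed=0):
    """Majority-class subsets for several minority:majority ratios (1:r),
    with the minority class held fixed.

    One random permutation is drawn and every subset is a prefix of it, so
    a larger ratio only adds majority samples (nested subsets). This nesting
    is a design choice of this sketch, not something the patent specifies.
    """
    rng = random.Random(seed)
    order = rng.sample(d_majority, len(d_majority))  # random permutation
    return {r: order[:min(len(order), r * n_minority)] for r in ratios}

subsets = undersampling_sweep(list(range(5000)), n_minority=100,
                              ratios=[1, 2, 4, 8])
for r, subset in subsets.items():
    print(f"1:{r} -> {len(subset)} majority samples")
# 1:1 -> 100 majority samples
# 1:2 -> 200 majority samples
# 1:4 -> 400 majority samples
# 1:8 -> 800 majority samples
```

Nesting the subsets means that differences in accuracy between ratios come from adding majority samples, not from drawing an unrelated random subset each time.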
In one embodiment, as shown in FIG. 3, increasing the minority-class data set by oversampling comprises:
Step 301: randomly copy minority-class data samples by random oversampling, thereby increasing the number of minority-class samples;
Step 302: balance the difference in sample counts between the minority class and the other classes by controlling the random sampling ratio.
Specifically, hybrid resampling first reduces the majority class to a certain size by undersampling and then, on that basis, increases the number of minority-class samples by oversampling until it matches the majority class. We use hybrid resampling to reduce the imbalance ratio gradually. The results show that as the number of minority samples increases, the overall accuracy of the classifier improves from 0.8393 to 0.8981, indicating that model performance improves steadily. The experiments show that, when the minority class is small, increasing the minority class and decreasing the majority class improves accuracy, which indicates that combining oversampling and undersampling is effective.
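The gradual reduction of the imbalance ratio can be illustrated with a small schedule. As a hypothetical sketch, assume the majority class has already been undersampled to a fixed size and the minority class is then oversampled towards that size in equal steps; `hybrid_schedule` and the sizes 400 and 100 are illustrative, not the experimental settings behind the reported accuracies.

```python
def hybrid_schedule(n_majority_reduced, n_minority, steps):
    """Minority-class target sizes rising in equal increments from the
    original minority count to n_majority_reduced, each paired with the
    resulting imbalance ratio (majority / minority).
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    schedule = []
    for k in range(steps + 1):
        # Linear interpolation between the original and the balanced size.
        target = n_minority + round(k * (n_majority_reduced - n_minority) / steps)
        schedule.append((target, n_majority_reduced / target))
    return schedule

for target, ratio in hybrid_schedule(n_majority_reduced=400, n_minority=100, steps=3):
    print(target, round(ratio, 2))
# 100 4.0
# 200 2.0
# 300 1.33
# 400 1.0
```

Each target size would be passed to a random-oversampling routine, and the ensemble retrained and evaluated at every step.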
It should be understood that, although the steps in the above flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, there is no strict restriction on the order of these steps, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which need not be completed at the same time but may be performed at different times, and need not be performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 4, a hybrid framework-based imbalanced classification system is provided, comprising:
the initial data set module 401, which obtains a majority-class training data set D_majority and a minority-class training data set D_minority, which together form the initial data set D;
the undersampling module 402, which eliminates majority-class data samples from the initial data set D by random undersampling and generates a new majority-class data set, the reduced subset being denoted D_majority_reduced;
the oversampling module 403, which adds minority-class data samples to the initial data set D by random oversampling and generates a new minority-class data set, the augmented subset being denoted D_minority_increased;
the model blending module 404, which combines the D_majority_reduced and D_minority_increased data sets into a new mixed data set D' and trains an ensemble model of 12 classifiers on D' to obtain the classification result for the initial data set.
In one embodiment, the undersampling module 402 includes a sample reduction unit configured to:
select samples from the majority-class data set by random undersampling, with the proportion selected from each class determined by a preset class-distribution threshold;
reduce the number of majority-class samples so that data classification runs relatively fast with less memory;
analyze, through random undersampling at different ratios, how the imbalanced network anomaly detection data set affects ensemble classification performance.
In one embodiment, the oversampling module 403 includes a sample addition unit configured to:
randomly copy minority-class data samples by random oversampling, thereby increasing the number of minority-class samples;
balance the difference in sample counts between the minority class and the other classes by controlling the random sampling ratio.
For the specific limitations of the hybrid framework-based imbalanced classification system, refer to the limitations of the method above; they are not repeated here. The modules of the system may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke them and perform the corresponding operations.
FIG. 5 shows the internal structure of a computer device in one embodiment. The computer device comprises a processor, a memory, a network interface, an input device and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the hybrid framework-based imbalanced classification method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the same method. The display screen may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, trackball or touchpad on the housing of the computer device, or an external keyboard, touchpad or mouse.
Those skilled in the art will appreciate that the structure shown in FIG. 5 is only a block diagram of part of the structure related to the disclosed solution and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, as shown in fig. 5, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
obtaining a majority-class training data set D_majority and a minority-class training data set D_minority, which together form the initial data set D;
eliminating majority-class data samples from the initial data set D by random undersampling to generate a new majority-class data set, the reduced subset being denoted D_majority_reduced;
adding minority-class data samples to the initial data set D by random oversampling to generate a new minority-class data set, the augmented subset being denoted D_minority_increased;
combining the D_majority_reduced and D_minority_increased data sets into a new mixed data set D', and training an ensemble model of 12 classifiers on D' to obtain the classification result for the initial data set.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
obtaining a majority-class training data set D_majority and a minority-class training data set D_minority, which together form the initial data set D;
eliminating majority-class data samples from the initial data set D by random undersampling to generate a new majority-class data set, the reduced subset being denoted D_majority_reduced;
adding minority-class data samples to the initial data set D by random oversampling to generate a new minority-class data set, the augmented subset being denoted D_minority_increased;
combining the D_majority_reduced and D_minority_increased data sets into a new mixed data set D', and training an ensemble model of 12 classifiers on D' to obtain the classification result for the initial data set.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations are described; however, any combination of these technical features should be considered within the scope of this specification as long as it involves no contradiction.
The above embodiments express only several implementations of the present application; although their description is specific and detailed, they shall not be construed as limiting the scope of the invention. A person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (8)

1. A hybrid framework-based imbalanced classification method, characterized by comprising the following steps:
obtaining a majority-class training data set D_majority and a minority-class training data set D_minority, which together form the initial data set D;
eliminating majority-class data samples from the initial data set D by random undersampling to generate a new majority-class data set, the reduced subset being denoted D_majority_reduced;
adding minority-class data samples to the initial data set D by random oversampling to generate a new minority-class data set, the augmented subset being denoted D_minority_increased;
combining the D_majority_reduced and D_minority_increased data sets into a new mixed data set D', and training an ensemble model of 12 classifiers on D' to obtain the classification result for the initial data set.
2. The hybrid-framework-based imbalance classification method according to claim 1, wherein eliminating the majority-class data samples from the initial data set D by random undersampling, generating a new majority-class data set, and denoting the reduced subset D_majority_reduced comprises:
selecting samples from the majority-class data set by random undersampling, the proportion of samples selected per class being determined by a preset class-distribution threshold;
reducing the number of majority-class samples so that data classification runs relatively fast and uses less memory; and
analyzing, through random undersampling at different ratios, how the imbalanced network anomaly detection data set affects ensemble classification performance.
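The undersampling step of claim 2 can be sketched as follows, assuming the "preset class-distribution threshold" is a target majority-to-minority ratio (the claim gives no formula, so this reading and the names `random_undersample` and `ratio_sweep` are hypothetical). The sweep mirrors the last step of the claim: each ratio produces a differently sized majority subset, on which the ensemble would then be retrained and scored.

```python
import random


def random_undersample(d_majority, n_minority, ratio, seed=0):
    """Randomly drop majority samples until |kept| is about ratio * n_minority.

    Assumption: `ratio` stands in for the preset class-distribution threshold.
    The result is capped at the original majority size.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    target = min(len(d_majority), max(1, round(ratio * n_minority)))
    rng = random.Random(seed)
    # Sampling without replacement: each kept sample appears at most once.
    return rng.sample(d_majority, target)


def ratio_sweep(d_majority, d_minority, ratios, seed=0):
    """Map each ratio to (kept majority count, achieved majority/minority ratio)."""
    results = {}
    for r in ratios:
        reduced = random_undersample(d_majority, len(d_minority), r, seed)
        # In a full experiment, the ensemble would be retrained on
        # reduced + d_minority here and its metrics recorded.
        results[r] = (len(reduced), len(reduced) / len(d_minority))
    return results
```

For example, with 1,000 majority and 50 minority samples, ratios of 1, 2 and 4 keep 50, 100 and 200 majority samples, while a ratio of 30 is capped at the full 1,000 samples (an achieved ratio of 20). The smaller subsets are what yield the speed and memory savings the claim mentions.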
3. The hybrid-framework-based imbalance classification method according to claim 1, wherein adding minority-class data samples to the initial data set D by random oversampling, generating a new minority-class data set, and denoting the augmented subset D_minority_increased comprises:
randomly duplicating minority-class data samples by random oversampling, thereby increasing the number of minority-class samples; and
balancing the difference in quantity between the minority-class and majority-class data samples by controlling the random sampling ratio.
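Random oversampling as described in claim 3 amounts to sampling minority examples with replacement and appending the copies. The sketch below assumes the sampling ratio is a target minority-to-majority ratio (a hypothetical reading; `random_oversample` is an illustrative name, not from the source). Unlike SMOTE-style methods, it creates no synthetic points, only duplicates.

```python
import random


def random_oversample(d_minority, n_majority, ratio=1.0, seed=0):
    """Duplicate random minority samples until |increased| is about ratio * n_majority.

    Assumption: `ratio` is the controlled minority/majority sampling ratio.
    Every original sample is kept; extra samples are copies drawn with
    replacement, so no synthetic (interpolated) points are created.
    """
    if not d_minority:
        raise ValueError("minority set is empty")
    # Never shrink the minority class: the target is at least its current size.
    target = max(len(d_minority), round(ratio * n_majority))
    rng = random.Random(seed)
    extra = rng.choices(d_minority, k=target - len(d_minority))
    return list(d_minority) + extra
```

With three minority samples and ten majority samples, the default ratio of 1.0 returns ten samples: the three originals followed by seven random copies.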
4. A hybrid-framework-based imbalance classification system, comprising:
an initial data set module for obtaining a majority-class training data set D_majority and a minority-class training data set D_minority, which together form a given initial data set D;
an undersampling module for eliminating majority-class data samples from the initial data set D by random undersampling to generate a new majority-class data set, the reduced subset being denoted D_majority_reduced;
an oversampling module for adding minority-class data samples to the initial data set D by random oversampling to generate a new minority-class data set, the augmented subset being denoted D_minority_increased; and
a model blending module for combining the D_majority_reduced data set and the D_minority_increased data set to generate a new mixed data set D', and training an ensemble model of 12 classifiers on the mixed data set D' to obtain the classification result for the initial data set.
5. The hybrid-framework-based imbalance classification system of claim 4, wherein the undersampling module comprises a sample reduction unit configured for:
selecting samples from the majority-class data set by random undersampling, the proportion of samples selected per class being determined by a preset class-distribution threshold;
reducing the number of majority-class samples so that data classification runs relatively fast and uses less memory; and
analyzing, through random undersampling at different ratios, how the imbalanced network anomaly detection data set affects ensemble classification performance.
6. The hybrid-framework-based imbalance classification system of claim 4, wherein the oversampling module comprises a sample addition unit configured for:
randomly duplicating minority-class data samples by random oversampling, thereby increasing the number of minority-class samples; and
balancing the difference in quantity between the minority-class and majority-class data samples by controlling the random sampling ratio.
7. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 3.
8. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of any one of claims 1 to 3.
CN202110708211.0A 2021-06-24 2021-06-24 Unbalanced classification method, system, equipment and storage medium based on mixed framework Active CN113378963B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110708211.0A CN113378963B (en) 2021-06-24 2021-06-24 Unbalanced classification method, system, equipment and storage medium based on mixed framework

Publications (2)

Publication Number Publication Date
CN113378963A (en) 2021-09-10
CN113378963B (en) 2023-10-13

Family

ID=77579075

Country Status (1)

Country Link
CN (1) CN113378963B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871901A (en) * 2019-03-07 2019-06-11 Central South University: Imbalanced data classification method based on hybrid sampling and machine learning
US20200136890A1 (en) * 2018-10-24 2020-04-30 Affirmed Networks, Inc. Anomaly detection and classification in networked systems
CN112541536A (en) * 2020-12-09 2021-03-23 Changsha University of Science and Technology: Undersampling-based ensemble classification method, device and storage medium for credit scoring

Non-Patent Citations (1)

Title
Wang Bo; Wang Huaibin: "Research on an imbalanced anomaly data classification algorithm based on active learning", Netinfo Security (信息网络安全), no. 10 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant