CN105787046A - Imbalanced data sorting system based on unilateral dynamic downsampling - Google Patents

Imbalanced data sorting system based on unilateral dynamic downsampling Download PDF

Info

Publication number
CN105787046A
CN105787046A (application CN201610108097.7A)
Authority
CN
China
Prior art keywords
samples
network
iteration
gradient
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610108097.7A
Other languages
Chinese (zh)
Inventor
王喆
李冬冬
范奇
高大启
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology filed Critical East China University of Science and Technology
Priority to CN201610108097.7A priority Critical patent/CN105787046A/en
Publication of CN105787046A publication Critical patent/CN105787046A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/086Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming


Abstract

The invention provides an imbalanced data sorting (classification) system based on unilateral dynamic downsampling. First, the system determines the structure of the network according to the scale of the imbalanced data and randomly initializes the neuron weights of each layer. Second, the system optimizes the network model by gradient descent, setting the learning rate, momentum factor, and maximum iteration count of the gradient descent method. In the first iteration, all samples are used to compute the total gradient, the total gradient is used for updating the layer weights, and the training samples for the next iteration are selected according to the discrimination distances of the samples. The training samples selected in the previous iteration are then reused to compute the total gradient, update the layer weights, and select the samples for the next iteration, until the maximum iteration count is reached. Finally, the resulting classification model is used to classify unknown samples. Compared with traditional classification techniques, the system combines the downsampling process with the classifier training process, thereby achieving dynamic downsampling of the training samples, avoiding the loss of data set information, and effectively handling the classification of imbalanced data.

Description

Unbalanced data classification system based on unilateral dynamic downsampling
Technical Field
The invention relates to the field of pattern recognition, in particular to an unbalanced data classification method and system based on unilateral dynamic downsampling.
Background
At present, in the era of data explosion, data volumes have grown from the TB level to the PB or even the EB level, and how to mine such massive data for useful information has become a pressing problem. Data mining has many research directions, of which classification is one of the important branches. Classification refers to selecting an already-labeled training set from the data, analyzing and learning from it with a classification technique to discover the rules hidden in the data, and building a classification model with which unknown test samples can be predicted. For the traditional classification problem there are already many mature algorithms, such as the K-nearest-neighbor algorithm, decision trees, artificial neural networks, Bayesian methods, and support vector machines; these algorithms have been applied in many fields of data mining and achieve good classification results.
Although these traditional classification algorithms achieve good results, they are mostly built on the premise of a balanced data distribution, that is, the numbers of samples of the various classes in the data set are roughly equal. In the application fields of many disciplines, however, imbalanced data sets are more common. In a two-class problem, the number of samples in one class is often much larger than in the other; the class with fewer samples is called the positive class (Positive) and the class with more samples the negative class (Negative). For example, in financial fraud detection, most customers' transactions are normal and only very few customers exhibit potentially fraudulent behavior; there may be only one fraudulent transaction in 100,000. Imbalanced data sets likewise arise in medical diagnosis, network intrusion detection, anti-spam filtering, oil exploration, and other fields. In some of these areas the imbalance is inherent, because the probability of a positive sample occurring is itself low. In others, positive samples require experimental verification while negative samples do not, so negative samples are cheap to obtain and positive samples are expensive, which again leads to data sets in which the negative class far outnumbers the positive class.
Because traditional classification algorithms take maximizing the overall average classification accuracy of the model as the training objective and do not consider the relative distribution of the classes, classifier performance often degrades sharply when they are applied to imbalanced data: the learned classifier is biased toward the negative class, and samples that belong to the positive class are frequently misclassified as negative. Such classifiers perform poorly on the positive class, yet practical problems usually require a sufficiently high detection rate on the positive class, because the positive class is generally far more important than the negative class. Returning to financial fraud detection, a conventional classifier easily classifies fraudulent behavior as normal, but the loss to the bank when fraud is treated as normal is usually much higher than when normal behavior is mistaken for fraud. In medical diagnosis, if a patient is misdiagnosed as healthy, the optimal treatment window is missed and the loss is beyond estimation. Correctly classifying imbalanced data is therefore an urgent problem, and a classification system that can effectively handle the imbalance problem would bring great economic benefit to industrial production and the economy.
At present, several data-level methods exist for handling imbalanced data, such as random under-sampling, one-sided sample selection (one-sided selection), and random over-sampling. These methods, however, are independent of the training algorithm itself: the data set produced by such preprocessing can be fed to many different training algorithms, but the processed sample set then remains fixed throughout training. For down-sampling methods in particular, the removed samples are never used again during the classifier training phase, which causes a loss of sample information and thereby degrades classifier performance. To overcome this defect of down-sampling, an imbalanced data classification system based on one-sided dynamic down-sampling is proposed. During the training phase the system can take all samples into account: in each iteration it dynamically down-samples the negative class to obtain a balanced set of training samples.
Disclosure of Invention
Aiming at the problems that existing down-sampling-based classification techniques cannot combine down-sampling with classifier training and cannot avoid the loss of sample information after down-sampling, the invention provides a method based on one-sided dynamic down-sampling: the discrimination distance (discrimination distance) of each sample is used to down-sample the negative class, a feedforward neural network is used to train the classification model, and the gradient descent method is used to optimize the model. Combining one-sided dynamic down-sampling with the feedforward neural network yields the imbalanced data classification system based on one-sided dynamic down-sampling. The system can effectively handle the classification of imbalanced data.
The technical scheme adopted by the invention to solve this technical problem is as follows. First, the system determines the structure of the network according to the scale of the imbalanced data and randomly initializes the neuron weights of each layer. Second, the network model is optimized by gradient descent, with the learning rate, momentum factor, and maximum iteration count of the gradient descent method set in advance. In the first iteration, the total gradient is computed from all samples, the layer weights are updated with this total gradient, and the training samples for the next iteration are selected according to the discrimination distances of the samples. The training samples selected in the previous round are then reused to compute the total gradient, update the layer weights, and select the samples for the next round, until the maximum iteration count is reached. Finally, the resulting classification model is used to classify unknown samples.
This technical scheme can be further refined. The neural network structure is determined manually from prior information about the specific data: an empirical method can be used to choose an appropriate structure, such as the number of network layers, the number of neurons in each hidden layer, and the type of node activation function. The gradient descent method minimizes the objective function of the network by moving along the negative gradient direction obtained from the network's objective function, giving the network better classification performance. The one-sided down-sampling method uses the discrimination distance to dynamically select negative samples, which effectively balances the numbers of positive and negative samples.
The invention has the following beneficial effects: sampling the negative class by the discrimination distance of the samples balances the numbers of training samples; combining the one-sided down-sampling method with a feedforward neural network yields an imbalanced data classification system based on one-sided down-sampling, so that the negative samples can be down-sampled dynamically and the down-sampling process and the classifier training process are combined; training the feedforward network model by gradient descent and resampling after each iteration step realizes dynamic down-sampling of the training samples; and by combining sample down-sampling with model training, the algorithm can effectively solve the classification problem of imbalanced data.
Drawings
FIG. 1 is a system framework for an unbalanced data classification system based on dynamic downsampling of the present invention.
Detailed Description
The invention will be further described with reference to the following figures and examples: the method of the invention is divided into three steps.
The first step is as follows: the network structure and network parameters are initialized.
The system determines the structure of the network according to the scale of the imbalanced data and randomly initializes the neuron weights of each layer. Initialization of the network structure comprises the number of nodes in each layer and the type of activation function adopted by the network nodes; initialization of the network parameters comprises initializing the weight of each neuron and determining the training target of each training sample. Initialization of the network structure and parameters includes the following steps.
1) Initializing the neural network structure: the structure of the neural network, including the number of layers and the number of neurons in each layer, is determined according to the scale of the imbalanced data, namely the dimensionality of the samples, the number of samples, and the imbalance ratio. The imbalance ratio reflects the degree of imbalance of the data set and is calculated as

IR = N^- / N^+

where N^- and N^+ are the numbers of negative and positive samples, respectively. When the imbalance ratio of a data set exceeds 1.5, the data set is said to be imbalanced. The number of hidden nodes is set manually based on experience. Each weight of the neural network is randomly initialized to a value between -1 and 1. For specific problems, the network structure can be determined by manual empirical methods. The activation function of the network nodes is the Sigmoid function:

f(x) = 1 / (1 + e^(-x))
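As a concrete illustration, the initialization step can be sketched in Python. The function and parameter names are our own; the only details taken from the text are the imbalance-ratio definition, the Sigmoid activation, and the random weight initialization in [-1, 1].

```python
import numpy as np

def imbalance_ratio(labels):
    """Imbalance ratio IR = (# negative samples) / (# positive samples),
    with labels encoded as 1 (positive) and 0 (negative)."""
    labels = np.asarray(labels)
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == 0)
    return n_neg / n_pos

def sigmoid(x):
    """Node activation function used by the network."""
    return 1.0 / (1.0 + np.exp(-x))

def init_network(n_in, n_hidden, rng=None):
    """Randomly initialize all weights and biases to values in [-1, 1].
    A single-output network is assumed, matching the [in-hidden-1]
    structures used in the experiments."""
    rng = np.random.default_rng(rng)
    w1 = rng.uniform(-1.0, 1.0, size=(n_hidden, n_in))  # input -> hidden
    b1 = rng.uniform(-1.0, 1.0, size=n_hidden)
    w2 = rng.uniform(-1.0, 1.0, size=n_hidden)          # hidden -> output
    b2 = rng.uniform(-1.0, 1.0)
    return w1, b1, w2, b2
```

A data set with labels [1, 1, 0, 0, 0] has IR = 3/2 = 1.5, right at the threshold above which the text calls a data set imbalanced.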
2) Setting the network model training parameters: the learning rate η of the gradient descent method is set to 0.1, and the momentum factor α and the maximum iteration count L are set. The iteration index l is initialized to 1, and the training sample set S is initialized to contain all training samples.
The second step: optimizing the network model.
The system optimizes the network model by gradient descent, setting the learning rate, momentum factor, and maximum iteration count of the gradient descent method. In the first iteration, the total gradient is computed from all samples, the layer weights are updated with this total gradient, and the training samples for the next iteration are selected according to the discrimination distances of the samples; the training samples selected in the previous round are then reused to compute the total gradient, update the layer weights, and select the samples for the next round, until the maximum iteration count is reached. The network model optimization includes the following steps.
1) Calculating the sum of squared errors of the network over the training set:

E = (1/2) Σ_{i=1}^{n} (t_i - y_i)²

where n is the total number of samples, and t_i and y_i are, respectively, the training target and the actual network output of sample x_i. The discrimination distance d_i of sample x_i is the distance between its network output and the theoretical discrimination value θ:

d_i = |y_i - θ|
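The error and discrimination-distance computations of step 1) might look as follows in Python; taking θ = 0.5 as the theoretical discrimination value for a Sigmoid output with 0/1 targets is our assumption, as are the function names.

```python
import numpy as np

def sum_squared_error(targets, outputs):
    """E = 1/2 * sum_i (t_i - y_i)^2 over the training set."""
    t = np.asarray(targets, dtype=float)
    y = np.asarray(outputs, dtype=float)
    return 0.5 * np.sum((t - y) ** 2)

def discrimination_distance(outputs, theta=0.5):
    """Distance between the network output and the theoretical
    discrimination value theta.  theta = 0.5 is an assumed choice
    for a Sigmoid output with 0/1 targets."""
    return np.abs(np.asarray(outputs, dtype=float) - theta)
```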
2) Computing the gradient ∂E/∂w_jk of the network over the sample set S, where w_jk is the weight connecting the j-th neuron of one layer to the k-th neuron of the following layer. By the chain rule:

∂E/∂w_jk = (∂E/∂net_k)(∂net_k/∂w_jk) = δ_k · o_j

where o_j is the output value of the j-th neuron of the preceding layer and net_k = Σ_j w_jk o_j is the net input of neuron k. For the weights from the hidden layer to the output layer, with a Sigmoid output unit,

∂E/∂w_j = -(t - y) · y (1 - y) · h_j

where h_j is the output value of the j-th hidden neuron. Therefore, the weight update rule from the hidden layer to the output layer is

w_j ← w_j + η (t - y) y (1 - y) h_j

Since a feedforward neural network is used in which, ultimately, only the weights between the hidden layer and the output layer are updated by gradient descent (the weights between the input layer and the hidden layer need not be updated), the total gradient of the network over the sample set S is

∇E(S) = Σ_{x_i ∈ S} g_i
where g_i is the gradient value contributed by sample x_i. After the l-th iteration, the momentum of the network (the "charge" in the original wording) is calculated by

m(l) = W(l) - W(l-1)

where W(l) and W(l-1) are the corresponding network weights after the l-th and (l-1)-th iterations, respectively.
3) Updating the network weights: according to the total gradient ∇E(S) and the momentum m(l) obtained above, the network weights W are updated after the l-th iteration as

W(l+1) = W(l) - η ∇E(S) + α m(l)

where α is the momentum factor.
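The momentum-based weight update of step 3) can be sketched as below; α = 0.9 as a default is a placeholder of ours, since the text does not fix its value.

```python
import numpy as np

def momentum_update(w, w_prev, grad, lr=0.1, alpha=0.9):
    """One gradient-descent step with momentum:
       W(l+1) = W(l) - lr * grad + alpha * (W(l) - W(l-1)).
    lr = 0.1 matches the learning rate given in the text;
    alpha = 0.9 is a placeholder default."""
    w = np.asarray(w, dtype=float)
    w_prev = np.asarray(w_prev, dtype=float)
    grad = np.asarray(grad, dtype=float)
    momentum = w - w_prev  # the "charge" accumulated between iterations
    return w - lr * grad + alpha * momentum
```

With zero momentum (w == w_prev) this reduces to plain gradient descent.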
4) Reselecting the training sample set S: let D = {(x_i, c_i)} denote the entire sample set, where c_i is the class label of x_i; c_i = +1 indicates that x_i is a positive sample and c_i = -1 that it is a negative sample. The discrimination distance of sample x_i is d_i. The training samples are reselected according to the following steps:

For each sample x_i in D
    If c_i = +1
        add x_i to the training sample set S
    Else
        If the discrimination distance d_i satisfies the selection criterion
            add x_i to the training sample set S
        End
    End
End

where the discrimination distance d_i of each sample is the one computed in the error-calculation step above.
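A sketch of the one-sided reselection of step 4) follows. Keeping the negative samples whose discrimination distance falls below a threshold τ (i.e., those near the decision boundary) is an assumption on our part; the text states only that negatives are screened by their discrimination distance while all positives are kept.

```python
import numpy as np

def reselect_training_set(X, labels, outputs, theta=0.5, tau=0.4):
    """One-sided dynamic down-sampling: keep every positive sample
    (label 1), and keep a negative sample (label 0) only if its
    discrimination distance |y - theta| meets the selection criterion.
    The criterion d < tau (negatives near the boundary) and the values
    of theta and tau are assumptions, not given by the source."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = np.abs(np.asarray(outputs, dtype=float) - theta)
    keep = (labels == 1) | ((labels == 0) & (d < tau))
    return X[keep], labels[keep]
```

In the toy run below, the confidently classified negative (output 0.05) is dropped while the boundary negative (output 0.45) is retained alongside both positives.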
5) If the iteration count l has not reached the maximum L, set l ← l + 1 and jump back to step 1) of this stage to continue training the network model; otherwise, proceed to the third step and classify the unknown samples.
The third step: and carrying out classification prediction on the unknown samples.
After the network model has been optimized in the second step, the system can classify unknown samples. The network weights are (W1, b1, W2, b2), where W1 denotes the weights between the input layer and the hidden layer, W2 the weights between the hidden layer and the output layer, and b1 and b2 the biases of the hidden-layer and output-layer neurons, respectively. For an input sample x, the hidden-layer output H is

H = f(W1 x + b1)

and the output-layer output y is

y = f(W2 H + b2)

The predicted class c of the input sample x is obtained by comparing y with the theoretical discrimination value θ:

c = positive if y ≥ θ, and negative otherwise.
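The prediction stage is a plain forward pass; using θ = 0.5 as the decision value is again an assumption for a Sigmoid output with 0/1 targets, and the function names are ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(x, w1, b1, w2, b2, theta=0.5):
    """Forward pass of the feedforward network:
       H = f(W1 x + b1),  y = f(w2 . H + b2),
    then compare y with the theoretical discrimination value theta
    to decide the class (1 = positive, 0 = negative)."""
    h = sigmoid(w1 @ np.asarray(x, dtype=float) + b1)  # hidden-layer output
    y = sigmoid(w2 @ h + b2)                           # output-layer output
    return (1 if y >= theta else 0), y
```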
hereinbefore, specific embodiments of the present invention are described with reference to the drawings. It will be understood by those skilled in the art that various changes and substitutions may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention. Such modifications and substitutions are intended to be included within the scope of the present invention as defined by the appended claims.
Results of the experiment
To verify the effectiveness of the proposed method, we compared it experimentally with three algorithms: the original feedforward neural network (comparison method one), a feedforward neural network with random down-sampling (comparison method two), and a feedforward neural network with one-sided sample selection (comparison method three). Four imbalanced data sets were selected from the KEEL imbalanced data repository [http://sci2s.ugr.es/keel/imbalanced.php]: Pima, Ecoli3, Pageblocks13Vs4, and Yeast2Vs8. Information on these data sets is given in Table 1. For each data set, the parameters of the compared algorithms were set as follows.
1) Data set Pima: all compared algorithms use the same network structure and parameters. The network structure has 8 input nodes, 20 hidden nodes, and 1 output node, denoted [8-20-1]. The network parameters are the iteration step size (learning rate) of the gradient descent method, the momentum factor, and the maximum iteration count. All experimental results are means over 5-fold cross validation.
2) Data set Ecoli3: all compared algorithms use the same network structure and parameters. The network structure is [7-40-1]. The network parameters are the iteration step size of the gradient descent method, the momentum factor, and the maximum iteration count. All experimental results are means over 5-fold cross validation.
3) Data set Pageblocks13Vs4: all compared algorithms use the same network structure and parameters. The network structure is [10-35-1]. The network parameters are the iteration step size of the gradient descent method, the momentum factor, and the maximum iteration count. All experimental results are means over 5-fold cross validation.
4) Data set Yeast2Vs8: all compared algorithms use the same network structure and parameters. The network structure is [8-40-1]. The network parameters are the iteration step size of the gradient descent method, the momentum factor, and the maximum iteration count. All experimental results are means over 5-fold cross validation.
we used AUC to evaluate the performance of the algorithm in unbalanced datasets. The AUC is calculated as follows:
wherein,indicating the proportion of pairs in the positive type samples,indicating the proportion of errors in the negative class samples.Andthe calculation formula is as follows:
wherein,representing the number of paired samples in the positive class;representing the number of backup error samples in the negative class;andrespectively representing the number of positive and negative class samples.
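The AUC computation described above reduces to a few lines; the function name is ours.

```python
def auc_from_rates(tp, fp, n_pos, n_neg):
    """AUC = (1 + TPrate - FPrate) / 2, with
       TPrate = TP / N+  (positives correctly classified) and
       FPrate = FP / N-  (negatives incorrectly classified)."""
    tp_rate = tp / n_pos
    fp_rate = fp / n_neg
    return (1.0 + tp_rate - fp_rate) / 2.0
```

For example, a classifier that catches 8 of 10 positives while misclassifying 10 of 100 negatives scores AUC = (1 + 0.8 - 0.1) / 2 = 0.85.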
The results of the experiments are shown in Table 2. The proposed method performs best on all data sets among the compared algorithms, which verifies its advantage in handling the class-imbalance problem and demonstrates its effectiveness.
Table 1: unbalanced data set information
Table 2: AUC value (%) -of comparison algorithm in unbalanced data set

Claims (6)

1. An imbalanced data classification system based on unilateral dynamic downsampling, characterized by the following specific steps:
1) determining the structure of the adopted network by the system according to the scale of the unbalanced data, and randomly initializing the weight of each layer of network neurons;
2) the system optimizes the network model by gradient descent, setting the learning rate, momentum factor, and maximum iteration count of the gradient descent method; in the first iteration, the total gradient is computed from all samples, the layer weights are updated with this total gradient, and the training samples for the next iteration are selected according to the discrimination distances of the samples; the training samples selected in the previous round are then reused to compute the total gradient, update the layer weights, and select the samples for the next round, until the maximum iteration count is reached;
3) classifying the unknown samples with the obtained classification model.
2. The system according to claim 1, wherein: the scale of the imbalanced data set comprises the number of samples, the imbalance ratio, and the dimensionality of the samples; an imbalanced data set is a data set whose imbalance ratio exceeds 1.5, where the imbalance ratio is the ratio of the number of negative samples to the number of positive samples; the network structure comprises the number of network layers and the number of neuron nodes in each layer; and the neuron weights are the weights of the connections between the neuron nodes of adjacent layers.
3. The system according to claim 1, wherein: the iterative optimization of the network model by gradient descent refers to computing the negative gradient of the network objective function and then updating the neuron node weights of each layer according to the obtained negative gradient.
4. The system according to claim 1, wherein: "unilateral" means that only the negative samples in the training set are down-sampled, according to the discrimination distances of the samples; the discrimination distance is the distance between the network output value of a sample and the theoretical discrimination value.
5. The system according to claim 1, wherein: the dynamic sample selection comprises screening the negative samples according to their discrimination distances after each iteration step, while adding all positive samples to the new training set.
6. The system according to claim 1, wherein: classifying and predicting unknown samples comprises computing the network output of an unknown sample from the obtained network weights and comparing that output with the theoretical discrimination value to determine the class of the unknown sample.
CN201610108097.7A 2016-02-28 2016-02-28 Imbalanced data sorting system based on unilateral dynamic downsampling Pending CN105787046A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610108097.7A CN105787046A (en) 2016-02-28 2016-02-28 Imbalanced data sorting system based on unilateral dynamic downsampling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610108097.7A CN105787046A (en) 2016-02-28 2016-02-28 Imbalanced data sorting system based on unilateral dynamic downsampling

Publications (1)

Publication Number Publication Date
CN105787046A true CN105787046A (en) 2016-07-20

Family

ID=56403025

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610108097.7A Pending CN105787046A (en) 2016-02-28 2016-02-28 Imbalanced data sorting system based on unilateral dynamic downsampling

Country Status (1)

Country Link
CN (1) CN105787046A (en)


Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11386353B2 (en) 2016-12-12 2022-07-12 Tencent Technology (Shenzhen) Company Limited Method and apparatus for training classification model, and method and apparatus for classifying data
WO2018107906A1 (en) * 2016-12-12 2018-06-21 腾讯科技(深圳)有限公司 Classification model training method, and data classification method and device
WO2019033636A1 (en) * 2017-08-16 2019-02-21 哈尔滨工业大学深圳研究生院 Method of using minimized-loss learning to classify imbalanced samples
CN107578061A (en) * 2017-08-16 2018-01-12 哈尔滨工业大学深圳研究生院 Based on the imbalanced data classification issue method for minimizing loss study
CN108460029A (en) * 2018-04-12 2018-08-28 苏州大学 Data reduction method towards neural machine translation
CN110210570A (en) * 2019-06-10 2019-09-06 上海延华大数据科技有限公司 The more classification methods of diabetic retinopathy image based on deep learning
CN110717529A (en) * 2019-09-25 2020-01-21 南京旷云科技有限公司 Data sampling method and device
CN110717529B (en) * 2019-09-25 2022-09-30 南京旷云科技有限公司 Data sampling method and device
CN110807515A (en) * 2019-10-30 2020-02-18 北京百度网讯科技有限公司 Model generation method and device
CN110807515B (en) * 2019-10-30 2023-04-28 北京百度网讯科技有限公司 Model generation method and device
CN111666997A (en) * 2020-06-01 2020-09-15 安徽紫薇帝星数字科技有限公司 Sample balancing method and target organ segmentation model construction method
CN111666997B (en) * 2020-06-01 2023-10-27 安徽紫薇帝星数字科技有限公司 Sample balancing method and target organ segmentation model construction method
CN113537511A (en) * 2021-07-14 2021-10-22 中国科学技术大学 Automatic gradient quantization federal learning framework and method

Similar Documents

Publication Publication Date Title
CN105787046A (en) Imbalanced data sorting system based on unilateral dynamic downsampling
Xiao et al. Cost-sensitive semi-supervised selective ensemble model for customer credit scoring
CN109034194B (en) Transaction fraud behavior deep detection method based on feature differentiation
CN110276679B (en) Network personal credit fraud behavior detection method for deep learning
CN110084610B (en) Network transaction fraud detection system based on twin neural network
WO2017140222A1 (en) Modelling method and device for machine learning model
CN112037012A (en) Internet financial credit evaluation method based on PSO-BP neural network
CN112633337A (en) Unbalanced data processing method based on clustering and boundary points
Baek et al. Bankruptcy prediction for credit risk using an auto-associative neural network in Korean firms
CN109886284B (en) Fraud detection method and system based on hierarchical clustering
CN107832789B (en) Feature weighting K nearest neighbor fault diagnosis method based on average influence value data transformation
Jamalian et al. A hybrid data mining method for customer churn prediction
CN113269647A (en) Graph-based transaction abnormity associated user detection method
CN111062806A (en) Personal finance credit risk evaluation method, system and storage medium
Owusu et al. A deep learning approach for loan default prediction using imbalanced dataset
Al Doori et al. Credit scoring model based on back propagation neural network using various activation and error function
CN109934286A (en) Bug based on Text character extraction and uneven processing strategie reports severity recognition methods
Fakiha Forensic Credit Card Fraud Detection Using Deep Neural Network
CN116934470A (en) Financial transaction risk assessment method based on clustering sampling and meta integration
Chowdhury et al. Bankruptcy prediction for imbalanced dataset using oversampling and ensemble machine learning methods
CN115496364A (en) Method and device for identifying heterogeneous enterprises, storage medium and electronic equipment
Khedr et al. An ensemble model for financial statement fraud detection
Pristyanto et al. Comparison of ensemble models as solutions for imbalanced class classification of datasets
Yang et al. Credit card fraud detection based on CSat-related AdaBoost
Caplescu et al. Will they repay their debt? Identification of borrowers likely to be charged off

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160720