CN105787046A - Imbalanced data sorting system based on unilateral dynamic downsampling - Google Patents
Imbalanced data sorting system based on unilateral dynamic downsampling
- Publication number
- CN105787046A CN105787046A CN201610108097.7A CN201610108097A CN105787046A CN 105787046 A CN105787046 A CN 105787046A CN 201610108097 A CN201610108097 A CN 201610108097A CN 105787046 A CN105787046 A CN 105787046A
- Authority
- CN
- China
- Prior art keywords
- samples
- network
- iteration
- gradient
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/086—Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
Abstract
The invention provides an imbalanced data classification system based on unilateral (one-sided) dynamic downsampling. First, the system determines the structure of the network to be used according to the scale of the imbalanced data and randomly initializes the neuron weights of each network layer. Second, it optimizes the network model with a gradient descent method, setting the method's learning rate, momentum ("charge") factor and maximum number of iterations. In the first iteration, all samples are used to compute the total gradient, the total gradient is used for updating the layer weights, and the training samples for the next iteration are selected according to the samples' discrimination distances. The samples selected in the previous iteration are then reused to compute the total gradient, update the layer weights and select the samples for the next iteration, until the maximum number of iterations is reached. Finally, the resulting classification model is used to classify unknown samples. Compared with traditional classification techniques, the system achieves dynamic downsampling of the training samples, avoids loss of data-set information by combining the downsampling process with the classifier's training process, and can effectively handle the classification of imbalanced data.
Description
Technical Field
The invention relates to the field of pattern recognition, in particular to an unbalanced data classification method and system based on unilateral dynamic downsampling.
Background
At present, in the era of data explosion, data volumes have grown from the TB level to the PB or even EB level, and how to mine useful information from such massive data has become very important. Data mining has many research directions, of which classification is one of the important branches. Classification refers to selecting an already-labeled training set from the data, analyzing and learning it with a classification technique to find the rules hidden in the data, and building a classification model that can then predict the classes of unknown test samples. Many mature algorithms exist for the traditional classification problem, such as K-nearest neighbors, decision trees, artificial neural networks, Bayesian classifiers and support vector machines; these algorithms are applied in many fields of data mining and achieve good classification results.
Although these traditional classification algorithms achieve good results, they are mostly built on the premise that the data set is balanced, that is, the numbers of samples of the various classes are roughly equal. In practical applications, however, imbalanced data sets are more common. In a two-class problem, the number of samples in one class is often far larger than in the other; the class with fewer samples is called the positive class (Positive) and the class with more samples the negative class (Negative). For example, in financial fraud detection, most customers' transactions are normal and only very few customers exhibit potentially fraudulent behavior; there may be one fraudulent transaction in 100,000. Imbalanced data sets likewise arise in medical diagnosis, network intrusion detection, anti-spam filtering, oil exploration and other fields. In some of these areas the imbalance is inherent, because the probability of a positive sample occurring is itself low. In others, positive samples require experimental verification while negative samples do not, so negative samples are cheap to obtain and positive samples expensive, and the negative class ends up far outnumbering the positive class in the data set.
Because traditional classification algorithms take maximizing the overall average classification accuracy of the model as the training objective and do not consider the relative distribution of the classes, a traditional classifier applied to imbalanced data often degrades sharply: the learned classifier is biased toward the negative class, and samples that belong to the positive class are frequently misclassified as negative. Such classifiers perform poorly on the positive class, yet practical problems usually require a sufficiently high detection rate on the positive class, because it is generally far more important than the negative class. Taking financial fraud detection again, a conventional classifier easily labels fraudulent behavior as normal, but the loss to the bank when fraud is treated as normal is usually much higher than when normal behavior is mistaken for fraud. In medical diagnosis, misdiagnosing a patient as healthy delays the optimal treatment window, and the resulting loss is hard to estimate. Correctly classifying imbalanced data is therefore an urgent problem, and constructing a classification system that handles the imbalance effectively will bring great economic benefit to industrial production and the economy.
Currently, imbalanced data is often handled at the data level, with methods such as random undersampling, one-sided sample selection and random oversampling. These processing methods, however, are independent of the training algorithm itself: the data set they produce can be fed to many different training algorithms, but the processed sample set then stays fixed throughout training. For a downsampling method this means the removed samples are never used again during the classifier training phase, causing a loss of sample information that degrades classifier performance. To overcome this shortcoming of downsampling, an imbalanced data classification system based on one-sided dynamic downsampling is proposed: during training the system can take all samples into account, and in each iteration it dynamically downsamples the negative class to obtain a balanced training set.
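For contrast with the dynamic scheme described below, static random undersampling of the negative class can be sketched as follows. This is an illustrative sketch, not the patent's code; the function name, the toy labels and the 1/0 label convention are our assumptions.

```python
import random

def random_undersample(samples, labels, seed=0):
    """Randomly drop negative (majority) samples until both classes have
    the same count. The dropped samples are never seen again during
    training, which is exactly the information loss the patent targets."""
    rng = random.Random(seed)
    pos = [s for s, c in zip(samples, labels) if c == 1]
    neg = [s for s, c in zip(samples, labels) if c == 0]
    kept_neg = rng.sample(neg, k=len(pos))  # one-shot, static selection
    balanced = pos + kept_neg
    new_labels = [1] * len(pos) + [0] * len(kept_neg)
    return balanced, new_labels

# 2 positive samples vs. 10 negative samples
X = list(range(12))
y = [1, 1] + [0] * 10
Xb, yb = random_undersample(X, y)
# the balanced set contains all positives and an equal number of negatives
```

Because the selection happens once, before training, every later training iteration sees the same reduced set; the dynamic method below instead reselects negatives after each iteration.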
Disclosure of Invention
Aiming at the problems that existing downsampling-based classification techniques cannot combine downsampling with classifier training when processing imbalanced data and cannot avoid the loss of sample information after downsampling, the invention provides a one-sided dynamic downsampling method that uses the discrimination distance of each sample to downsample the negative class, trains the classification model with a feedforward (no-feedback) neural network, and optimizes the model with a gradient descent method. Combining one-sided dynamic downsampling with the feedforward neural network yields an imbalanced data classification system based on one-sided dynamic downsampling. The system can effectively handle the classification of imbalanced data.
The technical scheme adopted by the invention to solve this problem is as follows: first, the system determines the structure of the network according to the scale of the imbalanced data and randomly initializes the neuron weights of each layer; second, it optimizes the network model with a gradient descent method, setting the method's learning rate, momentum ("charge") factor and maximum number of iterations; in the first iteration all samples are used to compute the total gradient, the total gradient is used to update the layer weights, and the training samples of the next iteration are selected according to the samples' discrimination distances; the samples selected in the previous round are then reused to compute the total gradient, update the layer weights and select the next round's samples, until the maximum number of iterations is reached; finally, the resulting classification model classifies the unknown samples.
The technical scheme can be refined further. The neural network structure is determined manually from prior information about the specific data; an empirical method can be used to choose a suitable structure, such as the number of network layers, the number of neurons in each hidden layer, and the type of node activation function. The gradient descent method follows the gradient direction of the neural network's objective function and uses the negative gradient to minimize that objective, giving the network better classification performance. The one-sided downsampling method uses the discrimination distance to dynamically select negative samples and can effectively balance the numbers of positive and negative samples.
The beneficial effects of the invention are: sampling the negative class by the samples' discrimination distances balances the training set in sample count; combining the one-sided downsampling method with a feedforward neural network yields an imbalanced data classification system in which the negative samples are downsampled dynamically and the downsampling process is merged with classifier training; training the feedforward network model by gradient descent and resampling after every iteration step realizes dynamic downsampling of the training samples; and by combining sample downsampling with model training, the algorithm effectively solves the classification problem of imbalanced data.
Drawings
FIG. 1 is a system framework for an unbalanced data classification system based on dynamic downsampling of the present invention.
Detailed Description
The invention is further described below with reference to the figures and examples. The method of the invention is divided into three steps.
The first step is as follows: the network structure and network parameters are initialized.
The system determines the structure of the network according to the scale of the imbalanced data and randomly initializes the neuron weights of each layer. Initialization of the network structure covers the number of nodes in each layer and the type of activation function used at the network nodes; initialization of the network parameters covers the weight of each neuron and the training target of each training sample. Initialization of the network structure and parameters comprises the following steps.
1) Initialize the neural network structure: determine the structure of the neural network (the number of layers and the number of neurons per layer) from the scale of the imbalanced data, namely the sample dimension, the number of samples and the imbalance ratio. The imbalance ratio reflects the degree of imbalance of the data set and is computed as

IR = N⁻ / N⁺,

where N⁻ and N⁺ are the numbers of negative and positive samples. When the imbalance ratio of a data set exceeds 1.5, the data set is called an imbalanced data set. The number of hidden nodes is set by hand based on experience, and each weight of the neural network is randomly initialized between -1 and 1. For a given problem, the network structure may be determined by manual, empirical methods. The activation function at the network nodes is the sigmoid function:

f(x) = 1 / (1 + e^(-x)).
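The two quantities defined in step 1) can be computed directly. A minimal sketch (function names and the toy labels are ours; the IR definition of negatives over positives follows claim 2):

```python
import math

def imbalance_ratio(labels):
    """IR = (# negative samples) / (# positive samples); a data set with
    IR > 1.5 is treated as imbalanced by the method."""
    pos = sum(1 for c in labels if c == 1)
    neg = sum(1 for c in labels if c == 0)
    return neg / pos

def sigmoid(x):
    """Sigmoid activation used at the network nodes: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

labels = [1] * 4 + [0] * 20
ir = imbalance_ratio(labels)   # 20 / 4 = 5.0, well above 1.5
```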
2) Set the network model training parameters: the learning rate η of the gradient descent method is set to 0.1, together with the momentum ("charge") factor α and the maximum number of iterations L. The iteration index l is initialized to 1, and the training sample set S is initialized to all training samples.
The second step: optimize the network model.
The system optimizes the network model with a gradient descent method, setting the method's learning rate, momentum factor and maximum number of iterations. In the first iteration, all samples are used to compute the total gradient, the total gradient is used to update the layer weights, and the training samples of the next iteration are selected according to the samples' discrimination distances; the samples selected in the previous round are then reused to compute the total gradient, update the layer weights and select the next round's samples, until the maximum number of iterations is reached. The network model optimization comprises the following steps.
1) Compute the sum of squared errors of the network:

E = (1/2) Σᵢ (tᵢ − yᵢ)²,

where the sum runs over all N samples and tᵢ and yᵢ are, respectively, the training target of sample xᵢ and the actual output value of the network. The discrimination distance dᵢ of sample xᵢ is the distance between its network output and the theoretical discrimination value (0.5 for a sigmoid output node):

dᵢ = |yᵢ − 0.5|.
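As a hedged sketch of these two quantities (the 0.5 discrimination value, the names and the toy numbers are our assumptions, not the patent's):

```python
def sum_squared_error(targets, outputs):
    """E = 1/2 * sum_i (t_i - y_i)^2 over the training set."""
    return 0.5 * sum((t - y) ** 2 for t, y in zip(targets, outputs))

def discrimination_distance(output, threshold=0.5):
    """Distance between a sample's network output and the theoretical
    discrimination value (assumed 0.5 for a sigmoid output node)."""
    return abs(output - threshold)

t = [1.0, 0.0, 0.0]          # training targets
y = [0.9, 0.2, 0.6]          # actual network outputs
E = sum_squared_error(t, y)  # 0.5 * (0.01 + 0.04 + 0.36) = 0.205
d = [discrimination_distance(v) for v in y]
```

A small discrimination distance means the output lies near the decision boundary, i.e. the sample is still hard for the current network.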
2) Compute the gradient of the network over the sample set S. Let w_jk denote the weight connecting the jth hidden neuron to the kth output neuron. The required partial derivatives are obtained by the chain rule; for the weights from the hidden layer to the output layer,

∂E/∂w_jk = −Σ_{xᵢ∈S} (tᵢ − yᵢ) · yᵢ (1 − yᵢ) · h_j(xᵢ),

where h_j(xᵢ) is the output value of the jth neuron of the preceding (hidden) layer for sample xᵢ. Since a feedforward (no-feedback) neural network is used and only the weights between the hidden layer and the output layer are updated by gradient descent, the weights between the input layer and the hidden layer need not be updated. The total gradient of the network over the sample set S is therefore

∇E(S) = Σ_{xᵢ∈S} gᵢ,

where gᵢ is the gradient value corresponding to sample xᵢ. After the lth iteration, the momentum ("charge") m(l) of the network is computed as

m(l) = w(l) − w(l−1),

where w(l) and w(l−1) are the corresponding network weights after the lth and (l−1)th iterations.
3) Update the network weights: according to the gradient ∇E(S) and the momentum m(l) obtained above, the network weights w are updated after the lth iteration to

w(l+1) = w(l) − η · ∇E(S) + α · m(l),

where η is the learning rate and α is the momentum factor.
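A minimal sketch of one gradient-descent step with momentum (the patent's "charge") on a plain weight vector; η, α and the toy numbers below are our illustrative choices:

```python
def momentum_step(w, grad, prev_w, eta=0.1, alpha=0.9):
    """One gradient-descent step with momentum:
        charge  = w(l) - w(l-1)
        w(l+1)  = w(l) - eta * grad + alpha * charge
    Returns the new weights; the caller keeps w as the next prev_w."""
    charge = [wl - wp for wl, wp in zip(w, prev_w)]
    return [wl - eta * g + alpha * c for wl, g, c in zip(w, grad, charge)]

w_prev = [0.0, 0.0]          # weights after iteration l-1
w_curr = [0.1, -0.2]         # weights after iteration l
grad = [1.0, -1.0]           # total gradient over the sample set S
w_next = momentum_step(w_curr, grad, w_prev)
# w_next = [0.1 - 0.1 + 0.9*0.1, -0.2 + 0.1 + 0.9*(-0.2)] = [0.09, -0.28]
```

The momentum term reuses the previous weight change, which damps oscillation of plain gradient descent along steep directions of the error surface.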
4) Reselect the training sample set S: for the whole sample set D, cᵢ is the class label of sample xᵢ, where cᵢ = 1 denotes a positive sample and cᵢ = 0 a negative sample, and dᵢ is the discrimination distance of xᵢ. The training samples are reselected according to the following steps:

For each sample xᵢ in D
    If cᵢ = 1
        add xᵢ to the training sample set S
    Else
        If dᵢ satisfies the discrimination-distance selection criterion (e.g. dᵢ ≤ θ for a selection threshold θ)
            add xᵢ to the training sample set S
        End
    End
End
Here the discrimination distance of each sample is the one calculated in step 4.
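The reselection loop can be rendered in Python as follows. This is a hedged sketch: the threshold theta is our placeholder for the patent's selection criterion on the discrimination distance, and all names are ours.

```python
def reselect_training_set(samples, labels, distances, theta):
    """One-sided dynamic reselection: keep every positive sample, and keep
    a negative sample only when its discrimination distance is within
    theta, i.e. it lies near the decision boundary and stays informative."""
    S = []
    for x, c, d in zip(samples, labels, distances):
        if c == 1:
            S.append((x, c))        # all positives survive every round
        elif d <= theta:
            S.append((x, c))        # boundary negatives are re-selected
    return S

samples = ["a", "b", "c", "d"]
labels = [1, 0, 0, 0]               # one positive, three negatives
dists = [0.40, 0.05, 0.30, 0.45]    # discrimination distances
S = reselect_training_set(samples, labels, dists, theta=0.25)
# keeps the positive "a" and the near-boundary negative "b"
```

Because the distances are recomputed from the current network outputs, a negative sample dropped in one iteration can re-enter S later, which is how the method avoids the permanent information loss of static undersampling.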
5) If the number of iterations has not reached the maximum, jump back to step 3 to continue training the network model; otherwise, execute step 8.
The third step: classification prediction of unknown samples.
After the network model has been optimized in the second step, the system can classify unknown samples. The network weights are {W1, b1, W2, b2}, where W1 denotes the weights between the input layer and the hidden layer, W2 the weights between the hidden layer and the output layer, and b1 and b2 the offsets of the hidden-layer neurons and the output-layer neuron, respectively. For an input sample x, the hidden-layer output H of the network is

H = f(W1 · x + b1),

and the output-layer output y is

y = f(W2 · H + b2),

where f is the sigmoid activation function. The prediction category c of the input sample x is then given by the following rule: c = 1 (positive) if y is at least the theoretical discrimination value 0.5, and c = 0 (negative) otherwise.
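The prediction of the third step amounts to one feedforward pass. A self-contained sketch (the tiny 2-2-1 network, its hand-picked weights and the 0.5 threshold are our illustrative assumptions):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(x, W1, b1, W2, b2):
    """Forward pass of the trained network:
        hidden  H = sigmoid(W1 @ x + b1)
        output  y = sigmoid(W2 @ H + b2)
        class   1 if y >= 0.5 else 0 (theoretical discrimination value)."""
    H = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * h for w, h in zip(W2, H)) + b2)
    return (1 if y >= 0.5 else 0), y

# a tiny 2-input, 2-hidden, 1-output network with hand-picked weights
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.0]
W2 = [2.0, -2.0]
b2 = 0.0
label, score = predict([1.0, 0.0], W1, b1, W2, b2)
```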
hereinbefore, specific embodiments of the present invention are described with reference to the drawings. It will be understood by those skilled in the art that various changes and substitutions may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention. Such modifications and substitutions are intended to be included within the scope of the present invention as defined by the appended claims.
Results of the experiment
To verify the effectiveness of the proposed method, we compared it experimentally with three algorithms: the original feedforward neural network (comparison method one), a feedforward neural network with random undersampling (comparison method two), and a feedforward neural network with one-sided sample selection (comparison method three). We selected four imbalanced data sets from the KEEL imbalanced-data repository [http://sci2s.ugr.es/keel/imbalanced.php]: Pima, Ecoli3, Pageblocks13Vs4 and Yeast2Vs8. Information on these data sets is shown in Table 1. For each data set, the parameters of the compared algorithms were set as follows.
1) Data set Pima: the compared algorithms use the same network structure and parameters. The network structure has 8 input nodes, 20 hidden nodes and 1 output node, denoted [8-20-1]. The network parameters (the gradient-descent step size, the momentum factor and the maximum number of iterations) are shared by all compared algorithms. All experimental results are the means of 5 rounds of cross-validation.
2) Data set Ecoli3: the compared algorithms use the same network structure and parameters. The network structure is [7-40-1]. The network parameters (the gradient-descent step size, the momentum factor and the maximum number of iterations) are shared by all compared algorithms. All experimental results are the means of 5 rounds of cross-validation.
3) Data set Pageblocks13Vs4: the compared algorithms use the same network structure and parameters. The network structure is [10-35-1]. The network parameters (the gradient-descent step size, the momentum factor and the maximum number of iterations) are shared by all compared algorithms. All experimental results are the means of 5 rounds of cross-validation.
4) Data set Yeast2Vs8: the compared algorithms use the same network structure and parameters. The network structure is [8-40-1]. The network parameters (the gradient-descent step size, the momentum factor and the maximum number of iterations) are shared by all compared algorithms. All experimental results are the means of 5 rounds of cross-validation.
we used AUC to evaluate the performance of the algorithm in unbalanced datasets. The AUC is calculated as follows:
wherein,indicating the proportion of pairs in the positive type samples,indicating the proportion of errors in the negative class samples.Andthe calculation formula is as follows:
wherein,representing the number of paired samples in the positive class;representing the number of backup error samples in the negative class;andrespectively representing the number of positive and negative class samples.
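This single-operating-point AUC can be computed directly from the two confusion-matrix counts. A sketch (variable names and the example counts are ours):

```python
def auc(tp, fp, n_pos, n_neg):
    """AUC = (1 + TPrate - FPrate) / 2, with
       TPrate = TP / N+  (positives classified correctly)
       FPrate = FP / N-  (negatives classified incorrectly)."""
    tp_rate = tp / n_pos
    fp_rate = fp / n_neg
    return (1.0 + tp_rate - fp_rate) / 2.0

# 45 of 50 positives caught, 30 of 300 negatives falsely flagged
score = auc(tp=45, fp=30, n_pos=50, n_neg=300)   # (1 + 0.9 - 0.1) / 2
```

Unlike overall accuracy, this score rewards the positive-class detection rate and penalizes false alarms symmetrically, so a classifier that labels everything negative scores only 0.5.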
The experimental results are shown in Table 2. The proposed method performs best among the compared algorithms on all data sets, which verifies its advantage in handling the imbalance problem and demonstrates its effectiveness.
Table 1: unbalanced data set information
Table 2: AUC value (%) -of comparison algorithm in unbalanced data set
Claims (6)
1. An unbalanced data classification system based on unilateral dynamic downsampling, characterized by the following specific steps:
1) determining the structure of the adopted network by the system according to the scale of the unbalanced data, and randomly initializing the weight of each layer of network neurons;
2) the system optimizes a network model with a gradient descent method, setting the method's learning rate, momentum factor and maximum number of iterations; in the first iteration, all samples are used to compute the total gradient, the total gradient is used to update the layer weights, and the training samples of the next iteration are selected according to the samples' discrimination distances; the samples selected in the previous round are reused to compute the total gradient, update the layer weights and select the next round's samples, until the maximum number of iterations is reached;
3) the obtained classification model is used to classify unknown samples.
2. The system according to claim 1, characterized in that: the scale of the unbalanced data set comprises the number of samples, the imbalance ratio and the sample dimension of the data set; an unbalanced data set is a data set whose imbalance ratio exceeds 1.5, the imbalance ratio being the number of negative samples divided by the number of positive samples; the network structure comprises the number of network layers and the number of neuron nodes per layer; the neuron weights are the weights of the interconnections between the neuron nodes of the layers.
3. The system according to claim 1, characterized in that: the iterative optimization of the network model by the gradient descent method takes the negative gradient of the network objective function and then updates the neuron node weights of each layer according to the obtained negative gradient.
4. The system according to claim 1, characterized in that: "unilateral" means that the negative samples in the training set are downsampled according to the samples' discrimination distances; the discrimination distance is the distance between a sample's network output value and the theoretical discrimination value.
5. The system according to claim 1, characterized in that: the dynamic sample selection comprises screening the negative samples by their discrimination distances after each iteration step and adding all positive samples to the new training set.
6. The system according to claim 1, characterized in that: classifying and predicting unknown samples comprises computing the network output of each unknown sample from the obtained network weights and comparing it with the theoretical discrimination value to determine its class.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610108097.7A CN105787046A (en) | 2016-02-28 | 2016-02-28 | Imbalanced data sorting system based on unilateral dynamic downsampling |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610108097.7A CN105787046A (en) | 2016-02-28 | 2016-02-28 | Imbalanced data sorting system based on unilateral dynamic downsampling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105787046A true CN105787046A (en) | 2016-07-20 |
Family
ID=56403025
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610108097.7A Pending CN105787046A (en) | 2016-02-28 | 2016-02-28 | Imbalanced data sorting system based on unilateral dynamic downsampling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105787046A (en) |
-
2016
- 2016-02-28 CN CN201610108097.7A patent/CN105787046A/en active Pending
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11386353B2 (en) | 2016-12-12 | 2022-07-12 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for training classification model, and method and apparatus for classifying data |
WO2018107906A1 (en) * | 2016-12-12 | 2018-06-21 | 腾讯科技(深圳)有限公司 | Classification model training method, and data classification method and device |
WO2019033636A1 (en) * | 2017-08-16 | 2019-02-21 | 哈尔滨工业大学深圳研究生院 | Method of using minimized-loss learning to classify imbalanced samples |
CN107578061A (en) * | 2017-08-16 | 2018-01-12 | 哈尔滨工业大学深圳研究生院 | Based on the imbalanced data classification issue method for minimizing loss study |
CN108460029A (en) * | 2018-04-12 | 2018-08-28 | 苏州大学 | Data reduction method towards neural machine translation |
CN110210570A (en) * | 2019-06-10 | 2019-09-06 | 上海延华大数据科技有限公司 | The more classification methods of diabetic retinopathy image based on deep learning |
CN110717529A (en) * | 2019-09-25 | 2020-01-21 | 南京旷云科技有限公司 | Data sampling method and device |
CN110717529B (en) * | 2019-09-25 | 2022-09-30 | 南京旷云科技有限公司 | Data sampling method and device |
CN110807515A (en) * | 2019-10-30 | 2020-02-18 | 北京百度网讯科技有限公司 | Model generation method and device |
CN110807515B (en) * | 2019-10-30 | 2023-04-28 | 北京百度网讯科技有限公司 | Model generation method and device |
CN111666997A (en) * | 2020-06-01 | 2020-09-15 | 安徽紫薇帝星数字科技有限公司 | Sample balancing method and target organ segmentation model construction method |
CN111666997B (en) * | 2020-06-01 | 2023-10-27 | 安徽紫薇帝星数字科技有限公司 | Sample balancing method and target organ segmentation model construction method |
CN113537511A (en) * | 2021-07-14 | 2021-10-22 | 中国科学技术大学 | Automatic gradient quantization federal learning framework and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20160720 |