CN103593470A - Double-degree integrated unbalanced data stream classification algorithm - Google Patents

Double-degree integrated unbalanced data stream classification algorithm

Info

Publication number
CN103593470A
Authority
CN
China
Prior art keywords
data
classification model
lack
classification
balance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310624425.5A
Other languages
Chinese (zh)
Other versions
CN103593470B (en)
Inventor
张重生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University
Original Assignee
Henan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University
Priority to CN201310624425.5A
Publication of CN103593470A
Application granted
Publication of CN103593470B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a double-degree integrated unbalanced data stream classification algorithm. The double-degree integrated unbalanced data stream classification algorithm includes a balanced data stream classification model prediction stage, a classification reliability evaluation stage and an unbalanced data stream classification model prediction stage. In the balanced data stream classification model prediction stage, firstly, a balanced data stream classification model predicts the classification of each data record. In the classification reliability evaluation stage, reliability evaluation is conducted on the classification results obtained in the balanced data stream classification model prediction stage, the classification results of the records with high reliability are directly sent back to a user, and the data records with low reliability need to be classified again in the unbalanced data stream classification model prediction stage. The method embodied in the double-degree integrated unbalanced data stream classification algorithm can be widely applied to applications such as computer-assisted clinical diagnosis and real-time intrusion detection, and the invention belongs to the field of artificial intelligence applications.

Description

A double-degree integrated unbalanced data stream classification algorithm
Technical field
The present invention relates to a data stream classification algorithm, and in particular to a double-degree integrated unbalanced data stream classification algorithm.
Background art
In recent years, data mining technology has been applied more and more widely across industries, including computer-assisted clinical diagnosis, Internet-based recommender and advertising systems, customer segmentation, financial data analysis, and abnormal transaction monitoring. Such industry-oriented intelligent analysis and decision systems have been widely accepted.
In many practical applications the class distribution of the data is unbalanced (also called skewed). For example, 90% of the data records belong to class A, which is then called the majority class, while only 10% belong to class B, which is called the minority class. In financial data analysis, for instance, most transactions are normal and only a very small number are abnormal. When a classification method is used to discover the patterns of abnormal transactions, learning those patterns from a small number of abnormal transaction records and building a classification model for them is an extremely challenging task: the model must identify abnormal transactions fairly accurately and, at the same time, must not misjudge normal transactions as abnormal. In other words, the model should classify both abnormal and normal transactions fairly accurately.
Many practical data mining applications must process not only static data but also large volumes of streaming data, that is, data streams. Examples include social media mining, website click-stream analysis, stock trading analysis, event detection, and sensor data processing. In these applications, data streams with an unbalanced (skewed) distribution are common. Existing classification algorithms can improve the classification accuracy for the minority class in an unbalanced data stream, but they reduce the classification accuracy for the majority class. A better classification algorithm for unbalanced data streams is therefore needed, one that predicts the minority-class records fairly accurately while still ensuring the classification accuracy for majority-class records.
Summary of the invention
The object of the present invention is to provide a double-degree integrated unbalanced data stream classification algorithm that predicts the minority class in an unbalanced data stream fairly accurately while still ensuring the classification accuracy for majority-class data records.
The present invention adopts the following technical solution:
A double-degree integrated unbalanced data stream classification algorithm comprises the following steps:
A: Training stage for the balanced and unbalanced data stream classification models: each newest data stream record block in the training data set is divided into a training set and a validation set; one balanced classification model and one unbalanced classification model are trained on the training set; the n balanced classification models and the n unbalanced classification models with the highest classification accuracy on the validation set are retained;
B: The n balanced data stream classification models and n unbalanced data stream classification models from step A are used to classify the data records in the validation set and to evaluate the reliability of the results, finally yielding the optimized reliability threshold δ;
C: The n balanced data stream classification models and n unbalanced data stream classification models from step A are used to classify each data record in the test data set, and the final classification results are output.
In step B, a data-driven method determines the optimized reliability threshold δ on the validation set, as follows:
Let m1 denote the classification accuracy and m2 the geometric mean of the classification sensitivity and specificity. Initialize the variables d = 1.0 and t = 0. On the validation data set, repeat the following operation: starting from 0, increase the value of δ by 0.02 each time, and compute the distance l between the point (m1, m2) corresponding to this δ value and the point (1, 1); if l is less than d, set d = l and t = δ. The loop ends when δ = 1. After the loop ends, the current value of t is assigned to δ; this δ is the optimized reliability threshold.
In step C, classifying each data record u in the test data set comprises the following steps:
C1: first, the ensemble of the n retained balanced data stream classification models classifies u;
C2: the reliability r(u) of the classification result for u is computed; a classification result whose reliability r(u) is greater than the optimized threshold δ is returned directly to the user;
C3: if the reliability r(u) of the classification of u is lower than the optimized threshold δ, the ensemble of the n unbalanced classification models classifies u again, and the final classification result is returned.
In step A, training a balanced data stream classification model comprises the following steps:
A11: simple random sampling is performed on the training set with sample size s, without regard to the class of the records; the resulting sample is denoted T1;
A12: a classification algorithm is used to train a classification model on T1; this model is called a balanced data stream classification model;
A13: the existing balanced data stream classification models are tested: if their total number exceeds n, each existing balanced model is tested on the validation set and the one with the worst classification accuracy is eliminated, until the number of remaining balanced data stream classification models equals n;
In step A, training an unbalanced data stream classification model comprises the following steps:
A21: the minority-class data records in the training set of each data stream record block are collected and placed in a minority-class record container; if the total number of data records in the container exceeds the prescribed number s, the oldest data records are discarded until the number of remaining records equals s;
A22: for sampling, simple random sampling is first performed on the training set Tr with sample size s/2, without regard to the class of the records; simple random sampling is then performed on the data records in the minority-class record container, also with sample size s/2; the two samples are combined to form the newest sample, denoted T2;
A23: a classification algorithm is used to train a classification model on T2; this model is called an unbalanced data stream classification model;
A24: the existing unbalanced data stream classification models are tested: if their total number exceeds n, each existing unbalanced model is tested on the validation set Va and the one with the worst classification accuracy is eliminated, until the number of remaining unbalanced data stream classification models equals n.
By using three stages (balanced data stream classification model prediction, classification reliability evaluation, and unbalanced data stream classification model prediction), the present invention classifies new, unclassified data in an unbalanced data stream in real time with fairly high accuracy. It predicts minority-class records fairly accurately and also greatly reduces the probability that the classifier misjudges majority-class records as minority-class. The method is therefore of great significance for classifying data streams with an unbalanced distribution. It addresses the problem, in data stream applications, of how to discover classification rules from an unbalanced real-time data stream and classify new, unclassified data in real time with fairly high accuracy. The method belongs to the field of artificial intelligence applications and can be widely used in applications such as computer-assisted clinical diagnosis and real-time intrusion detection.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the present invention.
Detailed description of the embodiments
As shown in Fig. 1, the double-degree integrated unbalanced data stream classification algorithm uses a pane-based data stream model: the data stream is cut sequentially into data stream record blocks of equal size, so that every block contains the same number of data records. The main parameters used in this patent are: b, the number of data records in a data stream record block; s, the sample size used for sampling, with s < b (s is also the capacity of the minority-class record container); and n, the number of data stream record blocks that the pane-based data stream model can retain. The algorithm comprises the following steps:
A: Training stage for the balanced and unbalanced data stream classification models: each newest data stream record block is divided, in a 90% to 10% ratio, into a training data set Tr and a validation data set Va, and one balanced data stream classification model and one unbalanced data stream classification model are trained on Tr;
In step A, training one balanced data stream classification model comprises the following steps:
A11: simple random sampling is performed on Tr with sample size s, without regard to the class of the records; the resulting sample is denoted T1;
A12: a classification algorithm is used to train a classification model on T1; this model is called a balanced data stream classification model;
A13: the existing balanced data stream classification models are tested: if their total number exceeds n, each existing balanced model is tested on Va and the one with the worst classification accuracy is eliminated, until the number of remaining balanced data stream classification models equals n;
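Steps A11 and A13 can be illustrated with a minimal Python sketch. It assumes, for illustration only, that records are (features, label) pairs and that a trained model is a callable returning a label. The names `simple_random_sample`, `accuracy`, and `prune_ensemble` are hypothetical and do not appear in the patent, which also leaves the choice of base classification algorithm for step A12 open.

```python
import random


def simple_random_sample(records, s, rng):
    """Draw up to s records without replacement, ignoring class labels (step A11)."""
    return rng.sample(records, min(s, len(records)))


def accuracy(model, validation):
    """Fraction of (features, label) pairs in `validation` that `model` labels correctly."""
    return sum(model(x) == y for x, y in validation) / len(validation)


def prune_ensemble(models, validation, n):
    """Eliminate the least accurate model on `validation` until at most n remain (step A13)."""
    models = list(models)
    while len(models) > n:
        worst = min(models, key=lambda m: accuracy(m, validation))
        models.remove(worst)
    return models
```

When several models tie for the worst accuracy, this sketch removes the earliest one in the list; the patent does not state a tie-breaking rule.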
In step A, training one unbalanced data stream classification model comprises the following steps:
A21: the minority-class data records in the training set of each data stream record block are collected and placed in a minority-class record container. If the total number of data records in the container exceeds the prescribed number s, the oldest data records are discarded until the number of remaining records equals s;
A22: for sampling, simple random sampling is first performed on Tr with sample size s/2, without regard to the class of the records; simple random sampling is then performed on the data records in the minority-class record container, also with sample size s/2; the two samples are combined to form the newest sample, denoted T2;
A23: a classification algorithm is used to train a classification model on T2. This model is called an unbalanced data stream classification model;
A24: the existing unbalanced data stream classification models are tested: if their total number exceeds n, each existing unbalanced model is tested on Va and the one with the worst classification accuracy is eliminated, until the number of remaining unbalanced data stream classification models equals n.
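The minority-class record container of step A21 and the combined sample T2 of step A22 can be sketched as follows. This is an illustration under assumptions: records are (features, label) pairs, the container is modelled with a bounded `collections.deque`, and the names `MinorityContainer` and `build_t2` are hypothetical.

```python
import random
from collections import deque


class MinorityContainer:
    """Holds at most s minority-class records; the oldest are discarded first (step A21)."""

    def __init__(self, s):
        # A deque with maxlen drops its oldest element when a new one overflows it.
        self.records = deque(maxlen=s)

    def add_block(self, training_set, minority_label):
        """Collect the minority-class records of one block's training set."""
        for features, label in training_set:
            if label == minority_label:
                self.records.append((features, label))


def build_t2(training_set, container, s, rng):
    """Combine s/2 class-blind samples from Tr with s/2 samples from the container (step A22)."""
    half = s // 2
    from_tr = rng.sample(training_set, min(half, len(training_set)))
    from_container = rng.sample(list(container.records), min(half, len(container.records)))
    return from_tr + from_container
```

Because half of T2 is drawn only from minority-class records, the minority class is over-represented in T2 relative to its share in the stream, which is what distinguishes this model's training sample from T1.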
B: The n balanced data stream classification models and n unbalanced data stream classification models from step A are used to classify the data records in Va and to evaluate the reliability of the results, yielding the optimized reliability threshold δ.
E1 is the ensemble classifier formed by the n balanced data stream classification models from step A, and E2 is the ensemble classifier formed by the n unbalanced data stream classification models from step A. E1 and E2 each predict the class of a data record by majority voting among their member models.
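Majority voting among member models can be sketched in a few lines of Python. The sketch assumes each member model is a callable that returns a class label; the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(models, x):
    """Return the label predicted by the largest number of member models for record x."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

For a two-class problem, an odd number of member models rules out ties. The patent does not say how a tie would be resolved; in this sketch the label that was voted for first wins.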
In step B, the reliability of a classification result is computed as follows:
B1: For a binary classifier, r(x) is defined as the absolute value of the difference between the probabilities, as predicted by the classifier, that a data record x belongs to each of the two classes. Let P(x ∈ A) denote the probability that x belongs to class A and P(x ∈ B) the probability that x belongs to class B; then r(x) = |P(x ∈ A) - P(x ∈ B)|, where P(x ∈ A) + P(x ∈ B) = 1. The larger the value of r(x), the more reliable the classification result of the binary classifier; conversely, the smaller the value of r(x), the less reliable the result.
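As a worked example of the definition in B1, a record with P(x ∈ A) = 0.75 has r(x) = |0.75 - 0.25| = 0.5, while a record with P(x ∈ A) = 0.5 has r(x) = 0. The sketch below computes r(x) from P(x ∈ A). The patent does not state how an ensemble obtains these probabilities; using the share of member votes, as `vote_fraction` does, is an assumption made here for illustration, and both function names are hypothetical.

```python
def reliability(p_a):
    """r(x) = |P(x in A) - P(x in B)| for a binary classifier, with P(x in B) = 1 - P(x in A)."""
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must be a probability in [0, 1]")
    return abs(p_a - (1.0 - p_a))


def vote_fraction(models, x, label_a):
    """Assumed estimate of P(x in A): the share of member models that vote for class A."""
    return sum(model(x) == label_a for model in models) / len(models)
```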
In step B, the reliability threshold δ is computed as follows:
B2: A data-driven method determines the optimized reliability threshold. Let m1 denote the classification accuracy and m2 the geometric mean of the classification sensitivity and specificity. Initialize the variables d = 1.0 and t = 0. On the validation data set Va from step A, the following operation is repeated: starting from 0, the candidate threshold t is increased by 0.02 each time, and the distance between the point (m1, m2) corresponding to this candidate and the point (1, 1) is computed. Each iteration retains, as δ, the candidate whose point (m1, m2) is closest to the point (1, 1). The loop ends when the candidate reaches 1. The procedure is as follows:
Data-driven method (Algorithm 1):
Input: validation data set Va; the ensemble classifier E1 formed by the n balanced data stream classification models from step A; the ensemble classifier E2 formed by the n unbalanced data stream classification models from step A
Output: the optimal value of the parameter δ
begin
  t ← 0, d ← 1.0;
  for t = 0 : 0.02 : 1 {      // t ranges over [0, 1], increasing by 0.02 per iteration
    for each u in Va {        // u is a data record in Va
      first classify u with classifier E1;
      then compute r(u) from the classification result;
      if ( r(u) < t ) {
        reclassify u with classifier E2; }
    }
    compute (m1, m2) and its distance l to the point (1, 1);
    if ( l < d ) {
      δ = t;
      d = l; }
  }
end
After the loop ends, δ holds the candidate value t that gave the smallest distance; this δ is the optimized reliability threshold.
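The threshold search can be sketched in Python as follows. Several points are assumptions rather than statements of the patent: the distance to (1, 1) is taken to be Euclidean, the minority class is treated as the positive class when computing sensitivity and specificity, δ starts at 0 in case no candidate improves on d = 1.0, and `e1`, `e2`, and `r_of` are hypothetical callables standing for the ensemble E1, the ensemble E2, and the reliability r(u) of E1's result.

```python
import math


def g_mean(y_true, y_pred, positive):
    """Geometric mean of sensitivity and specificity, with `positive` as the minority label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def find_threshold(validation, e1, e2, r_of, positive, steps=50):
    """Grid-search delta over 0, 1/steps, ..., 1 (steps=50 gives the 0.02 increment)."""
    delta, d = 0.0, 1.0  # d = 1.0 as in the source; delta = 0 is an assumed default
    y_true = [label for _, label in validation]
    for k in range(steps + 1):
        t = k / steps  # integer stepping avoids accumulating floating-point error
        # Keep E1's result when it is reliable enough; otherwise reclassify with E2.
        y_pred = [e1(x) if r_of(x) >= t else e2(x) for x, _ in validation]
        m1 = sum(p == y for p, y in zip(y_pred, y_true)) / len(y_true)
        m2 = g_mean(y_true, y_pred, positive)
        dist = math.hypot(1.0 - m1, 1.0 - m2)  # the distance l, assumed Euclidean
        if dist < d:
            delta, d = t, dist
    return delta
```

Pairing accuracy (m1) with the geometric mean of sensitivity and specificity (m2) means a candidate threshold is rewarded only when both classes are classified well: a classifier that always predicts the majority class can have high m1 but has m2 = 0.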
C: Every data record in the test data set Test is classified.
In step C, classifying any data record u in the test data set Test comprises the following steps:
C1: first, the classifier E1 from step B classifies u;
C2: r(u) is computed using the reliability calculation from step B1;
C3: if r(u) ≥ δ, the classification result is output; if r(u) < δ, the classifier E2 from step B classifies u again, and the classification result of E2 is output.
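Steps C1 to C3 amount to a two-stage decision, sketched below. The callables `e1`, `e2`, and `r_of` are hypothetical stand-ins for the ensemble E1, the ensemble E2, and the reliability of E1's result; returning the stage number alongside the label is an addition for illustration only.

```python
def classify_two_stage(u, e1, e2, r_of, delta):
    """Return (label, stage): stage 1 if E1's result is kept, stage 2 if E2 reclassifies u."""
    label = e1(u)            # C1: classify with the balanced ensemble
    if r_of(u) >= delta:     # C2 and C3: the result is reliable enough, so output it
        return label, 1
    return e2(u), 2          # C3: otherwise output the unbalanced ensemble's result
```

Only records whose first-stage result falls below the threshold reach E2, so records that E1 classifies reliably never pass through the unbalanced models.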
The present invention divides the classification of an unbalanced data stream into three stages: balanced data stream classification model prediction, classification reliability evaluation, and unbalanced data stream classification model prediction. In the balanced data stream classification model prediction stage, the ensemble classifier E1 from step B, formed by the n balanced classifiers, predicts the class of each data record. In the classification reliability evaluation stage, the reliability of E1's classification result is evaluated; the results for records with high reliability are returned directly to the user and do not pass through the unbalanced data stream classification model prediction stage. Records with low reliability are classified again by the ensemble classifier E2 from step B, formed by the n unbalanced classifiers, and the classification result of E2 is output.
The overall flow of the present invention is as follows:
Overall algorithm:
Input: the training set Train of the data stream; the test set Test of the data stream
Output: the classification results for the Test data set
begin
  divide Train into n data stream record blocks D1, D2, ..., Dn, each of size b;
  for i = 1 : 1 : n {         // i ranges over [1, n], increasing by 1 per iteration
    divide data stream record block Di into a training set Tr and a validation set Va;
    train one balanced classifier and one unbalanced classifier on Tr using the method of step A;
    retain the n balanced classifiers and n unbalanced classifiers with the best classification performance on Va;
    solve for the optimal threshold δ on Va using Algorithm 1; }
  for each u in Test {        // u is a data record in Test
    first classify u with the ensemble classifier E1 formed by the n balanced classifiers;
    then compute r(u) from the classification result of E1;
    if ( r(u) < δ ) {
      reclassify u with the ensemble classifier E2 formed by the n unbalanced classifiers; }
  }
end
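The first two steps of the overall algorithm, cutting the stream into equal-sized blocks and dividing each block 90% to 10%, can be sketched as follows. The patent does not say whether the division within a block is random or positional, nor what happens to a trailing partial block; this sketch splits by position and drops the partial block, both of which are assumptions, and the function names are hypothetical.

```python
def split_into_blocks(records, b):
    """Cut the stream into consecutive blocks of exactly b records each."""
    # A trailing partial block is dropped so that all blocks have the same size.
    return [records[i:i + b] for i in range(0, len(records) - b + 1, b)]


def split_block(block, train_fraction=0.9):
    """Divide one block into (Tr, Va) by position, 90% to 10% by default."""
    cut = round(len(block) * train_fraction)
    return block[:cut], block[cut:]
```

For example, a stream of 25 records with b = 10 yields two blocks, and each block is divided into 9 training records and 1 validation record.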

Claims (5)

1. A double-degree integrated unbalanced data stream classification algorithm, characterized in that it comprises the following steps:
A: Training stage for the balanced and unbalanced data stream classification models: each newest data stream record block in the training data set is divided into a training set and a validation set; one balanced classification model and one unbalanced classification model are trained on the training set; the n balanced classification models and the n unbalanced classification models with the highest classification accuracy on the validation set are retained;
B: The n balanced data stream classification models and n unbalanced data stream classification models from step A are used to classify the data records in the validation set and to evaluate the reliability of the results, finally yielding the optimized reliability threshold δ;
C: The n balanced data stream classification models and n unbalanced data stream classification models from step A are used to classify each data record in the test data set, and the final classification results are output.
2. The double-degree integrated unbalanced data stream classification algorithm according to claim 1, characterized in that: in step B, a data-driven method determines the optimized reliability threshold δ on the validation set, as follows:
Let m1 denote the classification accuracy and m2 the geometric mean of the classification sensitivity and specificity. Initialize the variables d = 1.0 and t = 0. On the validation data set, repeat the following operation: starting from 0, increase the value of δ by 0.02 each time, and compute the distance l between the point (m1, m2) corresponding to this δ value and the point (1, 1); if l is less than d, set d = l and t = δ. The loop ends when δ = 1. After the loop ends, the current value of t is assigned to δ; this δ is the optimized reliability threshold.
3. The double-degree integrated unbalanced data stream classification algorithm according to claim 1, characterized in that: in step C, classifying each data record u in the test data set comprises the following steps:
C1: first, the ensemble of the n retained balanced data stream classification models classifies u;
C2: the reliability r(u) of the classification result for u is computed; a classification result whose reliability r(u) is greater than the optimized threshold δ is returned directly to the user;
C3: if the reliability r(u) of the classification of u is lower than the optimized threshold δ, the ensemble of the n unbalanced classification models classifies u again, and the final classification result is returned.
4. The double-degree integrated unbalanced data stream classification algorithm according to claims 1-3, characterized in that: in step A, training a balanced data stream classification model comprises the following steps:
A11: simple random sampling is performed on the training set with sample size s, without regard to the class of the records; the resulting sample is denoted T1;
A12: a classification algorithm is used to train a classification model on T1; this model is called a balanced data stream classification model;
A13: the existing balanced data stream classification models are tested: if their total number exceeds n, each existing balanced model is tested on the validation set and the one with the worst classification accuracy is eliminated, until the number of remaining balanced data stream classification models equals n.
5. The double-degree integrated unbalanced data stream classification algorithm according to claim 4, characterized in that: in step A, training an unbalanced data stream classification model comprises the following steps:
A21: the minority-class data records in the training set of each data stream record block are collected and placed in a minority-class record container; if the total number of data records in the container exceeds the prescribed number s, the oldest data records are discarded until the number of remaining records equals s;
A22: for sampling, simple random sampling is first performed on the training set Tr with sample size s/2, without regard to the class of the records; simple random sampling is then performed on the data records in the minority-class record container, also with sample size s/2; the two samples are combined to form the newest sample, denoted T2;
A23: a classification algorithm is used to train a classification model on T2; this model is called an unbalanced data stream classification model;
A24: the existing unbalanced data stream classification models are tested: if their total number exceeds n, each existing unbalanced model is tested on the validation set Va and the one with the worst classification accuracy is eliminated, until the number of remaining unbalanced data stream classification models equals n.
CN201310624425.5A 2013-11-29 2013-11-29 Double-degree integrated unbalanced data stream classification algorithm Active CN103593470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310624425.5A CN103593470B (en) 2013-11-29 2013-11-29 Double-degree integrated unbalanced data stream classification algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310624425.5A CN103593470B (en) 2013-11-29 2013-11-29 Double-degree integrated unbalanced data stream classification algorithm

Publications (2)

Publication Number Publication Date
CN103593470A true CN103593470A (en) 2014-02-19
CN103593470B CN103593470B (en) 2016-05-18

Family

ID=50083611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310624425.5A Active CN103593470B (en) 2013-11-29 2013-11-29 Double-degree integrated unbalanced data stream classification algorithm

Country Status (1)

Country Link
CN (1) CN103593470B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101587493A (en) * 2009-06-29 2009-11-25 中国科学技术大学 Text classification method
CN101763466A (en) * 2010-01-20 2010-06-30 西安电子科技大学 Biological information recognition method based on dynamic sample selection integration
CN102945280A (en) * 2012-11-15 2013-02-27 翟云 Unbalanced data distribution-based multi-heterogeneous base classifier fusion classification method
CN103309953A (en) * 2013-05-24 2013-09-18 合肥工业大学 Method for labeling and searching for diversified pictures based on integration of multiple RBFNN classifiers

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
欧阳震诤,罗建书,胡东敏,吴泉源: "一种不平衡数据流集成分类模型", 《电子学报》 *
王和勇,樊泓坤,姚正安,李成安: "不平衡数据集的分类方法研究", 《计算机应用研究》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462301B (en) * 2014-11-28 2018-05-04 北京奇虎科技有限公司 A kind for the treatment of method and apparatus of network data
CN104462301A (en) * 2014-11-28 2015-03-25 北京奇虎科技有限公司 Network data processing method and device
CN106294490B (en) * 2015-06-08 2019-12-24 富士通株式会社 Feature enhancement method and device for data sample and classifier training method and device
CN106294490A (en) * 2015-06-08 2017-01-04 富士通株式会社 The feature Enhancement Method of data sample and device and classifier training method and apparatus
CN108141377A (en) * 2015-10-12 2018-06-08 华为技术有限公司 Network flow early stage classifies
CN108141377B (en) * 2015-10-12 2020-08-07 华为技术有限公司 Early classification of network flows
CN107423156A (en) * 2017-07-29 2017-12-01 合肥千奴信息科技有限公司 Fault pre-alarming algorithm based on taxonomic clustering
WO2020220220A1 (en) * 2019-04-29 2020-11-05 西门子(中国)有限公司 Classification model training method and device, and computer-readable medium
CN110245232A (en) * 2019-06-03 2019-09-17 网易传媒科技(北京)有限公司 Text classification method, device, medium and computing equipment
CN110245232B (en) * 2019-06-03 2022-02-18 网易传媒科技(北京)有限公司 Text classification method, device, medium and computing equipment
CN111915559A (en) * 2020-06-30 2020-11-10 电子科技大学 Airborne SAR image quality evaluation method based on SVM classification credibility
CN111915559B (en) * 2020-06-30 2022-09-20 电子科技大学 Airborne SAR image quality evaluation method based on SVM classification credibility
CN112017634A (en) * 2020-08-06 2020-12-01 Oppo(重庆)智能科技有限公司 Data processing method, device, equipment and storage medium
CN112989207A (en) * 2021-04-27 2021-06-18 武汉卓尔数字传媒科技有限公司 Information recommendation method and device, electronic equipment and storage medium
CN112989207B (en) * 2021-04-27 2021-08-27 武汉卓尔数字传媒科技有限公司 Information recommendation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN103593470B (en) 2016-05-18

Similar Documents

Publication Publication Date Title
CN103593470B (en) Double-degree integrated unbalanced data stream classification algorithm
CN107633265B (en) Data processing method and device for optimizing credit evaluation model
CN110852856B (en) Invoice false invoice identification method based on dynamic network representation
CN109034194B (en) Transaction fraud behavior deep detection method based on feature differentiation
WO2019218699A1 (en) Fraud transaction determining method and apparatus, computer device, and storage medium
Ponti et al. A decision cognizant Kullback–Leibler divergence
WO2017143919A1 (en) Method and apparatus for establishing data identification model
WO2020220758A1 (en) Method for detecting abnormal transaction node, and device
CN107679734A (en) Method and system for classification prediction of unlabeled data
CN104636449A (en) Distributed big data system risk identification method based on LSA-GCC
WO2020250730A1 (en) Fraud detection device, fraud detection method, and fraud detection program
Ghazal et al. Data Mining and Exploration: A Comparison Study among Data Mining Techniques on Iris Data Set
CN112465622A (en) Method, system, medium and computer equipment for checking enterprise comprehensive credit information
CN105426441B (en) Automatic preprocessing method for time series
CN106228190A (en) Decision tree discrimination method for abnormal residential water use
WO2022143431A1 (en) Method and apparatus for training anti-money laundering model
Velden et al. Resolving author name homonymy to improve resolution of structures in co-author networks
CN107766500A (en) Auditing method for fixed asset cards
CN114036531A (en) Multi-scale code measurement-based software security vulnerability detection method
Gao et al. Time Series Data Cleaning under Multi-Speed Constraints.
WO2020259391A1 (en) Database script performance testing method and device
CN106991171A (en) Topic discovery method based on intelligent campus information service platform
Rajeswari et al. A comparative evaluation of supervised and unsupervised methods for detecting outliers
CN106778252A (en) Intrusion detection method based on rough set theory Yu WAODE algorithms
CN110502669A (en) Lightweight unsupervised graph representation learning method and device based on N-edge DFS subgraphs

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 475001 No. 85 Minglun Street, Kaifeng City, Henan Province

Patentee after: Henan University

Address before: 475004 Jinming Avenue, Kaifeng City, Henan Province

Patentee before: Henan University