CN103593470A - Double-degree integrated unbalanced data stream classification algorithm - Google Patents
Double-degree integrated unbalanced data stream classification algorithm
- Publication number
- CN103593470A
- Authority
- CN
- China
- Prior art keywords
- data
- classification model
- lack
- classification
- balance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a double-degree integrated unbalanced data stream classification algorithm. The double-degree integrated unbalanced data stream classification algorithm includes a balanced data stream classification model prediction stage, a classification reliability evaluation stage and an unbalanced data stream classification model prediction stage. In the balanced data stream classification model prediction stage, firstly, a balanced data stream classification model predicts the classification of each data record. In the classification reliability evaluation stage, reliability evaluation is conducted on the classification results obtained in the balanced data stream classification model prediction stage, the classification results of the records with high reliability are directly sent back to a user, and the data records with low reliability need to be classified again in the unbalanced data stream classification model prediction stage. The method embodied in the double-degree integrated unbalanced data stream classification algorithm can be widely applied to applications such as computer-assisted clinical diagnosis and real-time intrusion detection, and the invention belongs to the field of artificial intelligence applications.
Description
Technical field
The present invention relates to a data stream classification algorithm, and in particular to a double-degree integrated unbalanced data stream classification algorithm.
Background
In recent years, data mining has been applied in more and more industries, including computer-aided clinical diagnosis, internet-based recommendation and advertising systems, customer segmentation, financial data analysis, and abnormal transaction monitoring. Such industry-oriented intelligent analysis and decision systems have been widely accepted.
In many practical applications the distribution of the data is unbalanced, also called skewed. For example, if 90% of the data records belong to class A, A is called the majority class; if only 10% of the data records belong to class B, B is called the minority class. In financial data analysis, for instance, most transactions are normal and only a very small number are abnormal. When a classification technique is used to discover the rules of abnormal transactions, learning those rules from a small number of abnormal transaction records and building an abnormal-transaction classification model is an extremely challenging task: the model must identify abnormal transactions fairly accurately without misjudging normal transactions as abnormal. In other words, the model should classify both abnormal and normal transactions fairly accurately.
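To make the difficulty concrete, the following minimal sketch uses the 90%/10% split from the example above and a degenerate classifier that always answers "A". The helper name `accuracy_and_gmean` is hypothetical and not part of the patent; the second value it returns is the geometric mean of sensitivity and specificity, the quantity the description later calls m2.

```python
from math import sqrt

def accuracy_and_gmean(y_true, y_pred, minority="B"):
    """Return (accuracy, geometric mean of sensitivity and specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    tn = sum(1 for t, p in pairs if t != minority and p != minority)
    pos = sum(1 for t in y_true if t == minority)
    neg = len(y_true) - pos
    sensitivity = tp / pos if pos else 0.0
    specificity = tn / neg if neg else 0.0
    return (tp + tn) / len(y_true), sqrt(sensitivity * specificity)

# 90 majority-class records (A) and 10 minority-class records (B)
y_true = ["A"] * 90 + ["B"] * 10
always_majority = ["A"] * 100  # a classifier that never predicts the minority class

print(accuracy_and_gmean(y_true, always_majority))  # (0.9, 0.0)
```

The accuracy looks good even though no minority-class record is found, which is why accuracy alone is not a sufficient measure on unbalanced data.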
Many practical data mining applications must process not only static data but also large volumes of streaming data, that is, data streams. Examples include social media mining, website click-stream analysis, stock trading analysis, event detection, and sensor data processing. Data streams with unbalanced (skewed) distributions are common in these applications. Although existing classification algorithms can improve the classification accuracy of the minority class in an unbalanced data stream, they reduce the classification accuracy of the majority class. A better classification algorithm for unbalanced data streams is therefore needed, one that can predict the minority-class data records in an unbalanced data stream fairly accurately while maintaining the classification accuracy on majority-class data records.
Summary of the invention
The object of this invention is to provide a double-degree integrated unbalanced data stream classification algorithm that can predict the minority class in an unbalanced data stream fairly accurately while maintaining the classification accuracy on majority-class data records.
The present invention adopts the following technical solution:
A double-degree integrated unbalanced data stream classification algorithm, comprising the following steps:
A: training stage of the balanced and unbalanced data stream classification models: each newest data stream record block in the training data set is divided into a training set and a validation set; one balanced classification model and one unbalanced classification model are trained on the training set; the n balanced classification models and the n unbalanced classification models with the highest classification accuracy on the validation set are retained;
B: the n balanced data stream classification models and the n unbalanced data stream classification models from step A are used to classify the data records in the validation set and to evaluate the confidence of the results, finally yielding the optimized confidence threshold δ;
C: the n balanced data stream classification models and the n unbalanced data stream classification models from step A are used to classify each data record in the test data set, and the final classification result is output.
In step B, a data-driven method is used to determine the optimized confidence threshold δ on the validation set, as follows:
Let m1 denote the classification accuracy and m2 the geometric mean of the sensitivity and specificity of the classification. Initialize the variables d=1.0 and t=0. On the validation data set, perform the following loop: starting from 0, increase the value of δ by 0.02 each time, and compute the distance l between the point (m1, m2) corresponding to this δ value and the point (1, 1); if l is smaller than d, set d=l and t=δ. The loop ends when δ=1. After the loop ends, the current value of t is assigned to δ; this δ is the optimized confidence threshold.
In step C, classifying each data record u in the test data set comprises the following steps:
C1: first, the ensemble of the n retained balanced data stream classification models classifies u;
C2: the confidence r(u) of the classification result of u is computed; a classification result whose confidence r(u) is greater than the optimized confidence threshold δ is returned directly to the user;
C3: if the confidence r(u) of the classification of u is lower than the optimized confidence threshold δ, the ensemble of the n unbalanced classification models classifies u again, and the final classification result is returned.
In step A, training a balanced data stream classification model comprises the following steps:
A11: simple random sampling is performed on the training set with sample size s, without distinguishing the classes of the sampled records; the resulting sample is denoted T1;
A12: a classification algorithm is used to train a classification model on T1; this model is called 1 balanced data stream classification model;
A13: the existing balanced data stream classification models are tested: if the total number of balanced data stream classification models exceeds n, the existing models are tested one by one on the validation set and the model with the worst classification accuracy is eliminated, until the number of remaining balanced data stream classification models equals n;
In step A, training 1 unbalanced data stream classification model comprises the following steps:
A21: the minority-class data records in the training set of each data stream record block are collected and put into a minority-class record container; if the total number of data records in the container exceeds the specified number s, the oldest data records in the container are eliminated until the number of remaining data records equals s;
A22: when sampling, simple random sampling is first performed on Tr with sample size s/2, without distinguishing the classes of the sampled records; simple random sampling is then performed on the data records in the minority-class record container, also with sample size s/2; the two samples are combined to form the newest sample, denoted T2;
A23: a classification algorithm is used to train a classification model on T2; this model is called 1 unbalanced data stream classification model;
A24: the existing unbalanced data stream classification models are tested: if the total number of unbalanced data stream classification models exceeds n, the existing models are tested one by one on Va and the model with the worst classification accuracy is eliminated, until the number of remaining unbalanced data stream classification models equals n.
Through three stages, namely balanced data stream classification model prediction, classification confidence evaluation, and unbalanced data stream classification model prediction, the present invention classifies previously unclassified new data in an unbalanced data stream in real time and fairly accurately. The invention can predict minority-class records fairly accurately while greatly reducing the probability that the classifier misjudges majority-class records as minority-class records. The method is therefore of great significance for classifying data streams with unbalanced distributions. It addresses the problem, in data stream applications, of how to discover classification rules from an unbalanced real-time data stream and classify previously unclassified new data in real time and fairly accurately. The method belongs to the field of artificial intelligence applications and can be widely used in applications such as computer-aided clinical diagnosis and real-time intrusion detection.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the present invention.
Detailed description
As shown in Fig. 1, the double-degree integrated unbalanced data stream classification algorithm uses a pane data stream model: the data stream is cut successively into data stream record blocks of equal size, so that every block contains the same number of data records. The main parameters used in this patent are: b, the number of data records in a data stream record block; s, the sample size used for sampling, with s < b, which is also the capacity of the minority-class record container; n, the number of data stream record blocks that the pane data stream model can hold. The algorithm comprises the following steps:
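The block-cutting step can be sketched as follows. The function name `split_into_blocks` is hypothetical, and dropping an incomplete trailing block is an assumption made here so that every block holds exactly b records; the patent does not say how a partial block is handled.

```python
def split_into_blocks(records, b):
    """Cut `records` into consecutive blocks of exactly b records each.

    Assumption: records that do not fill a complete final block are left out.
    """
    return [records[i:i + b] for i in range(0, len(records) - b + 1, b)]

stream = list(range(10))
print(split_into_blocks(stream, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```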
A: training stage of the balanced and unbalanced data stream classification models: each newest data stream record block is divided in a 90%/10% ratio into a training data set Tr and a validation data set Va, and 1 balanced data stream classification model and 1 unbalanced data stream classification model are trained on Tr;
In step A, training 1 balanced data stream classification model comprises the following steps:
A11: simple random sampling is performed on Tr with sample size s, without distinguishing the classes of the sampled records; the resulting sample is denoted T1;
A12: a classification algorithm is used to train a classification model on T1; this model is called 1 balanced data stream classification model;
A13: the existing balanced data stream classification models are tested: if the total number of balanced data stream classification models exceeds n, the existing models are tested one by one on Va and the model with the worst classification accuracy is eliminated, until the number of remaining balanced data stream classification models equals n;
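Steps A11 to A13 can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation: a model is any callable mapping a record to a label, `fit` is a hypothetical training function supplied by the caller (the patent only says "a classification algorithm"), and data sets are lists of (record, label) pairs.

```python
import random

def accuracy(model, validation):
    """Fraction of (record, label) pairs in `validation` that `model` labels correctly."""
    return sum(1 for x, y in validation if model(x) == y) / len(validation)

def train_balanced_member(tr, s, fit, rng=random):
    """Steps A11-A12: draw a simple random sample T1 of size s from Tr, ignoring labels, then fit."""
    t1 = rng.sample(tr, min(s, len(tr)))
    return fit(t1)

def update_pool(pool, new_model, validation, n):
    """Step A13: add the new model, then drop the least accurate model on Va until n remain."""
    pool = pool + [new_model]
    while len(pool) > n:
        worst = min(pool, key=lambda m: accuracy(m, validation))
        pool.remove(worst)
    return pool

# Tiny demonstration with two constant classifiers (hypothetical stand-ins for trained models)
va = [((0,), "A"), ((1,), "A"), ((2,), "B")]
always_a = lambda x: "A"
always_b = lambda x: "B"
pool = update_pool([always_a], always_b, va, n=1)
print(pool == [always_a])  # True: always_b has the lower accuracy on Va and is eliminated
```

The same pruning routine applies unchanged to the pool of unbalanced models in step A24.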
In step A, training 1 unbalanced data stream classification model comprises the following steps:
A21: the minority-class data records in the training set of each data stream record block are collected and put into a minority-class record container. If the total number of data records in the container exceeds the specified number s, the oldest data records in the container are eliminated until the number of remaining data records equals s;
A22: when sampling, simple random sampling is first performed on Tr with sample size s/2, without distinguishing the classes of the sampled records; simple random sampling is then performed on the data records in the minority-class record container, also with sample size s/2; the two samples are combined to form the newest sample, denoted T2;
A23: a classification algorithm is used to train a classification model on T2. This model is called 1 unbalanced data stream classification model;
A24: the existing unbalanced data stream classification models are tested: if the total number of unbalanced data stream classification models exceeds n, the existing models are tested one by one on Va and the model with the worst classification accuracy is eliminated, until the number of remaining unbalanced data stream classification models equals n.
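A minimal sketch of the minority-class record container (A21) and the sampling of T2 (A22) follows. The function names are hypothetical. A `collections.deque` with `maxlen=s` is used as one convenient way to discard the oldest records automatically, and rounding s/2 down for odd s is an assumption; records are (features, label) pairs.

```python
import random
from collections import deque

def update_minority_container(container, tr, minority_label):
    """Step A21: append the minority-class records of Tr; a deque with maxlen=s
    silently drops the oldest records once it holds s of them."""
    container.extend(rec for rec in tr if rec[1] == minority_label)
    return container

def sample_t2(tr, container, s, rng=random):
    """Step A22: s/2 records drawn from Tr regardless of class, plus s/2 drawn
    from the minority-class container, combined into T2."""
    half = s // 2  # assumption: round down when s is odd
    from_tr = rng.sample(tr, min(half, len(tr)))
    from_minority = rng.sample(list(container), min(half, len(container)))
    return from_tr + from_minority

container = deque(maxlen=3)  # s = 3 for the container in this toy example
tr = [((i,), "B" if i % 2 else "A") for i in range(10)]
update_minority_container(container, tr, "B")
print([rec[0][0] for rec in container])  # [5, 7, 9]: the three newest minority records

t2 = sample_t2(tr, container, 4, random.Random(0))
print(len(t2))  # 4
```

Because half of T2 always comes from the container, the minority class is over-represented in T2 relative to the raw stream, which is what distinguishes the unbalanced models from the balanced ones trained on T1.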
B: the n balanced data stream classification models and the n unbalanced data stream classification models from step A are used to classify the data records in Va and to evaluate the confidence of the results, yielding the optimized confidence threshold δ.
E1 is the classifier formed by the ensemble of the n balanced data stream classification models from step A, and E2 is the classifier formed by the ensemble of the n unbalanced data stream classification models from step A. E1 and E2 each predict the class of a data record by majority voting among their members.
In step B, the confidence of a classification result is computed as follows:
B1: for a binary classifier, r(x) is defined as the absolute value of the difference between the probabilities, as predicted by the classifier, that a data record x belongs to each of the two classes. With P(x ∈ A) denoting the probability that x belongs to class A and P(x ∈ B) the probability that x belongs to class B, r(x) = |P(x ∈ A) - P(x ∈ B)|, where P(x ∈ A) + P(x ∈ B) = 1. The larger the value of r(x), the higher the confidence of the binary classifier's result; conversely, the smaller the value of r(x), the lower the confidence of the result.
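The voting and confidence computation can be sketched as follows. The patent does not state how the ensemble obtains P(x ∈ A); this sketch assumes it is the fraction of members voting for class A, which is only one possible choice. The function name `vote_with_confidence` and the tie-breaking rule (ties go to the first label) are likewise assumptions.

```python
def vote_with_confidence(models, x, labels=("A", "B")):
    """Majority vote over `models` plus r(x) = |P(x in A) - P(x in B)|.

    Assumption: P(x in A) is estimated as the share of members voting labels[0];
    any vote other than labels[0] counts toward labels[1].
    """
    votes = [m(x) for m in models]
    p_a = votes.count(labels[0]) / len(votes)
    p_b = 1.0 - p_a
    label = labels[0] if p_a >= p_b else labels[1]
    return label, abs(p_a - p_b)

# Three members vote A and one votes B: P(A) = 0.75, P(B) = 0.25
members = [lambda x: "A", lambda x: "A", lambda x: "A", lambda x: "B"]
print(vote_with_confidence(members, None))  # ('A', 0.5)
```

With this estimate, r(x) = 1 when the members are unanimous and r(x) = 0 when the vote is evenly split.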
In step B, the confidence threshold δ is computed as follows:
B2: a data-driven method is used to determine the optimized confidence threshold. Let m1 denote the classification accuracy and m2 the geometric mean of the sensitivity and specificity of the classification. Initialize the variables d=1.0 and t=0. On the validation data set Va from step A, perform the following loop: starting from 0, increase the candidate value of δ by 0.02 each time and compute the distance between the point (m1, m2) corresponding to this value and the point (1, 1); in each iteration, retain the δ value whose point (m1, m2) has the smallest distance to the point (1, 1). The loop ends when δ=1. The specific procedure is as follows:
Data-driven method:
Input: validation data set Va; classifier E1, the ensemble of the n balanced data stream classification models from step A; classifier E2, the ensemble of the n unbalanced data stream classification models from step A
Output: the optimized value of parameter δ
begin
  d = 1.0; δ = 0;
  for t = 0:0.02:1 {            // t ranges over [0, 1] in steps of 0.02
    for each u in Va {          // u is a data record in Va
      classify u with classifier E1;
      compute r(u) from the classification result;
      if (r(u) < t) {
        reclassify u with classifier E2; } }
    compute (m1, m2) over Va and its distance l to the point (1, 1);
    if (l < d) {
      δ = t;
      d = l; } }
end
When the loop ends, δ holds the optimized confidence threshold.
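The threshold search can be sketched in runnable form as follows. Several details are assumptions: the "distance" is taken to be Euclidean, `e1` is a hypothetical callable returning a (label, r) pair, `e2` is a hypothetical callable returning a label, and Va is a list of (record, label) pairs. The initial value d = 1.0 follows the description, so a candidate threshold is accepted only if its point (m1, m2) lies within distance 1 of (1, 1); otherwise the function returns 0.

```python
from math import hypot, sqrt

def metrics(y_true, y_pred, minority):
    """Return (m1, m2): accuracy and the geometric mean of sensitivity and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    tn = sum(1 for t, p in pairs if t != minority and p != minority)
    pos = sum(1 for t in y_true if t == minority)
    neg = len(y_true) - pos
    sens = tp / pos if pos else 0.0
    spec = tn / neg if neg else 0.0
    return (tp + tn) / len(y_true), sqrt(sens * spec)

def find_threshold(va, e1, e2, minority, step=0.02):
    """Grid-search the confidence threshold on validation data `va`."""
    y_true = [y for _, y in va]
    best_delta, d = 0.0, 1.0             # d = 1.0 as in the description
    for k in range(round(1 / step) + 1):
        t = k * step                     # integer counter avoids accumulating float error
        y_pred = []
        for u, _ in va:
            label, r = e1(u)
            y_pred.append(e2(u) if r < t else label)
        m1, m2 = metrics(y_true, y_pred, minority)
        dist = hypot(1 - m1, 1 - m2)     # assumption: Euclidean distance to (1, 1)
        if dist < d:
            best_delta, d = t, dist
    return best_delta

# Toy data: records 8 and 9 are minority class B; E1 is unsure about them and wrong
va = [(x, "B" if x >= 8 else "A") for x in range(10)]
e1 = lambda x: ("A", 0.9) if x < 8 else ("A", 0.25)
e2 = lambda x: "B" if x >= 6 else "A"
delta = find_threshold(va, e1, e2, minority="B")
print(round(delta, 2))  # 0.26
```

In the toy data the first threshold that sends the two low-confidence records to E2 is 0.26, and at that point every record is classified correctly, so no later candidate can improve on it.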
C: classify each data record in the test data set Test.
In step C, classifying any data record u in the test data set Test comprises the following steps:
C1: first classify u with the classifier E1 from step B;
C2: compute r(u) using the confidence computation method of step B1;
C3: if r(u) ≥ δ, output the classification result of E1; if r(u) < δ, reclassify u with the classifier E2 from step B and output the classification result of E2.
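Steps C1 to C3 amount to a two-stage cascade, sketched below. As before, `e1` (returning a label and its confidence r) and `e2` (returning a label) are hypothetical callables standing in for the two ensembles; returning the stage number alongside the label is an addition for illustration only.

```python
def classify(u, e1, e2, delta):
    """Return (label, stage): stage 1 if E1's result was kept, stage 2 if E2 reclassified u."""
    label, r = e1(u)          # C1 and C2: classify with E1 and obtain r(u)
    if r >= delta:            # C3: confident enough, keep E1's result
        return label, 1
    return e2(u), 2           # C3: otherwise defer to E2

# Hypothetical ensembles: E1 is confident below 8 and unsure otherwise
e1 = lambda x: ("A", 0.9) if x < 8 else ("A", 0.25)
e2 = lambda x: "B" if x >= 6 else "A"

print(classify(3, e1, e2, delta=0.26))  # ('A', 1)
print(classify(9, e1, e2, delta=0.26))  # ('B', 2)
```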
The present invention divides the classification of an unbalanced data stream into three stages: balanced data stream classification model prediction, classification confidence evaluation, and unbalanced data stream classification model prediction. In the balanced data stream classification model prediction stage, the classifier E1 from step B, the ensemble of the n balanced classifiers, predicts the class of each data record. The classification confidence evaluation stage evaluates the confidence of the classification results of E1: the results for records with high confidence are returned directly to the user and do not pass through the unbalanced data stream classification model prediction stage, whereas records with low confidence are reclassified by the classifier E2 from step B, the ensemble of the n unbalanced classifiers, and the classification result of E2 is output.
The overall flow of the present invention is as follows:
Overall algorithm:
Input: the training set Train of the data stream, the test set Test of the data stream
Output: the classification results for the Test data set
begin
  divide Train into n data stream record blocks D1, D2, ..., Dn, each of size b;
  for i = 1:1:n {               // i ranges over [1, n] in steps of 1
    divide data stream record block Di into training set Tr and validation set Va;
    train 1 balanced classifier and 1 unbalanced classifier on Tr using the method of step A;
    retain the n balanced classifiers and the n unbalanced classifiers that classify best on Va;
    solve for the optimal threshold δ on Va using algorithm 1; }
  for each u in Test {          // u is a data record in Test
    classify u with the classifier E1, the ensemble of the n balanced classifiers;
    compute r(u) from the classification result of E1;
    if (r(u) < δ) {
      reclassify u with the classifier E2, the ensemble of the n unbalanced classifiers; }
    output the classification result of u; }
end
Claims (5)
1. A double-degree integrated unbalanced data stream classification algorithm, characterized in that it comprises the following steps:
A: training stage of the balanced and unbalanced data stream classification models: each newest data stream record block in the training data set is divided into a training set and a validation set; one balanced classification model and one unbalanced classification model are trained on the training set; the n balanced classification models and the n unbalanced classification models with the highest classification accuracy on the validation set are retained;
B: the n balanced data stream classification models and the n unbalanced data stream classification models from step A are used to classify the data records in the validation set and to evaluate the confidence of the results, finally yielding the optimized confidence threshold δ;
C: the n balanced data stream classification models and the n unbalanced data stream classification models from step A are used to classify each data record in the test data set, and the final classification result is output.
2. The double-degree integrated unbalanced data stream classification algorithm according to claim 1, characterized in that: in step B, a data-driven method is used to determine the optimized confidence threshold δ on the validation set, as follows:
Let m1 denote the classification accuracy and m2 the geometric mean of the sensitivity and specificity of the classification. Initialize the variables d=1.0 and t=0. On the validation data set, perform the following loop: starting from 0, increase the value of δ by 0.02 each time, and compute the distance l between the point (m1, m2) corresponding to this δ value and the point (1, 1); if l is smaller than d, set d=l and t=δ. The loop ends when δ=1. After the loop ends, the current value of t is assigned to δ; this δ is the optimized confidence threshold.
3. The double-degree integrated unbalanced data stream classification algorithm according to claim 1, characterized in that: in step C, classifying each data record u in the test data set comprises the following steps:
C1: first, the ensemble of the n retained balanced data stream classification models classifies u;
C2: the confidence r(u) of the classification result of u is computed; a classification result whose confidence r(u) is greater than the optimized confidence threshold δ is returned directly to the user;
C3: if the confidence r(u) of the classification of u is lower than the optimized confidence threshold δ, the ensemble of the n unbalanced classification models classifies u again, and the final classification result is returned.
4. The double-degree integrated unbalanced data stream classification algorithm according to any one of claims 1-3, characterized in that: in step A, training a balanced data stream classification model comprises the following steps:
A11: simple random sampling is performed on the training set with sample size s, without distinguishing the classes of the sampled records; the resulting sample is denoted T1;
A12: a classification algorithm is used to train a classification model on T1; this model is called 1 balanced data stream classification model;
A13: the existing balanced data stream classification models are tested: if the total number of balanced data stream classification models exceeds n, the existing models are tested one by one on the validation set and the model with the worst classification accuracy is eliminated, until the number of remaining balanced data stream classification models equals n.
5. The double-degree integrated unbalanced data stream classification algorithm according to claim 4, characterized in that: in step A, training 1 unbalanced data stream classification model comprises the following steps:
A21: the minority-class data records in the training set of each data stream record block are collected and put into a minority-class record container; if the total number of data records in the container exceeds the specified number s, the oldest data records in the container are eliminated until the number of remaining data records equals s;
A22: when sampling, simple random sampling is first performed on the training set Tr with sample size s/2, without distinguishing the classes of the sampled records; simple random sampling is then performed on the data records in the minority-class record container, also with sample size s/2; the two samples are combined to form the newest sample, denoted T2;
A23: a classification algorithm is used to train a classification model on T2; this model is called 1 unbalanced data stream classification model;
A24: the existing unbalanced data stream classification models are tested: if the total number of unbalanced data stream classification models exceeds n, the existing models are tested one by one on the validation set Va and the model with the worst classification accuracy is eliminated, until the number of remaining unbalanced data stream classification models equals n.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310624425.5A CN103593470B (en) | 2013-11-29 | 2013-11-29 | Double-degree integrated unbalanced data stream classification algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310624425.5A CN103593470B (en) | 2013-11-29 | 2013-11-29 | Double-degree integrated unbalanced data stream classification algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103593470A (en) | 2014-02-19 |
CN103593470B (en) | 2016-05-18 |
Family
ID=50083611
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310624425.5A Active CN103593470B (en) | Double-degree integrated unbalanced data stream classification algorithm | 2013-11-29 | 2013-11-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103593470B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104462301A (en) * | 2014-11-28 | 2015-03-25 | 北京奇虎科技有限公司 | Network data processing method and device |
CN106294490A (en) * | 2015-06-08 | 2017-01-04 | 富士通株式会社 | The feature Enhancement Method of data sample and device and classifier training method and apparatus |
CN107423156A (en) * | 2017-07-29 | 2017-12-01 | 合肥千奴信息科技有限公司 | Fault pre-alarming algorithm based on taxonomic clustering |
CN108141377A (en) * | 2015-10-12 | 2018-06-08 | 华为技术有限公司 | Network flow early stage classifies |
CN110245232A (en) * | 2019-06-03 | 2019-09-17 | 网易传媒科技(北京)有限公司 | File classification method, device, medium and calculating equipment |
WO2020220220A1 (en) * | 2019-04-29 | 2020-11-05 | 西门子(中国)有限公司 | Classification model training method and device, and computer-readable medium |
CN111915559A (en) * | 2020-06-30 | 2020-11-10 | 电子科技大学 | Airborne SAR image quality evaluation method based on SVM classification credibility |
CN112017634A (en) * | 2020-08-06 | 2020-12-01 | Oppo(重庆)智能科技有限公司 | Data processing method, device, equipment and storage medium |
CN112989207A (en) * | 2021-04-27 | 2021-06-18 | 武汉卓尔数字传媒科技有限公司 | Information recommendation method and device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101587493A (en) * | 2009-06-29 | 2009-11-25 | 中国科学技术大学 | Text classification method |
CN101763466A (en) * | 2010-01-20 | 2010-06-30 | 西安电子科技大学 | Biological information recognition method based on dynamic sample selection integration |
CN102945280A (en) * | 2012-11-15 | 2013-02-27 | 翟云 | Unbalanced data distribution-based multi-heterogeneous base classifier fusion classification method |
CN103309953A (en) * | 2013-05-24 | 2013-09-18 | 合肥工业大学 | Method for labeling and searching for diversified pictures based on integration of multiple RBFNN classifiers |
Non-Patent Citations (2)
Title |
---|
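欧阳震诤, 罗建书, 胡东敏, 吴泉源 (Ouyang Zhenzheng, Luo Jianshu, Hu Dongmin, Wu Quanyuan): "一种不平衡数据流集成分类模型" (An ensemble classification model for unbalanced data streams), 《电子学报》 (Acta Electronica Sinica) * |
王和勇, 樊泓坤, 姚正安, 李成安 (Wang Heyong, Fan Hongkun, Yao Zheng'an, Li Cheng'an): "不平衡数据集的分类方法研究" (Research on classification methods for unbalanced data sets), 《计算机应用研究》 (Application Research of Computers) * |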
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104462301B (en) * | 2014-11-28 | 2018-05-04 | 北京奇虎科技有限公司 | A kind for the treatment of method and apparatus of network data |
CN104462301A (en) * | 2014-11-28 | 2015-03-25 | 北京奇虎科技有限公司 | Network data processing method and device |
CN106294490B (en) * | 2015-06-08 | 2019-12-24 | 富士通株式会社 | Feature enhancement method and device for data sample and classifier training method and device |
CN106294490A (en) * | 2015-06-08 | 2017-01-04 | 富士通株式会社 | The feature Enhancement Method of data sample and device and classifier training method and apparatus |
CN108141377A (en) * | 2015-10-12 | 2018-06-08 | 华为技术有限公司 | Network flow early stage classifies |
CN108141377B (en) * | 2015-10-12 | 2020-08-07 | 华为技术有限公司 | Early classification of network flows |
CN107423156A (en) * | 2017-07-29 | 2017-12-01 | 合肥千奴信息科技有限公司 | Fault pre-alarming algorithm based on taxonomic clustering |
WO2020220220A1 (en) * | 2019-04-29 | 2020-11-05 | 西门子(中国)有限公司 | Classification model training method and device, and computer-readable medium |
CN110245232A (en) * | 2019-06-03 | 2019-09-17 | 网易传媒科技(北京)有限公司 | File classification method, device, medium and calculating equipment |
CN110245232B (en) * | 2019-06-03 | 2022-02-18 | 网易传媒科技(北京)有限公司 | Text classification method, device, medium and computing equipment |
CN111915559A (en) * | 2020-06-30 | 2020-11-10 | 电子科技大学 | Airborne SAR image quality evaluation method based on SVM classification credibility |
CN111915559B (en) * | 2020-06-30 | 2022-09-20 | 电子科技大学 | Airborne SAR image quality evaluation method based on SVM classification credibility |
CN112017634A (en) * | 2020-08-06 | 2020-12-01 | Oppo(重庆)智能科技有限公司 | Data processing method, device, equipment and storage medium |
CN112989207A (en) * | 2021-04-27 | 2021-06-18 | 武汉卓尔数字传媒科技有限公司 | Information recommendation method and device, electronic equipment and storage medium |
CN112989207B (en) * | 2021-04-27 | 2021-08-27 | 武汉卓尔数字传媒科技有限公司 | Information recommendation method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN103593470B (en) | 2016-05-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103593470B (en) | Double-degree integrated unbalanced data stream classification algorithm | |
CN107633265B (en) | Data processing method and device for optimizing credit evaluation model | |
CN110852856B (en) | Invoice false invoice identification method based on dynamic network representation | |
CN109034194B (en) | Transaction fraud behavior deep detection method based on feature differentiation | |
WO2019218699A1 (en) | Fraud transaction determining method and apparatus, computer device, and storage medium | |
Ponti et al. | A decision cognizant Kullback–Leibler divergence | |
WO2017143919A1 (en) | Method and apparatus for establishing data identification model | |
WO2020220758A1 (en) | Method for detecting abnormal transaction node, and device | |
CN107679734A (en) | It is a kind of to be used for the method and system without label data classification prediction | |
CN104636449A (en) | Distributed type big data system risk recognition method based on LSA-GCC | |
WO2020250730A1 (en) | Fraud detection device, fraud detection method, and fraud detection program | |
Ghazal et al. | Data Mining and Exploration: A Comparison Study among Data Mining Techniques on Iris Data Set | |
CN112465622A (en) | Method, system, medium and computer equipment for checking enterprise comprehensive credit information | |
CN105426441B (en) | A kind of automatic preprocess method of time series | |
CN106228190A (en) | Decision tree method of discrimination for resident's exception water | |
WO2022143431A1 (en) | Method and apparatus for training anti-money laundering model | |
Velden et al. | Resolving author name homonymy to improve resolution of structures in co-author networks | |
CN107766500A (en) | The auditing method of fixed assets card | |
CN114036531A (en) | Multi-scale code measurement-based software security vulnerability detection method | |
Gao et al. | Time Series Data Cleaning under Multi-Speed Constraints. | |
WO2020259391A1 (en) | Database script performance testing method and device | |
CN106991171A (en) | Topic based on Intelligent campus information service platform finds method | |
Rajeswari et al. | A comparative evaluation of supervised and unsupervised methods for detecting outliers | |
CN106778252A (en) | Intrusion detection method based on rough set theory Yu WAODE algorithms | |
CN110502669A (en) | The unsupervised chart dendrography learning method of lightweight and device based on the side N DFS subgraph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CP02 | Change in the address of a patent holder | ||
Address after: 475001 Henan province city Minglun Street No. 85 Patentee after: Henan University Address before: 475004 Jinming Avenue, Kaifeng City, Henan Province Patentee before: Henan University |