CN113632112A - Enhanced ensemble model diversity and learning - Google Patents

Enhanced ensemble model diversity and learning

Info

Publication number
CN113632112A
Authority
CN
China
Prior art keywords
data points
clusters
class
minority
minority class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080022167.1A
Other languages
English (en)
Chinese (zh)
Inventor
S.萨特
D.S.图拉加
C.阿加瓦尔
V.N.帕武鲁里
张元极
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of CN113632112A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2115 Selection of the most significant subset of features by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/259 Fusion by voting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)
CN202080022167.1A 2019-04-11 2020-03-18 Enhanced ensemble model diversity and learning Pending CN113632112A (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US16/381,979 US11593716B2 (en) 2019-04-11 2019-04-11 Enhanced ensemble model diversity and learning
US16/381,979 2019-04-11
PCT/IB2020/052472 WO2020208445A1 (en) 2019-04-11 2020-03-18 Enhanced ensemble model diversity and learning

Publications (1)

Publication Number Publication Date
CN113632112A (zh) 2021-11-09

Family

ID=72749268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080022167.1A Pending CN113632112A (zh) 2020-03-18 Enhanced ensemble model diversity and learning

Country Status (5)

Country Link
US (1) US11593716B2
JP (1) JP7335352B2
CN (1) CN113632112A
GB (1) GB2598061A
WO (1) WO2020208445A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118057319A (zh) * 2022-11-21 2024-05-21 Hewlett Packard Enterprise Development LP Unsupervised segmentation of univariate time series datasets using motifs and shapelets

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200342968A1 (en) 2019-04-24 2020-10-29 GE Precision Healthcare LLC Visualization of medical device event processing
US20230214665A1 (en) * 2020-04-17 2023-07-06 Siemens Aktiengesellschaft A neural network system for distributed boosting for a programmable logic controller with a plurality of processing units
US20210342707A1 (en) * 2020-05-01 2021-11-04 International Business Machines Corporation Data-driven techniques for model ensembles
US11418459B1 (en) * 2020-12-14 2022-08-16 Cigna Intellectual Property, Inc. Anomaly detection for packet loss
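CN112801145B (zh) * 2021-01-12 2024-05-28 Shenzhen Zhongbo Kechuang Information Technology Co., Ltd. Safety monitoring method, apparatus, computer device and storage medium
JP7322918B2 (ja) * 2021-03-29 2023-08-08 Yokogawa Electric Corporation Program, information processing device, and method for generating a learning model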
US12141806B2 (en) * 2021-05-30 2024-11-12 Actimize Ltd. Clustering-based data selection for optimization of risk predictive machine learning models
US12488063B2 (en) * 2021-09-01 2025-12-02 Unitedhealth Group Incorporated Generating input processing rules engines using probabilistic clustering techniques
US12118448B2 (en) 2021-10-20 2024-10-15 Visa International Service Association System, method, and computer program product for multi-domain ensemble learning based on multivariate time sequence data
US12541721B2 (en) * 2022-04-03 2026-02-03 Actimize Ltd. Method for extreme class imbalance within fraud detection
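KR20240068162A (ko) * 2022-11-10 2024-05-17 Samsung Electronics Co., Ltd. Classification method for classifying objects in an image and classification device performing the same
JP2025049280A (ja) * 2023-09-21 2025-04-03 SoftBank Group Corp. System
JP2025056785A (ja) * 2023-09-26 2025-04-08 SoftBank Group Corp. System
JP7706730B1 (ja) * 2024-08-16 2025-07-14 AI inside Inc. Program, method, and autonomous distributed processing system
WO2026038705A1 (ko) * 2024-08-16 2026-02-19 LG Management Development Institute Co., Ltd. Prediction system, control method thereof, and training method for the prediction system
KR102831812B1 (ko) * 2024-08-16 2025-07-09 LG Management Development Institute Co., Ltd. Prediction system, control method thereof, and training method for the prediction system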

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
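CN107239789A (zh) * 2017-05-09 2017-10-10 Zhejiang University Industrial fault classification method for imbalanced data based on k-means
CN107688831A (zh) * 2017-09-04 2018-02-13 Wuyi University Imbalanced data classification method based on clustering undersampling
CN108985369A (zh) * 2018-07-06 2018-12-11 Taiyuan University of Technology Identically distributed ensemble prediction method and system for imbalanced dataset classification
CN109086412A (zh) * 2018-08-03 2018-12-25 Beijing University of Posts and Telecommunications Imbalanced data classification method based on adaptively weighted Bagging-GBDT
US20190019061A1 (en) * 2017-06-06 2019-01-17 Sightline Innovation Inc. System and method for increasing data quality in a machine learning process
CN109492673A (zh) * 2018-10-19 2019-03-19 Nanjing University of Science and Technology Imbalanced data prediction method based on spectral clustering sampling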

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7127087B2 (en) * 2000-03-27 2006-10-24 Microsoft Corporation Pose-invariant face recognition system and process
WO2007115426A2 (en) * 2006-03-30 2007-10-18 Carestream Health, Inc. Smote algorithm with locally linear embedding
JP5142135B2 (ja) 2007-11-13 2013-02-13 International Business Machines Corporation Technique for classifying data
US20130097103A1 (en) 2011-10-14 2013-04-18 International Business Machines Corporation Techniques for Generating Balanced and Class-Independent Training Data From Unlabeled Data Set
JP5733229B2 (ja) 2012-02-06 2015-06-10 Nippon Steel & Sumitomo Metal Corporation Classifier creation device, classifier creation method, and computer program
US10515448B2 (en) 2016-09-20 2019-12-24 International Business Machines Corporation Handprint analysis to predict genetically based traits
US10956821B2 (en) 2016-11-29 2021-03-23 International Business Machines Corporation Accurate temporal event predictive modeling
US20180210944A1 (en) 2017-01-26 2018-07-26 Agt International Gmbh Data fusion and classification with imbalanced datasets
US11735317B2 (en) 2017-08-11 2023-08-22 Vuno, Inc. Method for generating prediction result for predicting occurrence of fatal symptoms of subject in advance and device using same
CN109032829B (zh) 2018-07-23 2020-12-08 Tencent Technology (Shenzhen) Co., Ltd. Data anomaly detection method, apparatus, computer device and storage medium
US11128667B2 (en) * 2018-11-29 2021-09-21 Rapid7, Inc. Cluster detection and elimination in security environments

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
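CN107239789A (zh) * 2017-05-09 2017-10-10 Zhejiang University Industrial fault classification method for imbalanced data based on k-means
US20190019061A1 (en) * 2017-06-06 2019-01-17 Sightline Innovation Inc. System and method for increasing data quality in a machine learning process
CN107688831A (zh) * 2017-09-04 2018-02-13 Wuyi University Imbalanced data classification method based on clustering undersampling
CN108985369A (zh) * 2018-07-06 2018-12-11 Taiyuan University of Technology Identically distributed ensemble prediction method and system for imbalanced dataset classification
CN109086412A (zh) * 2018-08-03 2018-12-25 Beijing University of Posts and Telecommunications Imbalanced data classification method based on adaptively weighted Bagging-GBDT
CN109492673A (zh) * 2018-10-19 2019-03-19 Nanjing University of Science and Technology Imbalanced data prediction method based on spectral clustering sampling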

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
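CN118057319A (zh) * 2022-11-21 2024-05-21 Hewlett Packard Enterprise Development LP Unsupervised segmentation of univariate time series datasets using motifs and shapelets
CN118057319B (zh) * 2022-11-21 2024-12-24 Hewlett Packard Enterprise Development LP Unsupervised segmentation of univariate time series datasets using motifs and shapelets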

Also Published As

Publication number Publication date
GB2598061A (en) 2022-02-16
WO2020208445A1 (en) 2020-10-15
US11593716B2 (en) 2023-02-28
GB202115645D0 (en) 2021-12-15
US20200327456A1 (en) 2020-10-15
JP7335352B2 (ja) 2023-08-29
JP2022527366A (ja) 2022-06-01

Similar Documents

Publication Publication Date Title
CN113632112A (zh) Enhanced ensemble model diversity and learning
US10956821B2 (en) Accurate temporal event predictive modeling
US11669757B2 (en) Operational energy consumption anomalies in intelligent energy consumption systems
US11682474B2 (en) Enhanced user screening for sensitive services
US11074913B2 (en) Understanding user sentiment using implicit user feedback in adaptive dialog systems
US20240346283A1 (en) Explainable classifications with abstention using client agnostic machine learning models
US11146580B2 (en) Script and command line exploitation detection
US11551817B2 (en) Assessing unreliability of clinical risk prediction
US11694815B2 (en) Intelligent ranking of sections of clinical practical guidelines
US11847132B2 (en) Visualization and exploration of probabilistic models
US11646116B2 (en) Intelligent identification of appropriate sections of clinical practical guideline
US20200250706A1 (en) Intelligent advertisement identification and interaction in an internet of things computing environment
US20240414064A1 (en) Self-learning automated information technology change risk prediction
US20230206114A1 (en) Fair selective classification via a variational mutual information upper bound for imposing sufficiency
US11200989B1 (en) Aperiodic data driven cognitive control system
US11556810B2 (en) Estimating feasibility and effort for a machine learning solution
US11568235B2 (en) Data driven mixed precision learning for neural networks
WO2023011618A1 (en) Predicting root cause of alert using recurrent neural network
US12153912B2 (en) Upgrading operating software (“OS”) for devices in a multi-device ecosystem
US12039011B2 (en) Intelligent expansion of reviewer feedback on training data
US12505165B1 (en) Apparatus and method for triadic user matching based on profile data
US20250094858A1 (en) Systems and methods for application monitoring
US20250259072A1 (en) Automated single-to-grouped cloud computing optimization
US11741123B2 (en) Visualization and exploration of probabilistic models for multiple instances
US12282848B2 (en) Estimated online hard negative mining via probabilistic selection and scores history consideration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2021-11-09