GB2598061A - Enhanced ensemble model diversity and learning - Google Patents

Publication number
GB2598061A
Authority
GB
United Kingdom
Prior art keywords
data points
clusters
minority class
class
further including
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB2115645.0A
Other languages
English (en)
Other versions
GB202115645D0 (en)
Inventor
Sathe Saket
Srinivas Turaga Deepak
Aggarwal Charu
Nagaraju Pavuluri Venkata
Chang Yuan-Chi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-04-11
Filing date
2020-03-18
Publication date
2022-02-16
Application filed by International Business Machines Corp
Publication of GB202115645D0
Publication of GB2598061A
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2115 Selection of the most significant subset of features by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/259 Fusion by voting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)
GB2115645.0A 2019-04-11 2020-03-18 Enhanced ensemble model diversity and learning Withdrawn GB2598061A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16/381,979 US11593716B2 (en) 2019-04-11 2019-04-11 Enhanced ensemble model diversity and learning
PCT/IB2020/052472 WO2020208445A1 (en) 2019-04-11 2020-03-18 Enhanced ensemble model diversity and learning

Publications (2)

Publication Number Publication Date
GB202115645D0 (en) 2021-12-15
GB2598061A (en) 2022-02-16

Family

ID=72749268

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
GB2115645.0A Withdrawn GB2598061A (en) 2019-04-11 2020-03-18 Enhanced ensemble model diversity and learning

Country Status (5)

Country Link
US (1) US11593716B2
JP (1) JP7335352B2
CN (1) CN113632112A
GB (1) GB2598061A
WO (1) WO2020208445A1

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200342968A1 (en) 2019-04-24 2020-10-29 GE Precision Healthcare LLC Visualization of medical device event processing
US20230214665A1 (en) * 2020-04-17 2023-07-06 Siemens Aktiengesellschaft A neural network system for distributed boosting for a programmable logic controller with a plurality of processing units
US20210342707A1 (en) * 2020-05-01 2021-11-04 International Business Machines Corporation Data-driven techniques for model ensembles
US11418459B1 (en) * 2020-12-14 2022-08-16 Cigna Intellectual Property, Inc. Anomaly detection for packet loss
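CN112801145B (zh) * 2021-01-12 2024-05-28 深圳市中博科创信息技术有限公司 Security monitoring method and apparatus, computer device, and storage medium
JP7322918B2 (ja) * 2021-03-29 2023-08-08 横河電機株式会社 Program, information processing device, and method for generating a learning model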
US12141806B2 (en) * 2021-05-30 2024-11-12 Actimize Ltd. Clustering-based data selection for optimization of risk predictive machine learning models
US12488063B2 (en) * 2021-09-01 2025-12-02 Unitedhealth Group Incorporated Generating input processing rules engines using probabilistic clustering techniques
US12118448B2 (en) 2021-10-20 2024-10-15 Visa International Service Association System, method, and computer program product for multi-domain ensemble learning based on multivariate time sequence data
US12541721B2 (en) * 2022-04-03 2026-02-03 Actimize Ltd. Method for extreme class imbalance within fraud detection
KR20240068162A (ko) * 2022-11-10 2024-05-17 삼성전자주식회사 Classification method for classifying objects in an image and classification apparatus performing the same
US12050626B2 (en) * 2022-11-21 2024-07-30 Hewlett Packard Enterprise Development Lp Unsupervised segmentation of a univariate time series dataset using motifs and shapelets
JP2025049280A (ja) * 2023-09-21 2025-04-03 ソフトバンクグループ株式会社 System
JP2025056785A (ja) * 2023-09-26 2025-04-08 ソフトバンクグループ株式会社 System
JP7706730B1 (ja) * 2024-08-16 2025-07-14 AI inside株式会社 Program, method, and autonomous distributed processing system
WO2026038705A1 (ko) * 2024-08-16 2026-02-19 주식회사 Lg 경영개발원 Prediction system and control method thereof, and learning method for the prediction system
KR102831812B1 (ko) * 2024-08-16 2025-07-09 주식회사 Lg 경영개발원 Prediction system and control method thereof, and learning method for the prediction system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180082419A1 (en) * 2016-09-20 2018-03-22 International Business Machines Corporation Handprint analysis to predict genetically based traits
US20180150757A1 (en) * 2016-11-29 2018-05-31 International Business Machines Corporation Accurate temporal event predictive modeling
CN109032829A (zh) * 2018-07-23 2018-12-18 腾讯科技(深圳)有限公司 Data anomaly detection method and apparatus, computer device, and storage medium

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7127087B2 (en) * 2000-03-27 2006-10-24 Microsoft Corporation Pose-invariant face recognition system and process
WO2007115426A2 (en) * 2006-03-30 2007-10-18 Carestream Health, Inc. Smote algorithm with locally linear embedding
JP5142135B2 (ja) 2007-11-13 2013-02-13 インターナショナル・ビジネス・マシーンズ・コーポレーション Technique for classifying data
US20130097103A1 (en) 2011-10-14 2013-04-18 International Business Machines Corporation Techniques for Generating Balanced and Class-Independent Training Data From Unlabeled Data Set
JP5733229B2 (ja) 2012-02-06 2015-06-10 新日鐵住金株式会社 Classifier creation device, classifier creation method, and computer program
US20180210944A1 (en) 2017-01-26 2018-07-26 Agt International Gmbh Data fusion and classification with imbalanced datasets
CN107239789A (zh) * 2017-05-09 2017-10-10 浙江大学 K-means-based industrial fault classification method for imbalanced data
US11436428B2 (en) * 2017-06-06 2022-09-06 Sightline Innovation Inc. System and method for increasing data quality in a machine learning process
US11735317B2 (en) 2017-08-11 2023-08-22 Vuno, Inc. Method for generating prediction result for predicting occurrence of fatal symptoms of subject in advance and device using same
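CN107688831A (zh) * 2017-09-04 2018-02-13 五邑大学 Imbalanced data classification method based on clustering undersampling
CN108985369A (zh) * 2018-07-06 2018-12-11 太原理工大学 Identically distributed ensemble prediction method and system for classifying imbalanced data sets
CN109086412A (zh) * 2018-08-03 2018-12-25 北京邮电大学 Imbalanced data classification method based on adaptively weighted Bagging-GBDT
CN109492673A (zh) * 2018-10-19 2019-03-19 南京理工大学 Imbalanced data prediction method based on spectral clustering sampling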
US11128667B2 (en) * 2018-11-29 2021-09-21 Rapid7, Inc. Cluster detection and elimination in security environments

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yen S-J and Lee Y-S, "Cluster-based under-sampling approaches for imbalanced data distributions", Expert Systems with Applications 36 (2009) 5718-5727, doi:10.1016/j.eswa.2008.06.108 *
Nnamoko NA, "Ensemble-based Supervised Learning for Predicting Diabetes Onset", PhD Thesis, Liverpool John Moores University, July 2017 *

Also Published As

Publication number Publication date
WO2020208445A1 (en) 2020-10-15
US11593716B2 (en) 2023-02-28
GB202115645D0 (en) 2021-12-15
US20200327456A1 (en) 2020-10-15
JP7335352B2 (ja) 2023-08-29
CN113632112A (zh) 2021-11-09
JP2022527366A (ja) 2022-06-01

Similar Documents

Publication Publication Date Title
GB2598061A (en) Enhanced ensemble model diversity and learning
JP2022527366A5
US12373683B2 (en) Anomaly detection according to a multi-model analysis
US11640563B2 (en) Automated data processing and machine learning model generation
US11954202B2 (en) Deep learning based detection of malicious shell scripts
US11860721B2 (en) Utilizing automatic labelling, prioritizing, and root cause analysis machine learning models and dependency graphs to determine recommendations for software products
GB2595088A (en) Security systems and methods
US12254388B2 (en) Generation of counterfactual explanations using artificial intelligence and machine learning techniques
US20220027572A1 (en) Systems and methods for generating a summary of a multi-speaker conversation
CN113420073A Abnormal sample detection method based on improved isolation forest and related device
US11797776B2 (en) Utilizing machine learning models and in-domain and out-of-domain data distribution to predict a causality relationship between events expressed in natural language text
US11863524B2 (en) Autotuning a virtual firewall
US12014140B2 (en) Utilizing machine learning and natural language processing to determine mappings between work items of various tools
GB2595126A (en) Systems and methods for conducting a security recognition task
US11202179B2 (en) Monitoring and analyzing communications across multiple control layers of an operational technology environment
CN113139381A Imbalanced sample classification method and apparatus, electronic device, and storage medium
US20220180225A1 (en) Determining a counterfactual explanation associated with a group using artificial intelligence and machine learning techniques
CN110198299B Intrusion detection method and device
US20210158901A1 (en) Utilizing a neural network model and hyperbolic embedded space to predict interactions between genes
US20250200424A1 (en) Dynamic configuration of a data processing system
US12282734B2 (en) Processing and converting delimited data
US20240412047A1 (en) Enhancing the quality of simulated network data using generative adversarial networks
JP2020030752A Information processing device, information processing method, and program
WO2022060868A1 (en) An automated machine learning tool for explaining the effects of complex text on predictive results
US12468891B2 (en) Systems and methods for utilizing a machine learning model for sentence boundary detection

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)