JP2017521763A - Instance classification method - Google Patents
Instance classification method
- Publication number
- JP2017521763A
- Authority
- JP
- Japan
- Prior art keywords
- class
- word
- distribution
- document
- probability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Business, Economics & Management (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Business, Economics & Management (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Algebra (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
To simplify the notation,
Claims (5)
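- A method for classifying a new instance including a text document, using a set of training instances with known classes (labeled data) and zero or more training instances with unknown classes (unlabeled data), the method comprising:
a first parameter learning step of estimating a word distribution θz for each class z using the labeled and unlabeled data;
a second parameter learning step of estimating a background distribution γ and a degree of interpolation δ between γ and θz using the labeled and unlabeled data; and
a classification step that includes calculating, for each word of the new instance, the probability that the word is generated from the word distribution θz and the probability that it is generated from the background distribution γ, combining the two probabilities using δ, and combining the resulting probabilities of all words to estimate a document probability for the class z, which indicates that the document is generated from the class z, the classification step classifying the new instance into the class z* with the highest document probability.
- The method according to claim 1, wherein, in the second parameter learning step, the background distribution γ and the degree of interpolation δ between γ and θz are estimated so as to maximize the probability of observing the set of all the instances with known and unknown classes.
- The method according to claim 1, wherein the background distribution γ is set to the word frequency distribution observed over all instances with known and unknown classes.
- The method according to claim 1, wherein the interpolation parameter δ is set so as to optimize the expected document classification accuracy.
- The method according to claim 1, wherein the word distribution θz of each class z and the background distribution γ are set to a multinomial distribution or a mixture of multinomial distributions, and are estimated using the labeled data, or using both the labeled and unlabeled data.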
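The claims above can be illustrated with a minimal Python sketch. It is not the patented implementation; several details are assumptions that the claims do not fix. The sketch assumes that "combining the two probabilities using δ" means the linear interpolation (1 − δ)·θz(w) + δ·γ(w), that class priors are uniform, that θz is estimated from the labeled data only with add-one smoothing (claim 5 permits labeled-only estimation), and that δ is supplied by the caller rather than learned. The background distribution γ follows the claim 3 variant (word frequencies over all instances). All function and variable names are hypothetical.

```python
import math
from collections import Counter


def word_distribution(docs, vocab, alpha=1.0):
    """Multinomial word distribution over vocab; add-alpha smoothing is an assumption."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def train(labeled, unlabeled):
    """labeled: list of (tokens, class) pairs; unlabeled: list of token lists."""
    all_docs = [doc for doc, _ in labeled] + list(unlabeled)
    vocab = sorted({w for doc in all_docs for w in doc})
    classes = sorted({z for _, z in labeled})
    # First parameter learning step (simplified): theta_z from labeled data only.
    theta = {
        z: word_distribution([d for d, c in labeled if c == z], vocab)
        for z in classes
    }
    # Second step, claim 3 variant: gamma is the word frequency distribution
    # observed over all labeled and unlabeled instances (no smoothing needed,
    # because every vocabulary word occurs at least once in all_docs).
    gamma = word_distribution(all_docs, vocab, alpha=0.0)
    return theta, gamma


def log_doc_probability(doc, theta_z, gamma, delta):
    """Log of the product over words of (1 - delta) * theta_z(w) + delta * gamma(w)."""
    total = 0.0
    for w in doc:
        if w in gamma:  # words never seen in training are skipped (assumption)
            total += math.log((1.0 - delta) * theta_z[w] + delta * gamma[w])
    return total


def classify(doc, theta, gamma, delta):
    """Return the class z* with the highest document probability (uniform prior assumed)."""
    return max(theta, key=lambda z: log_doc_probability(doc, theta[z], gamma, delta))
```

With this sketch, a caller would run `theta, gamma = train(labeled, unlabeled)` and then `classify(tokens, theta, gamma, delta)` for a value of δ between 0 and 1. Words that are frequent in every class receive nearly the same interpolated probability under all classes, so they contribute little to the decision; this is one plausible reading of the "feature weighting" named in the priority application's title.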
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2014/067090 WO2015194052A1 (en) | 2014-06-20 | 2014-06-20 | Feature weighting for naive bayes classifiers using a generative model |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2017521763A (ja) | 2017-08-03 |
JP6292322B2 (ja) | 2018-03-14 |
Family
ID=54935076
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016571775A Active JP6292322B2 (ja) | 2014-06-20 | 2014-06-20 | Instance classification method |
Country Status (3)
Country | Link |
---|---|
US (1) | US10324971B2 (ja) |
JP (1) | JP6292322B2 (ja) |
WO (1) | WO2015194052A1 (ja) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10592147B2 (en) * | 2017-07-26 | 2020-03-17 | International Business Machines Corporation | Dataset relevance estimation in storage systems |
WO2020051413A1 (en) * | 2018-09-07 | 2020-03-12 | Walmart Apollo, Llc | Method and apparatus to more quickly classify additional text entries |
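CN109597888A (zh) * | 2018-11-19 | 2019-04-09 | 北京百度网讯科技有限公司 | Method and apparatus for establishing a text domain recognition model |
CN110147447B (zh) * | 2019-04-25 | 2022-11-18 | 中国地质大学(武汉) | Hidden multinomial naive Bayes text classification method and apparatus |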
US11281999B2 (en) | 2019-05-14 | 2022-03-22 | International Business Machines Corporation Armonk, New York | Predictive accuracy of classifiers using balanced training sets |
CN110196909B (zh) * | 2019-05-14 | 2022-05-31 | 北京来也网络科技有限公司 | Text denoising method and apparatus based on reinforcement learning |
US11593569B2 (en) * | 2019-10-11 | 2023-02-28 | Lenovo (Singapore) Pte. Ltd. | Enhanced input for text analytics |
US11594213B2 (en) * | 2020-03-03 | 2023-02-28 | Rovi Guides, Inc. | Systems and methods for interpreting natural language search queries |
US11836189B2 (en) | 2020-03-25 | 2023-12-05 | International Business Machines Corporation | Infer text classifiers for large text collections |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004362584A (ja) * | 2003-06-03 | 2004-12-24 | Microsoft Corp | Discriminative training of language models for text and speech classification |
JP2010108265A (ja) * | 2008-10-30 | 2010-05-13 | Kddi Corp | Content classification device and program |
US20120078969A1 (en) * | 2010-09-24 | 2012-03-29 | International Business Machines Corporation | System and method to extract models from semi-structured documents |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7937345B2 (en) * | 2006-07-12 | 2011-05-03 | Kofax, Inc. | Data classification methods using machine learning techniques |
US7761391B2 (en) * | 2006-07-12 | 2010-07-20 | Kofax, Inc. | Methods and systems for improved transductive maximum entropy discrimination classification |
US7958067B2 (en) * | 2006-07-12 | 2011-06-07 | Kofax, Inc. | Data classification methods using machine learning techniques |
US20100153318A1 (en) * | 2008-11-19 | 2010-06-17 | Massachusetts Institute Of Technology | Methods and systems for automatically summarizing semantic properties from documents with freeform textual annotations |
US20110119050A1 (en) * | 2009-11-18 | 2011-05-19 | Koen Deschacht | Method for the automatic determination of context-dependent hidden word distributions |
US9519868B2 (en) * | 2012-06-21 | 2016-12-13 | Microsoft Technology Licensing, Llc | Semi-supervised random decision forests for machine learning using mahalanobis distance to identify geodesic paths |
US9373087B2 (en) * | 2012-10-25 | 2016-06-21 | Microsoft Technology Licensing, Llc | Decision tree training in machine learning |
- 2014
- 2014-06-20 US US15/318,853 patent/US10324971B2/en active Active
- 2014-06-20 JP JP2016571775A patent/JP6292322B2/ja active Active
- 2014-06-20 WO PCT/JP2014/067090 patent/WO2015194052A1/en active Application Filing
Non-Patent Citations (2)
Title |
---|
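Kanako Komiya: "Negation Naive Bayes for document classification", Journal of Natural Language Processing (自然言語処理), vol. 20, no. 2, JPN6017047059, 14 June 2013 (2013-06-14), JP, pages 161 - 182, ISSN: 0003698046 * |
Akinori Fujino: "Pattern classification based on optimal combination of labeled and unlabeled data", IEICE Technical Report (電子情報通信学会技術研究報告), vol. 104, no. 669, JPN6017047057, 17 February 2005 (2005-02-17), JP, pages 19 - 24, ISSN: 0003698045 * |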
Also Published As
Publication number | Publication date |
---|---|
JP6292322B2 (ja) | 2018-03-14 |
WO2015194052A1 (en) | 2015-12-23 |
US20170116332A1 (en) | 2017-04-27 |
US10324971B2 (en) | 2019-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6292322B2 (ja) | Instance classification method | |
Tsuboi et al. | Direct density ratio estimation for large-scale covariate shift adaptation | |
US8332334B2 (en) | System and method for cross domain learning for data augmentation | |
US8589317B2 (en) | Human-assisted training of automated classifiers | |
Klinkenberg | Learning drifting concepts: Example selection vs. example weighting | |
Haque et al. | Fusion: An online method for multistream classification | |
US10606910B2 (en) | Ranking search results using machine learning based models | |
US11636682B2 (en) | Embedding contextual information in an image to assist understanding | |
US11947633B2 (en) | Oversampling for imbalanced test data | |
US20210089870A1 (en) | Data valuation using reinforcement learning | |
US9582758B2 (en) | Data classification method, storage medium, and classification device | |
KR20220054410A (ko) | Reinforcement learning based on locally interpretable models | |
US20180137421A1 (en) | Information processing apparatus, information processing method, and non-transitory computer readable storage medium | |
JP6230987B2 (ja) | Language model creation device, language model creation method, program, and recording medium | |
US11977602B2 (en) | Domain generalized margin via meta-learning for deep face recognition | |
US9053434B2 (en) | Determining an obverse weight | |
Nguyen et al. | Mutual information estimation for filter based feature selection using particle swarm optimization | |
JP2015038709A (ja) | Model parameter estimation method, device, and program | |
JP5398811B2 (ja) | Document classification device, method, and program | |
Dahinden et al. | Decomposition and model selection for large contingency tables | |
Kim et al. | Overfitting, generalization, and MSE in class probability estimation with high‐dimensional data | |
Renuka et al. | An ensembled classifier for email spam classification in hadoop environment | |
JP7331938B2 (ja) | Learning device, estimation device, learning method, and learning program | |
JP7047664B2 (ja) | Learning device, learning method, and prediction system | |
US20240193372A1 (en) | Building Bots from Raw Logs and Computing Coverage of Business Logic Graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131 Effective date: 20171212 |
| A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 Effective date: 20171226 |
| TRDD | Decision of grant or rejection written | |
| A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01 Effective date: 20180116 |
| A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61 Effective date: 20180129 |
| R150 | Certificate of patent or registration of utility model | Ref document number: 6292322 Country of ref document: JP Free format text: JAPANESE INTERMEDIATE CODE: R150 |