WO2006036682A3 - Method and apparatus for efficient training of support vector machines - Google Patents

Method and apparatus for efficient training of support vector machines

Publication number
WO2006036682A3
Authority
WO
WIPO (PCT)
Prior art keywords
classifier
documents
vector
training
boundary
Prior art date
Application number
PCT/US2005/033779
Other languages
French (fr)
Other versions
WO2006036682A2 (en)
Inventor
Keerthi Sathiya Selvaraj
Dennis Decoste
Original Assignee
Overture Services Inc
Keerthi Sathiya Selvaraj
Dennis Decoste
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2004-09-24
Filing date
2005-09-20
Publication date
2006-10-26
Application filed by Overture Services Inc, Keerthi Sathiya Selvaraj, Dennis Decoste
Publication of WO2006036682A2
Publication of WO2006036682A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99936 Pattern matching access
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99937 Sorting

Abstract

The present invention provides a system and method for building fast and efficient support vector classifiers for large data classification problems. It is useful for classifying pages from the World Wide Web and for other problems with sparse matrices and large numbers of documents. The method takes advantage of the least squares nature of such problems, employs exact line search in its iterative process, and makes use of a conjugate gradient method appropriate to the problem. In one embodiment, a support vector classifier useful for classifying a plurality of documents, including textual documents, is built in four steps: (1) selecting a plurality of training documents, each having suitable numeric attributes that are associated with a training document vector; (2) initializing a classifier weight vector and a classifier intercept for a classifier boundary that separates at least two document classes; (3) determining which training document vectors are suitable support vectors; and (4) re-computing the classifier weight vector and the classifier intercept for the classifier boundary using the suitable support vectors together with an iteratively reindexed least squares method and a conjugate gradient method with a stopping criterion.
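
The training loop outlined in the abstract can be illustrated with a small sketch. It is not taken from the patent text: it assumes a linear classifier with squared hinge loss, a regularization parameter `lam`, labels in {-1, +1}, an intercept carried by an appended constant feature (and therefore regularized with the weights), and dense NumPy arrays in place of the sparse matrices the abstract mentions. The function names `conjugate_gradient`, `exact_line_search`, and `train_l2svm` are hypothetical.

```python
import numpy as np

def conjugate_gradient(matvec, b, x0, tol=1e-8, max_iter=200):
    """Solve A x = b for a symmetric positive definite A given only as matvec."""
    x = x0.copy()
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    scale = max(np.linalg.norm(b), 1.0)  # tolerance relative to ||b||, floored at 1
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * scale:   # stopping criterion on the residual norm
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def exact_line_search(w, d, o, delta, y, lam):
    """Minimize f(w + t d) over t >= 0, where
    f(w) = lam/2 ||w||^2 + 1/2 sum_i max(0, 1 - y_i w.x_i)^2.
    Along the line f is piecewise quadratic, so f'(t) = L + t R on each piece."""
    active = y * o < 1
    L = lam * (w @ d) + np.sum((o[active] - y[active]) * delta[active])
    R = lam * (d @ d) + np.sum(delta[active] ** 2)
    yd = y * delta
    # Points that leave the active set (margin grows) or enter it (margin shrinks).
    idx = np.flatnonzero((active & (yd > 0)) | (~active & (yd < 0)))
    t_break = (1 - y[idx] * o[idx]) / yd[idx]
    for k in np.argsort(t_break):
        if L + t_break[k] * R >= 0:      # the root lies on the current piece
            break
        i = idx[k]
        sign = -1.0 if active[i] else 1.0
        L += sign * (o[i] - y[i]) * delta[i]
        R += sign * delta[i] ** 2
    return max(0.0, -L / R)

def train_l2svm(X, y, lam=1.0, max_outer=50):
    """Return (weights, intercept) for labels y in {-1, +1}."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # constant feature = intercept
    w = np.zeros(Xb.shape[1])                      # initialize weights and intercept
    for _ in range(max_outer):
        o = Xb @ w
        I = y * o < 1                              # current support vectors
        XI, yI = Xb[I], y[I]
        # Regularized least squares restricted to the rows indexed by I.
        matvec = lambda v: lam * v + XI.T @ (XI @ v)
        w_bar = conjugate_gradient(matvec, XI.T @ yI, w)
        o_bar = Xb @ w_bar
        if np.array_equal(y * o_bar < 1, I):       # index set unchanged: done
            w = w_bar
            break
        d = w_bar - w
        w = w + exact_line_search(w, d, o, o_bar - o, y, lam) * d
    return w[:-1], w[-1]
```

Under these assumptions, `train_l2svm(X, y)` returns a weight vector `w` and an intercept `b`, and `np.sign(X @ w + b)` gives the predicted class of each row of `X`. Each outer pass re-selects the support vectors, solves a least squares problem on them by conjugate gradient, and then moves along the resulting direction by an exact line search.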

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/949,821 US7440944B2 (en) 2004-09-24 2004-09-24 Method and apparatus for efficient training of support vector machines
US10/949,821 2004-09-24

Publications (2)

Publication Number Publication Date
WO2006036682A2 (en) 2006-04-06
WO2006036682A3 (en) 2006-10-26

Family

ID=36119404

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/033779 WO2006036682A2 (en) 2004-09-24 2005-09-20 Method and apparatus for efficient training of support vector machines

Country Status (2)

Country Link
US (1) US7440944B2 (en)
WO (1) WO2006036682A2 (en)

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7426498B2 (en) * 2004-07-27 2008-09-16 International Business Machines Corporation Method and apparatus for autonomous classification
US20060218110A1 (en) * 2005-03-28 2006-09-28 Simske Steven J Method for deploying additional classifiers
US7685080B2 (en) * 2005-09-28 2010-03-23 Honda Motor Co., Ltd. Regularized least squares classification or regression with leave-one-out (LOO) error
US7986827B2 (en) * 2006-02-07 2011-07-26 Siemens Medical Solutions Usa, Inc. System and method for multiple instance learning for computer aided detection
US7336145B1 (en) * 2006-11-15 2008-02-26 Siemens Aktiengesellschaft Method for designing RF excitation pulses in magnetic resonance tomography
US20080201634A1 (en) * 2007-02-20 2008-08-21 Gibb Erik W System and method for customizing a user interface
US7636715B2 (en) * 2007-03-23 2009-12-22 Microsoft Corporation Method for fast large scale data mining using logistic regression
US7979367B2 (en) * 2007-03-27 2011-07-12 Nec Laboratories America, Inc. Generalized sequential minimal optimization for SVM+ computations
US8856123B1 (en) * 2007-07-20 2014-10-07 Hewlett-Packard Development Company, L.P. Document classification
US20090055436A1 (en) * 2007-08-20 2009-02-26 Olakunle Olaniyi Ayeni System and Method for Integrating on Demand/Pull and Push Flow of Goods-and-Services Meta-Data, Including Coupon and Advertising, with Mobile and Wireless Applications
US7933847B2 (en) * 2007-10-17 2011-04-26 Microsoft Corporation Limited-memory quasi-newton optimization algorithm for L1-regularized objectives
WO2009052265A1 (en) * 2007-10-19 2009-04-23 Huron Consulting Group, Inc. Document review system and method
US7958065B2 (en) 2008-03-18 2011-06-07 International Business Machines Corporation Resilient classifier for rule-based system
US8280829B2 (en) * 2009-07-16 2012-10-02 Yahoo! Inc. Efficient algorithm for pairwise preference learning
US8438009B2 (en) * 2009-10-22 2013-05-07 National Research Council Of Canada Text categorization based on co-classification learning from multilingual corpora
US8271408B2 (en) * 2009-10-22 2012-09-18 Yahoo! Inc. Pairwise ranking-based classifier
US8868402B2 (en) * 2009-12-30 2014-10-21 Google Inc. Construction of text classifiers
EP2369505A1 (en) * 2010-03-26 2011-09-28 British Telecommunications public limited company Text classifier system
JP5640774B2 (en) 2011-01-28 2014-12-17 富士通株式会社 Information collation apparatus, information collation method, and information collation program
US8566321B2 (en) * 2011-03-11 2013-10-22 Amco Llc Relativistic concept measuring system for data clustering
US9122681B2 (en) 2013-03-15 2015-09-01 Gordon Villy Cormack Systems and methods for classifying electronic information using advanced active learning techniques
CN104765728B (en) * 2014-01-08 2017-07-18 富士通株式会社 The method trained the method and apparatus of neutral net and determine sparse features vector
CN104298729B (en) * 2014-09-28 2018-02-23 小米科技有限责任公司 Data classification method and device
CN104616029B (en) * 2014-12-29 2017-11-03 小米科技有限责任公司 Data classification method and device
US10671675B2 (en) 2015-06-19 2020-06-02 Gordon V. Cormack Systems and methods for a scalable continuous active learning approach to information classification
US11681942B2 (en) 2016-10-27 2023-06-20 Dropbox, Inc. Providing intelligent file name suggestions
US9852377B1 (en) * 2016-11-10 2017-12-26 Dropbox, Inc. Providing intelligent storage location suggestions
US11495086B2 (en) * 2016-12-28 2022-11-08 Microsoft Technology Licensing, Llc Detecting cheating in games with machine learning
US10579905B2 (en) * 2017-03-17 2020-03-03 Google Llc Fully parallel, low complexity approach to solving computer vision problems
RU2678716C1 (en) * 2017-12-11 2019-01-31 Общество с ограниченной ответственностью "Аби Продакшн" Use of autoencoders for learning text classifiers in natural language
US11392803B2 (en) 2019-06-04 2022-07-19 International Business Machines Corporation Decision boundary enhancement for learning models
CN116580029B (en) * 2023-07-12 2023-10-13 浙江海威汽车零件有限公司 Quality inspection control system and method for aluminum alloy casting finished product

Citations (3)

Publication number Priority date Publication date Assignee Title
US6192360B1 (en) * 1998-06-23 2001-02-20 Microsoft Corporation Methods and apparatus for classifying text and for building a text classifier
US20030028541A1 (en) * 2001-06-07 2003-02-06 Microsoft Corporation Method of reducing dimensionality of a set of attributes used to characterize a sparse data set
US20030167267A1 (en) * 2002-03-01 2003-09-04 Takahiko Kawatani Document classification method and apparatus

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US5687364A (en) * 1994-09-16 1997-11-11 Xerox Corporation Method for learning to infer the topical content of documents based upon their lexical content

Also Published As

Publication number Publication date
US7440944B2 (en) 2008-10-21
US20060074908A1 (en) 2006-04-06
WO2006036682A2 (en) 2006-04-06

Similar Documents

Publication Publication Date Title
WO2006036682A3 (en) Method and apparatus for efficient training of support vector machines
CN104699772B (en) Big data file classification method based on cloud computing
JP2005158010A5 (en)
CN106202124A (en) Web page classification method and device
ATE466343T1 (en) METHOD FOR ADJUSTING A K-FOLD TEXT PARTITION TO INCOMING DATA
JP2016534709A5 (en)
US8572087B1 (en) Content identification
WO2012075884A1 (en) Bookmark intelligent classification method and server
Granell et al. Hierarchical multiresolution method to overcome the resolution limit in complex networks
CN104766098A (en) Construction method for classifier
CN107273500A (en) Text classifier generation method, file classification method, device and computer equipment
CN102523202A (en) Deep learning intelligent detection method for phishing webpages
CN103886077B (en) Short text clustering method and system
CN101251896B (en) Object detecting system and method based on multiple classifiers
Ziaratban et al. Language-based feature extraction using template-matching in Farsi/Arabic handwritten numeral recognition
CN104731884B (en) Query method using multiple hash tables based on multi-feature fusion
CN104462229A (en) Event classification method and device
CN102567529B (en) Cross-language text classification method based on two-view active learning technology
CN114548363A (en) Unmanned vehicle carried camera target detection method based on YOLOv5
CN105956002A (en) Webpage classification method and device based on URL analysis
KR101158750B1 (en) Text classification device and classification method thereof
JP2011128924A (en) Comic image analysis apparatus, program, and search apparatus and method for extracting text from comic image
CN110121723B (en) Artificial neural network
CN105069133B (en) Digital image classification method based on unlabeled data
Shaw et al. Enhancing an incremental clustering algorithm for web page collections

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 05799637

Country of ref document: EP

Kind code of ref document: A2