WO2016178243A1 - Multi class classifier from single class dataset - Google Patents

Publication number
WO2016178243A1
Authority
WO
WIPO (PCT)
Prior art keywords
class
classifier
dataset
multi class
records
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IN2016/000116
Other languages
English (en)
French (fr)
Inventor
Bhavin Manharlal Shah
Bhushan Harshadrai Trivedi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-05-05
Filing date
2016-05-04
Publication date
2016-11-10

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes

Definitions

  • In this invention, a single class instance (or data item, record, input, or any object) is one that is a member of any one of the multiple classes. A collection of such single class instances is referred to as a single class dataset. Further, a multi class instance (or data item, record, input, or any object) is one that is a member of multiple classes. A collection of such multi class instances is referred to as a multi class dataset.
  • A single class classifier is defined as a classifier which classifies a given input as a member of one of the multiple classes.
  • A multi class classifier is a classifier which classifies a given input as a member of one or multiple classes.
  • This invention relates to methods and systems for a multi class classifier, where the multi class classifier is trained and tested by generating a multi class dataset from a single class dataset, and the trained multi class classifier automatically classifies an instance, data item, record, or any object as a member of one or multiple classes.
  • This invention can also be applied to multi class categorization of a given instance, data item, record, or any object.
  • The multi class classifier of this invention can be used for various purposes, such as: classification of network traffic as a member of the normal class or of one or multiple attack classes, for example classification of a single instance of network traffic into the DOS and PROB attack classes; classification of a webpage as a member of one or multiple classes, for example classification of a webpage as a member of the Education and Game classes; classification of a user, human, or animal as a member of one or multiple classes, for example classification of a user as a member of the Romantic and Emotional classes; classification of text as a member of one or multiple classes, for example classification of a word as a member of the Indian Word and German Word classes; classification of a health record as a member of one or multiple classes, for example classification of a health record as a member of the Diabetic and Asthma classes; classification of an email as a member of one or multiple classes, for example classification of an email as a member of the Office and Education classes; or any method or system which requires classification of an input as a member of one or multiple classes where a multi class dataset for training and testing is not available.
  • Patent No. CN 102722726 B, titled "Multi-class support vector machine classification method based on dynamic binary tree", and Patent No. US6816456, titled "Methods and apparatus for network use optimization", claim classification of the given input pattern as a member of one of the available multiple classes.
  • The above inventions are not able to classify a given object as a member of multiple classes, and therefore they do not address a multi class classifier in the true sense.
  • Patent No. US 7974994 B2, titled "Sensitive webpage content detection", discloses a multi-class classifier.
  • The contents of a web page are analyzed with the multi-class classifier, and the webpage is categorized as a member of one or multiple sensitivity categories.
  • Words or phrases are extracted and fed into the classifier.
  • The single class having the highest probability value is selected.
  • A webpage having more than one word is classified into multiple classes only because each individual word is classified into one class.
  • The classifier of the above invention is therefore not a multi class classifier in the true sense, which is its major limitation.
  • Patent No. US20100014762, titled "Categorizer with user-controllable calibration", classifies the given object as a member of two or more classes.
  • The above invention uses user calibration for such classification. Classifying a large amount of data, such as network traffic, using user calibration is not practical.
  • Patent No. US006823323B2, titled "Automatic classification method and apparatus", which is very close to this invention, classifies a record as a member of one or multiple classes using a ballpark classifier.
  • The above invention uses records which themselves belong to multiple classes. If such multi class records are not available, then the above invention is of no use. Further, to properly train such a classifier, a sufficient number of records must be fed to it. What counts as sufficient depends upon the classifier, the number of inputs, and their type. Obtaining a sufficient number of multi class records, or even any multi class records, is very difficult.
  • FIG. 1 is a flow diagram showing the general flow of a multi class classifier that uses a single class dataset and generates a multi class dataset, which is used to train and test the multi class classifier.
  • FIG. 2(A) is a flow diagram illustrating an exemplary method or system for generating a multi class dataset from a single class dataset.
  • FIG. 2(B) is a flow diagram illustrating generation of multi class training and testing datasets from the multi class dataset.
  • FIG. 3(A) is a flow diagram illustrating an exemplary method or system for the working of an ordinary classifier having a single output unit, which classifies a given input into a single class from the available multiple classes.
  • FIG. 3(B) is a diagram illustrating an exemplary method or system for mapping the single output value of an ordinary classifier to a single class from the available multiple classes.
  • FIG. 4(A) is a flow diagram illustrating an exemplary method for the working of a multi class classifier with a single output unit.
  • FIG. 4(B) is a diagram illustrating an exemplary method or system for mapping the single output value of a multi class classifier to one or multiple classes.
  • FIG. 5 is a flow diagram illustrating an exemplary method or system for the working of an ordinary classifier with multiple output units, which classifies the given input into a particular class by selecting or activating the corresponding output unit.
  • FIG. 6 is a flow diagram illustrating an exemplary method or system for the working of a multi class classifier with multiple output units, which classifies the given input as a member of one or multiple classes.
  • FIG. 7 is a flow diagram illustrating an exemplary method or system for the working of an ordinary classifier with multiple output units, which classifies the given input into a particular class by using the binary pattern generated by the output units.
  • FIG. 8 is a flow diagram illustrating an exemplary method or system for the working of a multi class classifier with multiple output units, which classifies the given input as a member of one or multiple classes by using the binary pattern generated by the output units.
  • FIG. 1 is a flow diagram showing the general flow of a multi class classifier which has been trained from a single class dataset (101).
  • This single class dataset (101) may be in any format which stores data, records, or objects. Examples of such formats are text format, CSV format, database format, or a folder format which contains objects and their class labels.
  • This single class dataset (101) is used by the multi class dataset generation (102) module to generate the multi class dataset (103).
  • The detailed process of generating the multi class dataset from the single class dataset is shown in FIG. 2(A) and FIG. 2(B). Sample copies of the single class dataset (101) and the generated multi class dataset (103) are shown in Table 1 and Table 4 respectively, in the latter part of this document.
  • The multi class dataset (103) yields a multi class training dataset (104) and a multi class testing dataset (105), which may be pre-processed or used directly by the multi class classifier (107) for training and testing the behavior layer (106), respectively.
  • The finalized behavior layer (106) is used by the multi class classifier (107) for real-time or offline classification of input data (108) as a member of one or multiple classes.
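  • The split of the multi class dataset (103) into training (104) and testing (105) portions can be sketched as follows. This is only an illustration: the text does not state how the split is made, so the shuffle, the 20% test fraction, and the function name `split_dataset` are all assumptions.

```python
import random


def split_dataset(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of `records` and return (training, testing) lists.

    The test fraction and the seeded shuffle are assumptions made for this
    sketch; the source does not specify a splitting strategy.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records standing in for rows of the multi class dataset (103).
training, testing = split_dataset(list(range(10)))
print(len(training), len(testing))
```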
  • In FIG. 1, a multi class classifier with three classes, Class A (109), Class B (110), and Class C (111), is shown. The classifier shown in FIG. 1 is able to classify a given single instance, data item, record, or object into Class A (109) only; Class B (110) only; Class C (111) only; Class A and Class B, that is Class AB (112); Class A and Class C, that is Class AC (113); Class B and Class C, that is Class BC (114); or Class A, Class B and Class C, that is Class ABC (115).
  • Combined classes are named by concatenating the class labels, in the form Class {Label 1}{Label 2}...{Label N}. For example, Class AB means the combination of Class A as well as Class B.
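  • As a small illustration of the naming scheme above, the seven target classes for three base labels can be enumerated as every non-empty combination of the labels. The function name `combined_classes` is hypothetical and not taken from the source.

```python
from itertools import combinations


def combined_classes(labels):
    """Return every non-empty combination of labels, shortest first."""
    return [
        "".join(combo)
        for size in range(1, len(labels) + 1)
        for combo in combinations(labels, size)
    ]


print(combined_classes(["A", "B", "C"]))
# ['A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
```

  • For N base classes this gives 2^N - 1 possible outputs, which is why three classes lead to the seven cases listed for FIG. 1.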
  • For classification of an input into one or more classes, the multi class classifier (107) is designed and trained in such a way that it gives an output or outputs which can be mapped to one or more classes. Detailed methods or systems for mapping the output or outputs to one or multiple classes are shown in FIG. 4, FIG. 6 and FIG. 8 and discussed in more detail in the latter part of this document. On the basis of the number of output units used, the multi class classifiers presented in FIG. 4, FIG. 6 and FIG. 8 are broadly categorized into classifiers with a single output unit and classifiers with multiple output units. The multi class classifier shown in FIG. 4 uses a single output unit whose output value is mapped to one or more classes, while the classifiers shown in FIG. 6 and FIG. 8 use multiple output units.
  • FIG. 2(A) shows the step-by-step process or method of generating the multi class dataset (206) from the single class dataset (201).
  • Sample copies of the single class dataset and the multi class dataset are shown in Table 1 and Table 4 respectively.
  • The normal values of Attribute-1, Attribute-2, Attribute-3 and Attribute-4 are taken as 0.1, 0.2, 0.3 and 0.4 respectively.
  • The method applies an association technique (202) to the single class dataset (201), discovers feature-to-class associations (203), and generates rules, based on feature values, that decide the class labels (204). Any technique which generates such rules (204) by mapping attributes and their values to a specific class can be used as the association technique (202).
  • A sample copy of the rules (204) generated by applying such an association technique (202) to the single class dataset in Table 1 is shown in Table 2.
  • Table 2 Sample Copy of Rules Generated From Single Class Dataset of Table-1 by
  • Rule 1, "If (Attribute-1 ∈ 0.3 to 0.8) then Class A", means that Attribute-1 with a value in [0.3 to 0.8] leads to classification of the record into Class A.
  • Rule 2 means that Attribute-2 with value [0.5] and Attribute-3 with a value in [0.4 to 0.6] collectively classify the given record into Class A.
  • Similarly, Attribute-3 with a value in [0.7 to 0.9] classifies the given record into Class B.
  • Other rules are generated in the same way. The number of rules generated in Table 2 and their constraints mainly depend upon: the number of records fed into the association technique; the number of attributes contained in the dataset; and the variety of records available in the dataset.
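  • The rules described above can be sketched as attribute ranges attached to a class label. The three rules and the normal attribute values below come from the text; everything else is an assumption, since the tables that detail the actual generation procedure are not reproduced here. In particular, `combine_rules` is a hypothetical generation step that starts from the normal values and sets each constrained attribute to the midpoint of its range, so that the resulting record satisfies rules of more than one class.

```python
from dataclasses import dataclass

# Normal attribute values as stated in the text.
NORMAL_VALUES = {
    "Attribute-1": 0.1,
    "Attribute-2": 0.2,
    "Attribute-3": 0.3,
    "Attribute-4": 0.4,
}


@dataclass
class Rule:
    conditions: dict  # attribute name -> (low, high), both inclusive
    label: str        # class assigned when every condition holds

    def matches(self, record):
        return all(
            low <= record[attr] <= high
            for attr, (low, high) in self.conditions.items()
        )


# The three rules described in the text.
RULES = [
    Rule({"Attribute-1": (0.3, 0.8)}, "A"),
    Rule({"Attribute-2": (0.5, 0.5), "Attribute-3": (0.4, 0.6)}, "A"),
    Rule({"Attribute-3": (0.7, 0.9)}, "B"),
]


def class_label(record, rules):
    """Concatenate the labels of all matching rules, e.g. 'AB'."""
    return "".join(sorted({rule.label for rule in rules if rule.matches(record)}))


def combine_rules(rules):
    """Hypothetical step: build one record satisfying all given rules.

    Returns None when two rules constrain the same attribute to
    non-overlapping ranges.
    """
    merged = {}
    for rule in rules:
        for attr, (low, high) in rule.conditions.items():
            cur_low, cur_high = merged.get(attr, (low, high))
            low, high = max(low, cur_low), min(high, cur_high)
            if low > high:
                return None
            merged[attr] = (low, high)
    record = dict(NORMAL_VALUES)
    for attr, (low, high) in merged.items():
        record[attr] = (low + high) / 2
    return record


record = combine_rules([RULES[0], RULES[2]])
print(record, "-> Class", class_label(record, RULES))
```

  • Combining Rule 1 with the Class B rule gives a record labelled Class AB, while combining Rule 2 with the Class B rule fails in this sketch because the two ranges for Attribute-3 do not overlap.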
  • The system or method presented in FIG. 2(A) and FIG. 2(B) can also be used to generate additional multi class records from already generated multi class records, by replacing the single class dataset (201) with the generated multi class dataset.
  • A preexisting multi class dataset can also be used instead of the single class dataset (201).
  • Any combination of a preexisting single class dataset, a preexisting multi class dataset, and a generated multi class dataset can be used instead of the single class dataset (201) in FIG. 2(A).
  • A self-explanatory detailed process for generating multi class records from a preexisting or generated multi class dataset is shown in Table 5.
  • Table 6 shows a self-explanatory detailed process for generating a multi class dataset from a single class dataset and a multi class dataset, which may be preexisting or generated.
  • The single class dataset (201) can be replaced by: a preexisting multi class dataset; a generated multi class dataset; a combination of preexisting single class and preexisting multi class datasets; a combination of preexisting single class and generated multi class datasets; a combination of preexisting multi class and generated multi class datasets; or a combination of preexisting single class, preexisting multi class, and generated multi class datasets.
  • The classifier may use a single output unit or multiple output units.
  • A classifier with a single output unit is shown in FIG. 3(A) and FIG. 4(A), while a classifier with multiple output units is shown in FIG. 5, FIG. 6, FIG. 7 and FIG. 8.
  • FIG. 3(A) shows the basic working of an ordinary classifier having a single output unit, which classifies the given input data into one of the available classes.
  • The classifier (304) uses single class training data (301) and single class testing data (302) and generates a behavior layer (305), which is used for classification of the input data (303). The classifier (304) classifies the input data (303) into any one of the available multiple classes. For example, in FIG. 3(A), the classifier (304) classifies the input into any one of three classes, namely Class A (306), Class B (307) or Class C (308).
  • The method for mapping the output value of the classifier with a single output unit (304) to one of the available multiple classes is presented in FIG. 3(B).
  • The output range is divided into sub-ranges according to the available classes. For example, if the number of classes is three, then the sub-ranges could be 0.0 (309) to 0.30 (311), 0.35 (312) to 0.65 (314), and 0.7 (315) to 1.0 (317).
  • The gap between two classes, for example the gap between Class A (310) and Class B (313), which is 0.31 to 0.34, is used to overcome class overlapping.
  • The methods of FIG. 3(A) and FIG. 3(B) together classify the given input into one of the available classes. For example, as per FIG. 3(A), if the classifier (304) gives an output of 0.2 for the given input data (303), then as per FIG. 3(B) the output value 0.2 is mapped to Class A, and hence the given input is classified as a member of Class A.
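  • The range mapping of FIG. 3(B) can be sketched as a simple lookup over the example sub-ranges given above. The function name and the choice to return None for a value that falls into a gap are assumptions; the text does not say how a gap value is handled.

```python
# Sub-ranges taken from the FIG. 3(B) example: (low, high, class).
SUB_RANGES = [
    (0.0, 0.30, "Class A"),
    (0.35, 0.65, "Class B"),
    (0.70, 1.0, "Class C"),
]


def map_single_output(value, sub_ranges=SUB_RANGES):
    """Map one output value to a class, or None if it falls into a gap."""
    for low, high, label in sub_ranges:
        if low <= value <= high:
            return label
    return None  # assumption: gap values are left unclassified


print(map_single_output(0.2))   # Class A, as in the example in the text
print(map_single_output(0.32))  # None: inside the gap between Class A and Class B
```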
  • FIG. 4(A) shows the architecture of a multi class classifier having a single output unit, which classifies the given record as a member of one or multiple classes.
  • The classifier uses the multi class training dataset (401) and the multi class testing dataset (402), which can be generated as per FIG. 2(A) and FIG. 2(B).
  • The classifier classifies the input data (403) as a member of one or multiple classes.
  • The classifier classifies the input data into Class A (406), Class B (407), Class C (408), Class AB (409), Class AC (410), Class BC (411) or Class ABC (412).
  • To map the output value to one or multiple classes, the approach of FIG. 4(B) is used, which is very similar to the approach shown in FIG. 3(B). As per FIG. 4(B), the output range is divided into 7 sub-ranges with a gap between each pair of adjacent ranges. These ranges and gaps are for illustration only and can be varied as required.
  • The methods of FIG. 4(A) and FIG. 4(B) together classify the given input as a member of one or multiple classes. For example, as per FIG. 4(A), if the classifier (404) gives an output of 0.89 for the given input data (403), then as per FIG. 4(B) the output value 0.89 is mapped to Class ABC (432), and hence the given input is classified as a member of Class ABC.
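  • A sketch of the seven-range mapping of FIG. 4(B) follows. The exact sub-ranges of FIG. 4(B) are not given in this text, so the equal-width ranges, the 0.02 gap, and the function names are hypothetical; only the class order and the example that 0.89 maps to Class ABC come from the source.

```python
# Class order as listed for FIG. 4(A).
LABELS = ["A", "B", "C", "AB", "AC", "BC", "ABC"]


def build_sub_ranges(labels, gap=0.02):
    """Split [0, 1] into equal-width sub-ranges separated by `gap`.

    Equal widths and the gap size are assumptions for illustration.
    """
    width = (1.0 - gap * (len(labels) - 1)) / len(labels)
    ranges = []
    for index, label in enumerate(labels):
        low = index * (width + gap)
        ranges.append((low, low + width, label))
    return ranges


def map_multi_output(value, ranges):
    """Map one output value to a (possibly combined) class, or None in a gap."""
    for low, high, label in ranges:
        if low <= value <= high:
            return "Class " + label
    return None


ranges = build_sub_ranges(LABELS)
print(map_multi_output(0.89, ranges))
```

  • With these assumed ranges the last sub-range starts at about 0.874, so 0.89 maps to Class ABC, in line with the example in the text.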
  • FIG. 5 shows the working of a single class classifier with multiple output units.
  • The classifier (504) uses training data (501) and testing data (502) and generates a behavior layer (505). Using this behavior layer (505), the classifier (504) classifies the given input data (503) into one of the available multiple classes. For example, in FIG. 5, the classifier classifies the input into Class A (506), Class B (507) or Class C (508). In the example presented in FIG. 5, the classifier uses three output units for the classification. A particular class is selected if the respective output unit is activated. Output value 100 indicates Class A (506), 010 indicates Class B (507), and 001 indicates Class C (508).
  • The architecture presented in FIG. 5 can be extended into a multi class classifier by providing multi class training and testing data and by training the classifier so that it is allowed to activate one or more output units. Such an architecture is shown in FIG. 6.
  • FIG. 6 shows the working of a multi class classifier having multiple output units.
  • The classifier (604) uses multi class training data (601) and multi class testing data (602), which can be generated as per FIG. 2(A) and FIG. 2(B).
  • This multi class training data (601) and multi class testing data (602) are used to generate the behavior layer (605).
  • The trained classifier (604) classifies the given input data (603) as a member of one or multiple classes.
  • A particular class or classes are selected according to the activation of the respective output units. For example, in FIG. 6, for the case of three classes, three output units are used for multi class classification.
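  • The difference between the one-unit-per-class schemes of FIG. 5 and FIG. 6 can be sketched as two decoders over the same three output units. The patterns 100, 010 and 001 come from the FIG. 5 description; the 0.5 activation threshold, the function names, and the handling of invalid patterns are assumptions.

```python
CLASS_LABELS = ["A", "B", "C"]  # one output unit per class


def active_labels(outputs, threshold=0.5):
    """Return the labels whose output unit is activated.

    The 0.5 threshold is an assumption for this sketch.
    """
    return [label for label, value in zip(CLASS_LABELS, outputs) if value >= threshold]


def decode_single_class(outputs):
    """FIG. 5 style: valid only when exactly one unit is active."""
    labels = active_labels(outputs)
    return "Class " + labels[0] if len(labels) == 1 else None


def decode_multi_class(outputs):
    """FIG. 6 style: one or more units may be active at once."""
    labels = active_labels(outputs)
    return "Class " + "".join(labels) if labels else None


print(decode_single_class([0, 1, 0]))  # Class B (pattern 010)
print(decode_multi_class([1, 1, 0]))   # Class AB
```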
  • FIG. 7 shows an ordinary classifier which classifies the input data (703) into one of the available multiple classes by activating the output units in a binary pattern.
  • Here, two output units are used for classification of the input data (703) into any one of three classes, i.e. Class A (706), Class B (707) and Class C (708).
  • The binary pattern generated by the output units is used to classify the input data (703) into one of the three classes mentioned above. For example, as per FIG. 7, if the output pattern generated is 01, then Class B (707) is selected.
  • The architecture presented in FIG. 7 can also be extended into a multi class classifier, which is shown in FIG. 8.
  • FIG. 8 shows the working of a multi class classifier that classifies the given input as a member of one or multiple classes using a binary output pattern.
  • The classifier (804) uses the multi class training dataset (801) and the multi class testing dataset (802). These multi class training and testing datasets can be generated as per FIG. 2(A) and FIG. 2(B). The multi class training data (801) and multi class testing data (802) are used to generate the behavior layer (805).
  • The trained classifier (804) classifies the given input data (803) into one or multiple classes. A particular class or classes are selected according to the binary pattern generated by the output units. For example, in FIG. 8, three output units are used for multi class classification.
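  • The binary-pattern schemes of FIG. 7 and FIG. 8 can be sketched as code tables that map an output pattern to a class. Only the mapping of pattern 01 to Class B is stated in the text; every other entry in both tables, the 0.5 threshold, and the function name are hypothetical choices made for illustration.

```python
# FIG. 7 style: two output units encode three single classes.
# Only "01" -> Class B is given in the text; the rest is assumed.
SINGLE_CLASS_CODES = {"00": "Class A", "01": "Class B", "10": "Class C"}

# FIG. 8 style: three output units encode seven (combined) classes.
# This numbering (001 = first class ... 111 = last class) is assumed.
MULTI_LABELS = ["A", "B", "C", "AB", "AC", "BC", "ABC"]
MULTI_CLASS_CODES = {
    format(index + 1, "03b"): "Class " + label
    for index, label in enumerate(MULTI_LABELS)
}


def decode_pattern(outputs, code_table, threshold=0.5):
    """Threshold the output units into a bit string and look it up.

    Returns None for a pattern that is not in the table.
    """
    pattern = "".join("1" if value >= threshold else "0" for value in outputs)
    return code_table.get(pattern)


print(decode_pattern([0, 1], SINGLE_CLASS_CODES))    # Class B
print(decode_pattern([1, 1, 1], MULTI_CLASS_CODES))  # Class ABC under the assumed table
```

  • Unlike the one-unit-per-class scheme of FIG. 6, the pattern here acts as a code for a whole class combination, so the meaning of each bit depends on the chosen table.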

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)
PCT/IN2016/000116 2015-05-05 2016-05-04 Multi class classifier from single class dataset Ceased WO2016178243A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN1794/MUM/2015 2015-05-05
IN1794MU2015 IN2015MU01794A (en) 2015-05-05 2016-05-04

Publications (1)

Publication Number Publication Date
WO2016178243A1 (en) 2016-11-10

Family

ID=54394887

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2016/000116 Ceased WO2016178243A1 (en) 2015-05-05 2016-05-04 Multi class classifier from single class dataset

Country Status (2)

Country Link
IN (1) IN2015MU01794A (en)
WO (1) WO2016178243A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160335521A1 (en) * 2015-05-14 2016-11-17 Canon Kabushiki Kaisha Method and apparatus for generating, updating classifier, detecting objects and image processing device
WO2020243333A1 (en) * 2019-05-30 2020-12-03 The Research Foundation For The State University Of New York System, method, and computer-accessible medium for generating multi-class models from single-class datasets
CN113826116A (zh) * 2019-05-15 2021-12-21 Beijing Didi Infinity Technology and Development Co., Ltd. Adversarial multi-binary neural network for multi-class classification

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7769228B2 (en) * 2004-05-10 2010-08-03 Siemens Corporation Method for combining boosted classifiers for efficient multi-class object detection
US20110302111A1 (en) * 2010-06-03 2011-12-08 Xerox Corporation Multi-label classification using a learned combination of base classifiers

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160335521A1 (en) * 2015-05-14 2016-11-17 Canon Kabushiki Kaisha Method and apparatus for generating, updating classifier, detecting objects and image processing device
US10242295B2 (en) * 2015-05-14 2019-03-26 Canon Kabushiki Kaisha Method and apparatus for generating, updating classifier, detecting objects and image processing device
CN113826116A (zh) * 2019-05-15 2021-12-21 Beijing Didi Infinity Technology and Development Co., Ltd. Adversarial multi-binary neural network for multi-class classification
WO2020243333A1 (en) * 2019-05-30 2020-12-03 The Research Foundation For The State University Of New York System, method, and computer-accessible medium for generating multi-class models from single-class datasets

Also Published As

Publication number Publication date
IN2015MU01794A (en) 2015-06-19

Similar Documents

Publication Publication Date Title
Yousaf et al. Emotion recognition by textual tweets classification using voting classifier (LR-SGD)
US7788087B2 (en) System for processing sentiment-bearing text
US7788086B2 (en) Method and apparatus for processing sentiment-bearing text
CN102682124B (zh) Text sentiment classification method and device
US10216838B1 (en) Generating and applying data extraction templates
US8676730B2 (en) Sentiment classifiers based on feature extraction
Firmino Alves et al. A Comparison of SVM versus naive-bayes techniques for sentiment analysis in tweets: A case study with the 2013 FIFA confederations cup
Vosoughi et al. Enhanced twitter sentiment classification using contextual information
US8788503B1 (en) Content identification
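CN106445919A (zh) Sentiment classification method and device
CN112084376B (zh) Recommendation method, recommendation system, and electronic device based on graph knowledge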
US20200257762A1 (en) Text classification and sentimentization with visualization
Kucher et al. Text visualization browser: A visual survey of text visualization techniques
US11714963B2 (en) Content modification using natural language processing to include features of interest to various groups
CN103995853A (zh) Key-sentence-based multilingual sentiment data processing and classification method and system
Shuhidan et al. Sentiment analysis for financial news headlines using machine learning algorithm
Hecking et al. Can topic models be used in research evaluations? Reproducibility, validity, and reliability when compared with semantic maps
Alves et al. A spatial and temporal sentiment analysis approach applied to Twitter microtexts
WO2016178243A1 (en) Multi class classifier from single class dataset
Manzoor et al. Social mining for sustainable cities: thematic study of gender-based violence coverage in news articles and domestic violence in relation to COVID-19
CN107330076A (zh) Network public opinion information display system and method
Saleiro et al. Popstar at replab 2013: Name ambiguity resolution on twitter
Ben Ismail et al. Insult detection in social network comments using possibilistic based fusion approach
CN103345525B (zh) Text classification method, device, and processor
Purchase et al. A classification of infographics

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16789429; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 16789429; Country of ref document: EP; Kind code of ref document: A1)