WO2016178243A1 - Multi class classifier from single class dataset - Google Patents

Info

Publication number
WO2016178243A1
Authority
WO
WIPO (PCT)
Prior art keywords
class
classifier
dataset
multi class
records
Prior art date
Application number
PCT/IN2016/000116
Other languages
French (fr)
Inventor
Bhavin Manharlal Shah
Bhushan Harshadrai Trivedi
Original Assignee
Bhavin Manharlal Shah
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bhavin Manharlal Shah
Publication of WO2016178243A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/2431: Multiple classes

Definitions

  • A single class instance (or data item, record, input, or any object) is a member of any one of the multiple classes. A collection of such single class instances is referred to as a single class dataset. Further, in this invention, a multi class instance (or data item, record, input, or any object) is a member of multiple classes. A collection of such multi class instances is referred to as a multi class dataset in this invention.
  • A single class classifier is defined as a classifier which classifies a given input as a member of one of the multiple classes.
  • A multi class classifier is a classifier which classifies a given input as a member of one or multiple classes.
  • This invention relates to methods and systems for a multi class classifier, where the multi class classifier is trained and tested by generating a multi class dataset from a single class dataset, and the trained multi class classifier automatically classifies an instance, data item, record, or any object as a member of one or multiple classes.
  • This invention can also be applied to multi class categorization of a given instance, data item, record, or any object.
  • The multi class classifier of this invention can be used for various purposes, such as: classification of network traffic as a member of the normal class or of one or multiple attack classes, for example classification of a single instance of network traffic into the DOS and PROB attack classes; classification of a webpage as a member of one or multiple classes, for example the Education and Game classes; classification of a user, human, or animal as a member of one or multiple classes, for example classification of a user as a member of the Romantic and Emotional classes; classification of text as a member of one or multiple classes, for example classification of a word as a member of the Indian Word and German Word classes; classification of a health record as a member of one or multiple classes, for example the Diabetic and Asthma classes; classification of an email as a member of one or multiple classes, for example the Office and Education classes; or any method or system which requires the classification of an input as a member of one or multiple classes where a multi class dataset for training and testing is not available.
  • Examples are patent No. CN 102722726 B, titled "Multi-class support vector machine classification method based on dynamic binary tree", and patent No. US6816456, titled "Methods and apparatus for network use optimization", which claim classification of the given input pattern as a member of one of the available multiple classes.
  • The above inventions are not able to classify a given object as a member of multiple classes, and therefore they do not address multi class classification in the true sense.
  • Patent No. US 7974994 B2, titled "Sensitive webpage content detection", discloses a multi-class classifier.
  • The contents of a web page are analyzed with the multi-class classifier, and the webpage is categorized as a member of one or multiple sensitivity categories.
  • Words or phrases are extracted from the web page and fed into the classifier.
  • For a given word, the single class having the highest probability value is selected.
  • A webpage having more than one word is classified into multiple classes only because each individual word is classified into one class.
  • The classifier of the above invention is therefore not a multi class classifier in the true sense, which is its major limitation.
  • Patent No. US20100014762, titled "Categorizer with user-controllable calibration", classifies the given object as a member of two or more classes.
  • The above invention relies on user calibration for such classification. Classifying a large amount of data, such as network traffic, using user calibration is not practical.
  • Patent No. US006823323B2, titled "Automatic classification method and apparatus", which is very close to this invention, classifies a record as a member of one or multiple classes using a ballpark classifier.
  • The above invention uses records which themselves belong to multiple classes. If such multi class records are not available, the above invention is of no use. Further, to properly train such a classifier, a sufficient number of records must be fed to it. What counts as sufficient depends upon the classifier, the number of inputs, and their type. Obtaining sufficient multi class records, or even any multi class record, is very difficult.
  • FIG. 1 is a flow diagram showing the general flow of multi class classifier that uses single class dataset and generates multi class dataset which is used to train and test the multi class classifier;
  • FIG. 2(A) is a flow diagram illustrating an exemplary method or system for multi class dataset generation through single class dataset
  • FIG. 2(B) is a flow diagram illustrating generation of multi class training and testing dataset from multi class dataset.
  • FIG. 3(A) is a flow diagram illustrating an exemplary method or system for working of ordinary classifier having single output unit which classifies given input into single class from available multiple classes;
  • FIG. 3(B) is a diagram illustrating an exemplary method or system for mapping of single output value of ordinary classifier into single class from available multiple classes;
  • FIG. 4(A) is a flow diagram illustrating an exemplary method for working of multi class classifier with single output unit
  • FIG. 4(B) is a diagram illustrating an exemplary method or system for mapping of single output value of multi class classifier as member of one or multiple classes
  • FIG. 5 is a flow diagram illustrating an exemplary method or system for working of ordinary classifier with multiple output units which classifies the given input in to a particular class by selecting or activating corresponding output unit;
  • FIG. 6 is a flow diagram illustrating an exemplary method or system for working of multi class classifier with multiple output units which classifies the given input as member of one or multiple classes.
  • FIG. 7 is a flow diagram illustrating an exemplary method or system for working of ordinary classifier with multiple output units which classifies the given input in to a particular class by using binary pattern generated by output units.
  • FIG. 8 is a flow diagram illustrating an exemplary method or system for working of multi class classifier with multiple output units which classifies the given input as member of one or multiple classes by using binary pattern generated by output units.
  • FIG. 1 is a flow diagram showing the general flow of multi class classifier which has been trained by single class dataset (101).
  • This single class dataset (101) may be in any format which stores data or records or objects. Examples of such formats are text format, CSV format, database format, or folder format which contains objects and their class label.
  • This single class dataset (101) is used, after preprocessing or directly, by the multi class dataset generation (102) module to generate the multi class dataset (103).
  • FIG. 2(A) shows the detailed process of generating the multi class dataset from the single class dataset.
  • FIG. 2(B) shows the generation of the training dataset (104) and testing dataset (105) from the generated multi class dataset (103). Sample copies of the single class dataset (101) and the generated multi class dataset (103) are shown in Table 1 and Table 4 respectively in the latter part of this document.
  • The multi class training dataset (104) and multi class testing dataset (105), which may be preprocessed or used directly, are used by the multi class classifier (107) for training and testing the behavior layer (106), respectively.
  • The finalized behavior layer (106) is used by the multi class classifier (107) for real-time or offline classification of input data (108) as a member of one or multiple classes.
  • For simplicity, FIG. 1 shows a multi class classifier with three classes: Class A (109), Class B (110), and Class C (111).
  • The classifier shown in FIG. 1 is able to classify a given single instance, data item, record, or object into Class A (109) only; Class B (110) only; Class C (111) only; Class A and Class B, that is Class AB (112); Class A and Class C, that is Class AC (113); Class B and Class C, that is Class BC (114); or Class A, Class B and Class C, that is Class ABC (115).
  • A combination of two or more classes is written as Class {Label 1} {Label 2} ... {Label N}.
  • Class AB means the combination of Class A and Class B.
  • For classification of an input into one or more classes, the multi class classifier (107) is designed and trained in such a way that it gives an output or outputs which can be mapped to one or more classes. Detailed methods or systems for mapping the output or outputs to one or multiple classes are shown in FIG. 4, FIG. 6 and FIG. 8 and discussed in more detail in the latter part of this document. On the basis of the number of output units used, the multi class classifiers presented in FIG. 4, FIG. 6 and FIG. 8 are broadly categorized into classifiers with a single output unit and classifiers with multiple output units. The multi class classifier shown in FIG. 4 uses a single output unit whose output value is mapped to one or more classes, while the classifiers shown in FIG. 6 and FIG. 8 use multiple output units.
  • FIG. 2(A) shows step by step process or method of generating multi class dataset (206) from the single class dataset (201).
  • Sample copy of single class dataset and multi class dataset are shown in Table 1 and Table 4 respectively.
  • For simplicity, in all the tables, the normal values of Attribute-1, Attribute-2, Attribute-3 and Attribute-4 are taken as 0.1, 0.2, 0.3 and 0.4 respectively.
  • The invention applies an association technique (202) to the single class dataset (201), discovers feature-to-class associations (203), and generates rules (204) that decide the class labels based on feature values. Any technique which generates such rules (204) by mapping attributes and their values to a specific class can be used as the association technique (202).
  • A sample copy of the rules (204) generated by applying such an association technique (202) to the single class dataset in Table 1 is shown in Table 2.
  • Table 2: Sample Copy of Rules Generated From Single Class Dataset of Table 1 by Applying Association Technique
  • Rule 1, "If (Attribute-1 ∈ 0.3 to 0.8) then Class A", means that Attribute-1 with a value in [0.3 to 0.8] leads to classification of the record into Class A.
  • Rule 2 means that Attribute-2 with value [0.5] and Attribute-3 with a value in [0.4 to 0.6] collectively classify the given record into Class A.
  • Attribute-3 with a value in [0.7 to 0.9] classifies the given record into Class B.
  • The other rules are generated similarly. The number of rules generated in Table 2 and their constraints mainly depend upon: the number of records fed to the association technique; the number of attributes contained in the dataset; and the variety of the records available in the dataset.
  • System or method presented in FIG. 2(A) and (B) can also be used to generate additional multi class records from already generated multi class records by replacing single class dataset (201) with generated multi class dataset.
  • preexisting multi class dataset can also be used instead of single class dataset (201).
  • Any combination of preexisting single class dataset, preexisting multi class dataset and generated multi class dataset can be used instead of the single class dataset (201) in FIG. 2(A).
  • Self explanatory detailed process of generation of multi class record from preexisting multi class dataset or generated multi class dataset is shown in Table 5.
  • Table 6 shows self explanatory detailed process for generation of multi class dataset from single class dataset and multi class dataset which may be preexisting or generated.
  • single class dataset (201) can be replaced by: preexisting multi class dataset; or generated multi class dataset; or combination of preexisting single class and preexisting multi class dataset; or combination of preexisting single class and generated multi class dataset; or combination of preexisting multi class and generated multi class dataset; or combination of preexisting single class, preexisting multi class and generated multi class dataset.
  • classifier may use single output unit or multiple output units.
  • Classifier with single output unit is shown in FIG. 3(A) and FIG. 4(A) while classifier with multiple output units is shown in FIG. 5, FIG. 6, FIG. 7 and FIG. 8.
  • FIG. 3(A) shows basic working of the ordinary classifier having single output unit which classifies the given input data in to one of the available classes.
  • The classifier (304) uses single class training data (301) and single class testing data (302) and generates a behavior layer (305), which is used for classification of the input data (303). The classifier (304) classifies the input data (303) into any one of the available multiple classes.
  • For example, in the figure, the classifier (304) classifies the input into any one of three classes, namely Class A (306), Class B (307), or Class C (308).
  • Method for mapping of output value of classifier with single output unit (304) in to one of the available multiple classes is presented in FIG. 3(B).
  • The output range is divided into sub-ranges according to the available classes. For example, if there are three classes, the sub-ranges could be 0.0 (309) to 0.30 (311), 0.35 (312) to 0.65 (314), and 0.7 (315) to 1.0 (317).
  • The gap between two classes, for example the gap of 0.31 to 0.34 between Class A (310) and Class B (313), is used to overcome class overlapping.
  • FIG. 3(A) and FIG. 3(B) classify the given input in to one of the available classes. For example, as per FIG. 3(A), for the given input data (303), if classifier (304) gives output as 0.2 then as per FIG. 3(B), 0.2 output value is mapped to Class A and hence, given input is classified as member of Class A.
  • FIG. 4(A) shows the architecture of a multi class classifier having a single output unit, which classifies the given record as a member of one or multiple classes.
  • the classifier uses the multi class training dataset (401) and multi class testing dataset (402) which can be generated as per FIG. 2(A) and FIG. 2(B).
  • the classifier classifies the input data (403) as member of one or multiple classes.
  • classifier classifies input data either in to the Class A(406) or Class B(407) or Class C(408) or Class AB(409) or Class AC(410) or Class BC(411) or Class ABC(412).
  • To map the output value to one or multiple classes, the approach of FIG. 4(B) is used, which is very similar to the approach shown in FIG. 3(B). As per FIG. 4(B), the output range is divided into 7 sub-ranges with a gap between each pair of ranges. These ranges and gaps are for illustration only and can be varied as required.
  • FIG. 4(A) and FIG. 4(B) classify the given input as member of one or multiple classes. For example, as per FIG. 4(A), for the given input data (403), if classifier (404) gives output as 0.89 then as per FIG. 4(B), 0.89 output value is mapped to Class ABC (432) and hence, given input is classified as member of Class ABC.
  • FIG. 5 shows working of the single class classifier with multiple output units.
  • the classifier (504) uses training data (501) and testing data (502) and generates behavior layer (505). Using this behavior layer (505), the classifier (504) classifies the given input data (503) in to the one of the available multiple classes. For example in FIG. 5, classifier classifies the input in to either Class A (506) or Class B (507) or Class C (508). As per the example presented in the FIG. 5, classifier uses three output units for the classification. Particular class is selected if respective output unit is activated. Output value 100 indicates Class A (506), 010 indicates Class B (507) and 001 indicates Class C (508).
  • The architecture presented in FIG. 5 can be extended into a multi class classifier by providing multi class training and testing data and by training the classifier so that one or more output units may be activated. Such an architecture is shown in FIG. 6.
  • FIG. 6 shows working of multi class classifier having multiple output units.
  • classifier (604) uses multi class training data (601) and multi class testing data (602) which can be generated as per FIG. 2(A) and FIG. 2(B).
  • Such multi class training data (601) and multi class testing data (602) is used to generate behavior layer (605).
  • Such trained classifier (604) classifies the given input data (603) as member of one or multiple classes.
  • Particular class or classes are selected as per the activation of respective output units. For example, in the FIG. 6, for the case of three classes, three output units are used for multi class classification.
  • FIG. 7 is an ordinary classifier which classifies the input data (703) in to one of the available multiple classes by activation of output unit in binary pattern.
  • two output units are used for classification of input data (703) in to any one of three classes, i.e. Class A (706), Class B (707) and Class C (708).
  • Binary pattern generated by the output units helps to classify the input data (703) in to the three classes mentioned above. For example, as per FIG. 7, if output pattern generated is 01 then Class B (707) is selected.
  • Architecture presented in FIG. 7, can also be extended in to multi class classifier which is shown in FIG. 8.
  • FIG. 8 shows working of multi class classifier that classifies given input as member of one or multiple classes using binary output pattern.
  • classifier (804) uses multi class training dataset (801) and multi class testing dataset (802). These multi class training and testing dataset can be generated as per FIG. 2(A) and FIG. 2(B). Such multi class training data (801) and multi class testing data (802) are used to generate behavior layer (805).
  • Such trained classifier (804) classifies the given input data (803) in to the multiple classes. Particular class or classes are selected as per the binary pattern generated by the output units. For example, in the FIG. 8, three output units are used for multi class classification.
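The output mappings quoted in the items above for the ordinary classifiers can be sketched in a few lines. This is only an illustrative sketch: the sub-ranges are the example values given for FIG. 3(B), with Class C assumed to occupy the third sub-range; the one-hot codes are those stated for FIG. 5; and the function names are hypothetical. Only the binary pattern 01 for Class B is stated for FIG. 7, so the other two-unit codes are deliberately left out.

```python
from typing import Optional

# Example sub-ranges from the text for FIG. 3(B); values between them are gaps.
# Assigning the third sub-range to Class C is an assumption based on ordering.
SINGLE_OUTPUT_RANGES = [
    ("A", 0.00, 0.30),
    ("B", 0.35, 0.65),
    ("C", 0.70, 1.00),
]

# One output unit per class, as stated for FIG. 5.
ONE_HOT = {"100": "A", "010": "B", "001": "C"}

# Two output units read as a binary pattern (FIG. 7). Only "01" is given
# in the text; codes for Class A and Class C are not stated, so they are omitted.
BINARY_PATTERN = {"01": "B"}


def map_single_output(value: float) -> Optional[str]:
    """Return the class whose sub-range contains value, or None inside a gap."""
    for label, low, high in SINGLE_OUTPUT_RANGES:
        if low <= value <= high:
            return label
    return None


def map_pattern(pattern: str, codebook: dict) -> Optional[str]:
    """Look up an output-unit pattern such as "010"; None if it is unknown."""
    return codebook.get(pattern)


print(map_single_output(0.2))        # falls in the first sub-range
print(map_single_output(0.32))       # falls in the gap between two classes
print(map_pattern("010", ONE_HOT))
```

Returning `None` for a value inside a gap is one possible way to treat the overlap guard; the text does not say how such values are handled.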

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention addresses methods and systems for a multi class classifier which is capable of classifying a given data item as a member of one or multiple classes at the same time. The invention also addresses various multi class classification methods and systems, which include classification using a single output unit and classification using multiple output units. To train and test such multi class classifiers, a multi class dataset is required. When such a multi class dataset is unavailable, the present invention is capable of generating multi class data items from single class data items. Further, the present invention is also able to generate multi class data items from preexisting multi class data items, generated multi class data items, or any combination of preexisting single class, preexisting multi class, and generated multi class data items.

Description

Title of Invention
Multi Class Classifier From Single Class Dataset
Description
Definition of Terms Used
In this invention, the terms instance, record, object, data, data item, and input are intended to be synonymous.
In this invention, a single class instance (or data item, record, input, or any object) is a member of any one of the multiple classes. A collection of such single class instances is referred to as a single class dataset. Further, in this invention, a multi class instance (or data item, record, input, or any object) is a member of multiple classes. A collection of such multi class instances is referred to as a multi class dataset in this invention.
In this invention, a single class classifier is defined as a classifier which classifies a given input as a member of one of the multiple classes. Similarly, a multi class classifier is a classifier which classifies a given input as a member of one or multiple classes.
Field of the Invention
This invention relates to methods and systems for a multi class classifier, where the multi class classifier is trained and tested by generating a multi class dataset from a single class dataset, and the trained multi class classifier automatically classifies an instance, data item, record, or any object as a member of one or multiple classes. In addition to multi class classification, this invention can also be applied to multi class categorization of a given instance, data item, record, or any object.
The multi class classifier of this invention can be used for various purposes, such as: classification of network traffic as a member of the normal class or of one or multiple attack classes, for example classification of a single instance of network traffic into the DOS and PROB attack classes; classification of a webpage as a member of one or multiple classes, for example the Education and Game classes; classification of a user, human, or animal as a member of one or multiple classes, for example classification of a user as a member of the Romantic and Emotional classes; classification of text as a member of one or multiple classes, for example classification of a word as a member of the Indian Word and German Word classes; classification of a health record as a member of one or multiple classes, for example the Diabetic and Asthma classes; classification of an email as a member of one or multiple classes, for example the Office and Education classes; or any method or system which requires the classification of an input as a member of one or multiple classes where a multi class dataset for training and testing is not available.
Background of the Invention and Prior Art
Various single class classifiers have already been invented to classify a given object as a member of a single class from the available multiple classes. Such classifiers are not capable of classifying a given object as a member of multiple classes. There are various applications which require a multi class classifier that classifies a given object as a member of multiple classes. To train and test such a multi class classifier, a collection of multi class records, where each record is a member of multiple classes, is required. If such a collection of multi class records is not available, then the multi class classifier cannot be constructed. This invention addresses the generation of multi class records from preexisting single class records. The presented invention also addresses a multi class classifier which classifies an input, which may be in any form such as an instance, data item, record, or object, as a member of one or multiple classes.
There are various inventions which claim multi class classification. Examples are patent No. CN 102722726 B, titled "Multi-class support vector machine classification method based on dynamic binary tree", and patent No. US6816456, titled "Methods and apparatus for network use optimization", which claim classification of the given input pattern as a member of one of the available multiple classes. The above inventions are not able to classify a given object as a member of multiple classes, and therefore they do not address multi class classification in the true sense.
For classification of given webpage content as a member of multiple classes, patent No. US 7974994 B2, titled "Sensitive webpage content detection", discloses a multi-class classifier. In that invention, the contents of a web page are analyzed with the multi-class classifier, and the webpage is categorized as a member of one or multiple sensitivity categories. For the given web page, words or phrases are extracted and fed into the classifier. As a result of such classification, for a given word, the single class having the highest probability value is selected. At the outer level, a webpage having more than one word is classified into multiple classes only because each individual word is classified into one class. Hence, the classifier of that invention is not a multi class classifier in the true sense, which is its major limitation.
Patent No. US20100014762, titled "Categorizer with user-controllable calibration", classifies the given object as a member of two or more classes. However, that invention relies on user calibration for such classification. Classifying a large amount of data, such as network traffic, using user calibration is not practical.
Patent No. US006823323B2, titled "Automatic classification method and apparatus", which is very close to this invention, classifies a record as a member of one or multiple classes using a ballpark classifier. To train the classifier, that invention uses records which themselves belong to multiple classes. If such multi class records are not available, that invention is of no use. Further, to properly train such a classifier, a sufficient number of records must be fed to it. What counts as sufficient depends upon the classifier, the number of inputs, and their type. Obtaining sufficient multi class records, or even any multi class record, is very difficult.
As mentioned above, no existing invention uses single class records to generate multi class records which are in turn used to train and test a multi class classifier that classifies a given record as a member of multiple classes. This invention discusses methods and systems which address the above lacuna.
Brief Description of the Drawings
This invention is illustrated in the accompanying drawings, throughout which reference letters indicate corresponding parts in the respective figure.
FIG. 1 is a flow diagram showing the general flow of multi class classifier that uses single class dataset and generates multi class dataset which is used to train and test the multi class classifier;
FIG. 2(A) is a flow diagram illustrating an exemplary method or system for multi class dataset generation through single class dataset;
FIG. 2(B) is a flow diagram illustrating generation of multi class training and testing dataset from multi class dataset.
FIG. 3(A) is a flow diagram illustrating an exemplary method or system for working of ordinary classifier having single output unit which classifies given input in to single class from available multiple classes;
FIG. 3(B) is a diagram illustrating an exemplary method or system for mapping of single output value of ordinary classifier into single class from available multiple classes;
FIG. 4(A) is a flow diagram illustrating an exemplary method for working of multi class classifier with single output unit;
FIG. 4(B) is a diagram illustrating an exemplary method or system for mapping of single output value of multi class classifier as member of one or multiple classes;
FIG. 5 is a flow diagram illustrating an exemplary method or system for working of ordinary classifier with multiple output units which classifies the given input in to a particular class by selecting or activating corresponding output unit;
FIG. 6 is a flow diagram illustrating an exemplary method or system for working of multi class classifier with multiple output units which classifies the given input as member of one or multiple classes.
FIG. 7 is a flow diagram illustrating an exemplary method or system for working of ordinary classifier with multiple output units which classifies the given input in to a particular class by using binary pattern generated by output units.
FIG. 8 is a flow diagram illustrating an exemplary method or system for working of multi class classifier with multiple output units which classifies the given input as member of one or multiple classes by using binary pattern generated by output units.
Description of Preferred Embodiments
In the following description, reference numbers are used to illustrate major features of exemplary embodiments in a diagrammatic manner. The drawings are not drawn to scale. Moreover, drawings are not intended to depict every feature of actual embodiments nor relative dimensions of the depicted elements.
Referring to the drawings, FIG. 1 is a flow diagram showing the general flow of the multi class classifier which has been trained from a single class dataset (101). This single class dataset (101) may be in any format which stores data, records, or objects. Examples of such formats are text format, CSV format, database format, or a folder format which contains objects and their class labels. After preprocessing or directly, this single class dataset (101) is used by the multi class dataset generation (102) module to generate the multi class dataset (103). The detailed process of generating the multi class dataset from the single class dataset is shown in FIG. 2(A), while generation of the training dataset (104) and testing dataset (105) from the generated multi class dataset (103) is shown in FIG. 2(B). Sample copies of the single class dataset (101) and the generated multi class dataset (103) are shown in Table 1 and Table 4 respectively in the latter part of this document.
The multi class training dataset (104) and multi class testing dataset (105), which may be preprocessed or used directly, are used by the multi class classifier (107) for training and testing the behavior layer (106), respectively. After testing is complete, the finalized behavior layer (106) is used by the multi class classifier (107) for real-time or offline classification of input data (108) as a member of one or multiple classes. For simplicity, FIG. 1 shows a multi class classifier with three classes: Class A (109), Class B (110), and Class C (111). The classifier shown in FIG. 1 is able to classify a given single instance, data item, record, or object into Class A (109) only; Class B (110) only; Class C (111) only; Class A and Class B, that is Class AB (112); Class A and Class C, that is Class AC (113); Class B and Class C, that is Class BC (114); or Class A, Class B and Class C, that is Class ABC (115). Here a combination of two or more classes is written as Class {Label 1} {Label 2} ... {Label N}. For example, Class AB means the combination of Class A and Class B.
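The seven outcomes listed for three classes are exactly the non-empty combinations of the class labels, so N classes give 2^N - 1 possible outcomes. As a small illustration, the following sketch enumerates them using the Class {Label 1} {Label 2} ... naming; the function name `class_combinations` is hypothetical and not part of the invention.

```python
from itertools import combinations


def class_combinations(labels):
    """List every non-empty combination of labels, smallest combinations first."""
    return [
        "".join(combo)
        for size in range(1, len(labels) + 1)
        for combo in combinations(labels, size)
    ]


print(class_combinations(["A", "B", "C"]))
# ['A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
```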
For classification of input into one or more classes, multi class classifier (107) is design and trained in such a way that it gives output or outputs which can be mapped to one or more classes. Detailed methods or systems for mapping of output or outputs as member of one or multiple classes are shown in FIG. 4, FIG. 6 and FIG. 8 and discussed in more details at letter part of this document. On the basis of the number of output units used, multi class classifier presented in FIG. 4, FIG. 6 and FIG. 8, are broadly categorized in classifier with single output unit and classifier with multiple output units. Multi class classifier shown in FIG. 4 uses single output unit whose output value is mapped to one or more classes, while classifier shown in FIG. 6 and FIG. 8 uses multiple output units.
FIG. 2(A) shows step by step process or method of generating multi class dataset (206) from the single class dataset (201). Sample copy of single class dataset and multi class dataset are shown in Table 1 and Table 4 respectively. For simplicity, in all the tables used in this document, normal values of Attribute -1, Attribute -2, Attribute-3 and Attribute-4 are taken as 0.1, 0.2, 0.3 and 0.4 respectively. As per the FIG. 2(A), invention applies association technique (202) on single class dataset (201) and discovers feature to class association (203) and generates rules based on feature's value that decide the class labels (204). Any technique which generates such rules (204) by mapping the attribute and their values to a specific class can be used as association technique (202). Sample copy of rules (204) generated by applying such association technique (202) on single class dataset available in Table 1 is shown in Table 2.
Table 1: Sample Copy of Single Class Dataset
(Data row belongs to a single class from the multiple classes available)
Figure imgf000008_0001
Table 2: Sample Copy of Rules Generated from Single Class Dataset of Table 1 by Applying Association Technique
Figure imgf000008_0002
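(Table 2, continued; rows recoverable from the text)
Class B: Attribute-1→0.2, Attribute-3→0.2. Rule: If ((Attribute-1→0.2) AND (Attribute-3→0.2)) then Class B
Class C: Attribute 4. Attribute-4→0.9. Rule 5: If (Attribute-4→0.9) then Class C
Class D: Attributes 3, 4. Attribute-3→0.1, Attribute-4→0.1. Rule 6: If ((Attribute-3→0.1) AND (Attribute-4→0.1)) then Class D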
As per Table 2, Rule 1: "If (Attribute-1→0.3 to 0.8) then Class A" means that Attribute-1 with a value in [0.3 to 0.8] leads to classification of the record into Class A. Similarly, Rule 2 means that Attribute-2 with value [0.5] and Attribute-3 with a value in [0.4 to 0.6] collectively classify the given record into Class A. Likewise, as per Rule 3, Attribute-3 with a value in [0.7 to 0.9] classifies the given record into Class B. Other rules are generated similarly. The number of rules generated in Table 2 and their constraints mainly depend upon: the number of records fed to the association technique; the number of attributes contained in the dataset; and the variety of the records available in the dataset.
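The rules described above can be expressed as simple predicates over a record. The following is a minimal sketch only, assuming a record is a dictionary keyed by attribute number; the names `RULES`, `close` and `classes_for` are hypothetical, and the rule bounds are taken from the text and the recoverable rows of Table 2 rather than from any particular association technique.

```python
import math

def close(value, target):
    """Float-safe equality check (an assumption; the text just writes '→0.5')."""
    return math.isclose(value, target, abs_tol=1e-9)

# Each entry: (class label, predicate over a record {attribute number: value}).
RULES = [
    ("A", lambda r: 0.3 <= r[1] <= 0.8),                      # Rule 1
    ("A", lambda r: close(r[2], 0.5) and 0.4 <= r[3] <= 0.6),  # Rule 2
    ("B", lambda r: 0.7 <= r[3] <= 0.9),                      # Rule 3
    ("B", lambda r: close(r[1], 0.2) and close(r[3], 0.2)),    # Class B rule (1 and 3)
    ("C", lambda r: close(r[4], 0.9)),                        # Rule 5
    ("D", lambda r: close(r[3], 0.1) and close(r[4], 0.1)),    # Rule 6
]

def classes_for(record):
    """Return the set of class labels whose rules fire for this record."""
    return {label for label, predicate in RULES if predicate(record)}

# Attribute-1 is inside [0.3, 0.8]; the other attributes hold their normal values.
print(classes_for({1: 0.5, 2: 0.2, 3: 0.3, 4: 0.4}))
```

A record for which rules of two different classes fire is, in this sketch, simply a record whose returned set has more than one label.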
Figure imgf000009_0001
Figure imgf000010_0001
For generation of a multi class record, those rules of Table 2 which clearly classify a given record into a specific category are selected. A multi class record is generated by applying one or more of these rules to a preexisting single class record. The detailed, self-explanatory process of generating a multi class record is shown in Table 3, while a sample copy of the multi class dataset generated from the dataset of Table 1 is shown in Table 4.
Table 4: Multi Class Dataset which is Generated from Single Class Dataset of Table 1 by Applying Rules of Table 2
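One possible reading of this step is sketched below. Table 3 is only available as an image, so the details here are assumptions: a rule is taken to be a set of attribute values to write into a copy of the record, and the function name `apply_rule`, the `NORMAL` table and the conflict check are hypothetical. The normal values and the Class C rule (Attribute-4→0.9) come from the text.

```python
import math

# Normal attribute values used in the tables of this document.
NORMAL = {"Attribute-1": 0.1, "Attribute-2": 0.2,
          "Attribute-3": 0.3, "Attribute-4": 0.4}

def apply_rule(record, labels, rule_values, rule_label):
    """Write a rule's attribute values into a copy of the record and add its label.

    Assumption: a rule may only overwrite an attribute that still holds its
    normal value (or already holds the rule's value), so that the evidence
    for the record's existing classes is not destroyed.
    """
    for name, value in rule_values.items():
        current = record[name]
        if not (math.isclose(current, NORMAL[name]) or math.isclose(current, value)):
            raise ValueError(f"{name} is already used by another class")
    new_record = {**record, **rule_values}
    return new_record, frozenset(labels) | {rule_label}

# A Class A record: Attribute-1 lies in [0.3, 0.8], the rest are normal.
single = {"Attribute-1": 0.5, "Attribute-2": 0.2,
          "Attribute-3": 0.3, "Attribute-4": 0.4}

# Applying the Class C rule gives a record belonging to Class A and Class C.
record_ac, labels_ac = apply_rule(single, {"A"}, {"Attribute-4": 0.9}, "C")
print(record_ac, sorted(labels_ac))
```

The original record is left unchanged, so the same single class record can be combined with several different rules.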
(Data row belongs to two or more classes)
Figure imgf000010_0002
If separate single class training and single class testing datasets are available, then in FIG. 2(A), instead of the single class dataset (201), such a separate training or testing dataset can be taken as input to the association technique (202), and instead of the multi class dataset (206), a multi class training or multi class testing dataset is generated respectively. If, instead of such separate training and testing datasets, a single class dataset is used in FIG. 2(A) as input to the association technique (202), then the multi class dataset (206) is generated. This multi class dataset (206) must be divided into a multi class training dataset (207) and a multi class testing dataset (208), as shown in FIG. 2(B). In FIG. 2(B), any existing technique which generates separate training and testing datasets from a single class dataset can be used to generate them from the given dataset.
The system or method presented in FIG. 2(A) and FIG. 2(B) can also be used to generate additional multi class records from already generated multi class records, by replacing the single class dataset (201) with the generated multi class dataset. Further, for generation of additional multi class records, a preexisting multi class dataset can also be used instead of the single class dataset (201). In general, any combination of preexisting single class dataset, preexisting multi class dataset and generated multi class dataset can be used instead of the single class dataset (201) in FIG. 2(A). The detailed, self-explanatory process of generating a multi class record from a preexisting or generated multi class dataset is shown in Table 5. Similarly, Table 6 shows the detailed, self-explanatory process of generating a multi class dataset from a single class dataset together with a multi class dataset, which may be preexisting or generated.
Table 5: Multi Class Dataset Generation from Preexisting Multi Class or Generated Multi Class Dataset
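The text leaves the choice of splitting technique open. As one illustration only, a shuffled hold-out split could look like the sketch below; the function name `split_dataset`, the 25% test fraction and the fixed seed are assumptions and are not taken from FIG. 2(B).

```python
import random

def split_dataset(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into (training, testing) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical multi class records: (record id, class labels).
multi_class_dataset = [(i, labels) for i, labels in
                       enumerate(["AB", "AC", "BC", "ABC", "AB", "AC", "BC", "ABC"])]
training, testing = split_dataset(multi_class_dataset)
print(len(training), len(testing))
```

Every record ends up in exactly one of the two lists, so the testing dataset stays unseen during training.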
Figure imgf000011_0001
Hence, in FIG. 2(A), to generate an additional multi class dataset, the single class dataset (201) can be replaced by: a preexisting multi class dataset; or a generated multi class dataset; or a combination of preexisting single class and preexisting multi class datasets; or a combination of preexisting single class and generated multi class datasets; or a combination of preexisting multi class and generated multi class datasets; or a combination of preexisting single class, preexisting multi class and generated multi class datasets.
Table 6: Multi Class Dataset Generation from Preexisting Single Class Dataset and from Preexisting or Generated Multi Class Dataset
Figure imgf000012_0001
In general, a classifier may use a single output unit or multiple output units. A classifier with a single output unit is shown in FIG. 3(A) and FIG. 4(A), while a classifier with multiple output units is shown in FIG. 5, FIG. 6, FIG. 7 and FIG. 8. FIG. 3(A) shows the basic working of an ordinary classifier having a single output unit, which classifies the given input data into one of the available classes. As per FIG. 3(A), the classifier (304) uses single class training data (301) and single class testing data (302) and generates a behavior layer (305), which is used for classification of input data (303). The classifier (304) classifies the input data (303) into any one of the available multiple classes. For example, in FIG. 3(A), the classifier (304) classifies the input into any one of three classes, namely Class A (306), Class B (307) or Class C (308). The method for mapping the output value of the classifier with a single output unit (304) to one of the available classes is presented in FIG. 3(B). As per FIG. 3(B), the output range is divided into sub ranges, one per available class. For example, if the number of classes is three, then the sub ranges could be 0.0 (309) to 0.30 (311), 0.35 (312) to 0.65 (314), and 0.7 (315) to 1.0 (317). The gap between two classes, for example the gap between Class A (310) and Class B (313), which is 0.31 to 0.34, is used to overcome class overlapping. These ranges and gaps are for illustration only and can be varied as per the requirement. Collectively, FIG. 3(A) and FIG. 3(B) classify the given input into one of the available classes. For example, as per FIG. 3(A), if the classifier (304) gives an output of 0.2 for the given input data (303), then as per FIG. 3(B) the output value 0.2 is mapped to Class A and hence the given input is classified as a member of Class A.
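The range mapping of FIG. 3(B) can be sketched as a lookup over sub ranges. The bounds below are the illustrative ones given in the text; the names `SUB_RANGES` and `map_output`, and the choice to return `None` for a value that falls in a gap, are assumptions made for this sketch.

```python
# Illustrative sub ranges from the text: (lower bound, upper bound, class).
SUB_RANGES = [
    (0.00, 0.30, "Class A"),
    (0.35, 0.65, "Class B"),
    (0.70, 1.00, "Class C"),
]

def map_output(value, sub_ranges=SUB_RANGES):
    """Map a single output value to a class, or None if it falls in a gap."""
    for low, high, label in sub_ranges:
        if low <= value <= high:
            return label
    return None  # value lies in a gap between classes or outside the output range

print(map_output(0.2))   # the example from the text
print(map_output(0.32))  # falls in the gap between Class A and Class B
```

Because the function takes the list of sub ranges as a parameter, the same lookup works for any number of sub ranges, including ones that stand for combinations of classes.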
FIG. 4(A) shows the architecture of a multi class classifier having a single output unit, which classifies the given record as a member of one or multiple classes. As per FIG. 4(A), the classifier uses the multi class training dataset (401) and multi class testing dataset (402), which can be generated as per FIG. 2(A) and FIG. 2(B). The classifier classifies the input data (403) as a member of one or multiple classes. For example, as shown in FIG. 4(A), the classifier classifies input data into Class A (406), Class B (407), Class C (408), Class AB (409), Class AC (410), Class BC (411) or Class ABC (412). For mapping the single output value to the various classes, the approach presented in FIG. 4(B) is used, which is very similar to the approach shown in FIG. 3(B). As per FIG. 4(B), the output range is divided into 7 sub ranges with a gap between each pair of adjacent ranges. These ranges and gaps are for illustration only and can be varied as per the requirement. Collectively, FIG. 4(A) and FIG. 4(B) classify the given input as a member of one or multiple classes. For example, as per FIG. 4(A), if the classifier (404) gives an output of 0.89 for the given input data (403), then as per FIG. 4(B) the output value 0.89 is mapped to Class ABC (432) and hence the given input is classified as a member of Class ABC.
FIG. 5 shows the working of a single class classifier with multiple output units. The classifier (504) uses training data (501) and testing data (502) and generates a behavior layer (505). Using this behavior layer (505), the classifier (504) classifies the given input data (503) into one of the available multiple classes. For example, in FIG. 5, the classifier classifies the input into Class A (506), Class B (507) or Class C (508). As per the example presented in FIG. 5, the classifier uses three output units for the classification. A particular class is selected if its respective output unit is activated. Output value 100 indicates Class A (506), 010 indicates Class B (507) and 001 indicates Class C (508). The architecture presented in FIG. 5 can be extended into a multi class classifier by providing multi class training and testing data and by training the classifier so that it is allowed to activate one or more output units. Such an architecture is shown in FIG. 6.
FIG. 6 shows working of multi class classifier having multiple output units. As per the FIG. 6, classifier (604) uses multi class training data (601) and multi class testing data (602) which can be generated as per FIG. 2(A) and FIG. 2(B). Such multi class training data (601) and multi class testing data (602) is used to generate behavior layer (605). Such trained classifier (604) classifies the given input data (603) as member of one or multiple classes. Particular class or classes are selected as per the activation of respective output units. For example, in the FIG. 6, for the case of three classes, three output units are used for multi class classification. If output of the classifier (604) is 100 then Class A (606), if output is 010 then Class B (607), if output is 001 then Class C (608), if output is 110 then Class AB (609), if output is 101 then Class AC (610), if output is 011 then Class BC (611) and if output is 111 then Class ABC (612) is selected.
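The selection described for FIG. 6 amounts to reading each output unit as an independent flag for its class. A minimal sketch follows; the function name `decode_output_units` and the activation threshold of 0.5 are assumptions, since the text only speaks of units being activated.

```python
def decode_output_units(outputs, labels=("A", "B", "C"), threshold=0.5):
    """Return the class named by the activated output units, or None if none fire."""
    if len(outputs) != len(labels):
        raise ValueError("one output value is required per class")
    active = [label for label, value in zip(labels, outputs) if value >= threshold]
    return "Class " + "".join(active) if active else None

print(decode_output_units((1, 1, 0)))        # output 110
print(decode_output_units((0.9, 0.2, 0.7)))  # real-valued activations
```

With exactly one unit active, the same function reproduces the single class behavior of FIG. 5 (100, 010, 001).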
One more approach for classification of given input as member of multiple classes is shown in FIG. 7. FIG. 7 is an ordinary classifier which classifies the input data (703) in to one of the available multiple classes by activation of output unit in binary pattern. For example in FIG. 7, two output units are used for classification of input data (703) in to any one of three classes, i.e. Class A (706), Class B (707) and Class C (708). Binary pattern generated by the output units helps to classify the input data (703) in to the three classes mentioned above. For example, as per FIG. 7, if output pattern generated is 01 then Class B (707) is selected. Architecture presented in FIG. 7, can also be extended in to multi class classifier which is shown in FIG. 8.
FIG. 8 shows the working of a multi class classifier that classifies the given input as a member of one or multiple classes using a binary output pattern. As per FIG. 8, the classifier (804) uses a multi class training dataset (801) and a multi class testing dataset (802). These multi class training and testing datasets can be generated as per FIG. 2(A) and FIG. 2(B). The multi class training data (801) and multi class testing data (802) are used to generate the behavior layer (805). The trained classifier (804) classifies the given input data (803) into multiple classes. The particular class or classes are selected as per the binary pattern generated by the output units. For example, in FIG. 8, three output units are used for multi class classification. If the output of the classifier (804) is 000 then Class A (806) is selected; if the output is 001 then Class B (807); if 010 then Class C (808); if 011 then Class AB (809); if 100 then Class AC (810); if 101 then Class BC (811); and if 110 then Class ABC (812).
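The pattern assignment in the FIG. 8 example follows a regular order: class combinations are listed by size (A, B, C, AB, AC, BC, ABC) and numbered in binary from 000. The sketch below builds such a table for any list of labels; the function names are hypothetical, and treating an unassigned pattern such as 111 as `None` is an assumption, since the text does not say how it is handled.

```python
from itertools import combinations

def build_pattern_table(labels=("A", "B", "C")):
    """Map binary output patterns to class combinations, ordered by combination size."""
    subsets = ["".join(combo)
               for size in range(1, len(labels) + 1)
               for combo in combinations(labels, size)]
    width = max(1, (len(subsets) - 1).bit_length())  # output units needed
    return {format(index, f"0{width}b"): "Class " + name
            for index, name in enumerate(subsets)}

def decode_pattern(pattern, table):
    """Look up the class for a binary output pattern; None if it is unassigned."""
    return table.get(pattern)

table = build_pattern_table()
print(table)
print(decode_pattern("011", table))
```

For three classes there are seven combinations, so three output units suffice and one of the eight possible patterns stays unused.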

Claims

We Claim:
1. The multi class classification method or system for classification of instance or record or data item or object or any input as member of one or multiple classes.
2. From the preexisting single class instance or records or objects or data items, a method or system for generation of multi class instance or record or data item or object which belongs to two or more classes or multiple classes.
3. From the preexisting multi class instance or records or objects or data items, method and system for generation of multi class instance or record or data item or object which belongs to combination of two or more classes or multiple classes.
4. From the preexisting multi class and preexisting single class instances or records or objects or data items, method and system for generation of multi class instance or record or data item or object which belongs to combination of two or more classes or multiple classes.
5. From the generated multi class records, method or system for generation of additional multi class records.
6. From the combination of generated multi class records and preexisting single class records, method or system for generation of multi class records.
7. From the combination of generated multi class records and preexisting multi class records, method or system for generation of multi class records.
8. From the combination of preexisting records which contains single class and multi class records; and generated multi class records, method or system for generation of multi class records.
9. Method or system for multi class classifier having single output unit which classifies instance or record or data item or object as member of one or multiple classes.
10. Multi class classifier having multiple output units which classifies instance or record or data item or object as member of one or multiple classes.
11. Method or system for multi class classifier having single output unit comprising: Selection of any one of the following single class or multi class records as input to the classifier:
pre-existing single class records or
pre-existing multi class record or
generated multi class record or
any combination of pre-existing single class record, pre-existing multi class and generated multi class record; and
generates multi class records or dataset; and
such generated multi class records or dataset is used to train the classifier having single output unit which classifies the given instance or record or data item or object as member of one or multiple classes.
12. Method or system for multi class classifier having multiple output units comprising: Selection of any one of the following single class or multi class records as input to the classifier:
pre-existing single class records or
pre-existing multi class record or
generated multi class record or
any combination of pre-existing single class record, pre-existing multi class and generated multi class record; and
generates multi class records or dataset; and
such generated multi class records or dataset is used to train the classifier having multiple output units which classifies the given instance or record or data item or object as member of one or multiple classes.
PCT/IN2016/000116 2015-05-05 2016-05-04 Multi class classifier from single class dataset WO2016178243A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN1794/MUM/2015 2015-05-05
IN1794MU2015 IN2015MU01794A (en) 2015-05-05 2016-05-04

Publications (1)

Publication Number Publication Date
WO2016178243A1 true WO2016178243A1 (en) 2016-11-10

Family

ID=54394887

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2016/000116 WO2016178243A1 (en) 2015-05-05 2016-05-04 Multi class classifier from single class dataset

Country Status (2)

Country Link
IN (1) IN2015MU01794A (en)
WO (1) WO2016178243A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7769228B2 (en) * 2004-05-10 2010-08-03 Siemens Corporation Method for combining boosted classifiers for efficient multi-class object detection
US20110302111A1 (en) * 2010-06-03 2011-12-08 Xerox Corporation Multi-label classification using a learned combination of base classifiers

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160335521A1 (en) * 2015-05-14 2016-11-17 Canon Kabushiki Kaisha Method and apparatus for generating, updating classifier, detecting objects and image processing device
US10242295B2 (en) * 2015-05-14 2019-03-26 Canon Kabushiki Kaisha Method and apparatus for generating, updating classifier, detecting objects and image processing device
WO2020243333A1 (en) * 2019-05-30 2020-12-03 The Research Foundation For The State University Of New York System, method, and computer-accessible medium for generating multi-class models from single-class datasets

Also Published As

Publication number Publication date
IN2015MU01794A (en) 2015-06-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16789429

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16789429

Country of ref document: EP

Kind code of ref document: A1