CN112560920A - Machine learning classification method based on self-adaptive error correction output coding

Machine learning classification method based on self-adaptive error correction output coding

Info

Publication number
CN112560920A
CN112560920A
Authority
CN
China
Prior art keywords
data set
data
value
class
gradient
Prior art date
Legal status
Granted
Application number
CN202011433367.4A
Other languages
Chinese (zh)
Other versions
CN112560920B (en)
Inventor
Xiao Ziyang
Jiang Junxian
Current Assignee
Xiamen University
Original Assignee
Xiamen University
Priority date
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202011433367.4A priority Critical patent/CN112560920B/en
Publication of CN112560920A publication Critical patent/CN112560920A/en
Application granted granted Critical
Publication of CN112560920B publication Critical patent/CN112560920B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Abstract

The invention relates to a machine learning classification method based on self-adaptive error-correcting output coding, which comprises the following steps. S1, loading the data set required by the machine and creating a corresponding test set. S2, analyzing the data set and, after the analysis, dividing it by using a classification function. S3, extracting the divided data set, defining the data-set interval as (0, K), and setting the required numbers of rows and columns to form a coding matrix, thereby generating a prior information set. S4, inputting values within K into the machine, forming corresponding parameters from the data, and comparing them with the predicted values in the prior information set to obtain an error value. S5, determining the adjustment gradient from the sign of the obtained error value, feeding the gradient back to the parameters, and adjusting the input values. S6, classifying the results obtained after the different data sets are learned. S7, deleting the values whose error value is not 0, deleting the parameters and the classifiers in the corresponding data set, and keeping the data set in which the predicted values and the parameter values correspond to each other.

Description

Machine learning classification method based on self-adaptive error correction output coding
Technical Field
The invention relates to a machine learning classification method, in particular to a machine learning classification method based on self-adaptive error correction output coding.
Background
In a big-data environment, a machine learning algorithm can discard learning results of low importance according to a given performance criterion. Implementing the division strategy with distributed, parallel computation avoids interference from noisy and redundant data, reduces storage consumption, and improves the operating efficiency of the learning algorithm.
Existing error-correcting output coding algorithms generate the coding matrix at random for classification. The data interval is large, the data matrix formed after coding contains much irrelevant and redundant data, the machine wastes a great deal of time during learning verifying data with extremely low probability, and the database generated after classification is cumbersome and increases the workload.
Disclosure of Invention
The invention provides a machine learning classification method based on self-adaptive error correction output coding, which can effectively solve the above problems.
The invention is realized by the following steps:
a machine learning classification method based on adaptive error correction output coding comprises the following steps:
S1, loading the data set to be learned by the machine and creating a corresponding test set,
S2, analyzing the data set and, after the analysis, dividing it into a semantic class and a panoramic class by using a classification function,
S3, extracting the divided data set, defining the data-set interval as (0, K), and setting the required numbers of rows and columns to form a coding matrix, thereby generating a prior information set,
S4, inputting values within K into the machine, forming corresponding parameters from the data, and comparing them with the predicted values in the prior information set to obtain an error value,
S5, determining the adjustment gradient from the sign of the obtained error value, feeding the gradient back to the parameters, and adjusting the input values,
S6, constructing an M-class classifier from N trained binary classifiers: with the data in the data set denoted A1, A2, A3, ..., An, the first classifier distinguishes A1 from A2, A3, ..., An, the second classifier distinguishes A2 from A1, A3, ..., An, and so on, so that the results obtained after the different data sets are learned are classified,
and S7, deleting the values whose error value is not 0, deleting the parameters and the classifiers in the corresponding data set, and keeping the data set in which the predicted values and the parameter values correspond to each other.
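For orientation, the standard (non-adaptive) form of error-correcting output coding that the steps above build upon is available in scikit-learn. The sketch below is illustrative only: the Iris data set, the logistic-regression base learner, and the code_size value are assumptions for the example, not part of the patented method, which replaces the randomly generated coding matrix with one adapted to a prior information set.

```python
# Baseline ECOC with a randomly generated coding matrix (scikit-learn);
# the patented method would adapt the matrix to the data instead.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# code_size sets the number of coding-matrix columns relative to the
# number of classes; each column defines one binary sub-problem.
ecoc = OutputCodeClassifier(LogisticRegression(max_iter=1000),
                            code_size=2, random_state=0)
ecoc.fit(X_train, y_train)
accuracy = ecoc.score(X_test, y_test)
```

Each column of the internal coding matrix trains one binary classifier; a test sample is assigned to the class whose codeword is nearest to the vector of classifier outputs.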
As a further improvement, when the error value is a positive number, the gradient decreases, and when the error value is a negative number, the gradient increases.
As a further improvement, in step S2, the semantic class and the panoramic class are specifically as follows:
the semantic class comprises the data in the data set labeled with their categories,
and the panoramic class distinguishes individual instances on the basis of the semantic class.
As a further improvement, the content of the values in the data set specifically comprises:
a first channel, giving the class to which the value belongs,
a second channel, giving the instance among the data to which the value belongs and expressing the number of instances within,
and a third channel, usually represented by 0 or 1, where 0 denotes single-bit data and 1 denotes multi-bit data.
As a further refinement, the classification function is the train_test_split function.
As a further improvement, each time a new data set is imported in step S1, a new test set is created.
The invention has the beneficial effects that:
the invention divides and extracts information from a data set to form prior information, can generate a coding matrix adaptive to the data set according to the data of the prior information and the data with higher probability, forms a self-adaptive error correction output coding algorithm through the self-adaptive coding matrix, and classifies machine learning through the coding algorithm.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic flow diagram of the process.
FIG. 2 is a schematic flow chart of machine learning from a priori information set.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are only a part of the embodiments of the present invention, not all of them; the following detailed description is therefore not intended to limit the scope of the invention as claimed, but merely represents selected embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort fall within the protection scope of the present invention.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Referring to fig. 1-2, a machine learning classification method based on adaptive error correction output coding,
comprises the following steps:
S1, loading the data set to be learned by the machine and creating a corresponding test set; the test-set data are set aside before the data are analyzed, so that patterns in the test data cannot influence the choice of algorithm, which ensures that results measured on the test set are objective and credible and free from data-snooping bias,
S2, analyzing the data set with NumPy, pandas and Matplotlib, and after the analysis dividing it into a semantic class and a panoramic class by using a classification function,
S3, extracting the divided data set, defining the data-set interval as (0, K), and setting the required numbers of rows and columns within the range of K to form a coding matrix, thereby generating a prior information set; constructing the prior information set greatly reduces the learning and classification of useless information and improves learning efficiency,
S4, inputting values within K into the machine, forming corresponding parameters from the data, and comparing them with the predicted values in the prior information set to obtain an error value,
S5, determining the adjustment gradient from the sign of the obtained error value, feeding the gradient back to the parameters, and adjusting the input values,
S6, constructing an M-class classifier from N trained binary classifiers: with the data in the data set denoted A1, A2, A3, ..., An, the first classifier distinguishes A1 from A2, A3, ..., An, the second classifier distinguishes A2 from A1, A3, ..., An, and so on, so that the results obtained after the different data sets are learned are classified,
and S7, deleting the values whose error value is not 0, deleting the parameters and the classifiers in the corresponding data set, and keeping the data set in which the predicted values and the parameter values correspond to each other.
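Steps S3 and S6 can be pictured with a minimal NumPy sketch of a one-vs-rest coding matrix and Hamming-distance decoding. The matrix size and the classifier outputs below are hypothetical, chosen only to illustrate the mechanism: row i is the codeword of class Ai, and column j is the binary task solved by the j-th classifier (Aj versus the rest).

```python
import numpy as np

# One-vs-rest coding matrix for 4 hypothetical classes A1..A4:
# entry (i, j) is the label class Ai carries in binary sub-problem j.
n_classes = 4
coding_matrix = -np.ones((n_classes, n_classes), dtype=int)
np.fill_diagonal(coding_matrix, 1)  # classifier j outputs +1 only for class Aj

# Decoding: match a sample's vector of classifier outputs to the
# nearest codeword (row) by Hamming distance.
outputs = np.array([-1, 1, -1, -1])          # hypothetical outputs for one sample
hamming = (coding_matrix != outputs).sum(axis=1)
predicted_class = int(np.argmin(hamming))    # index of the nearest codeword
```

Because codewords differ in several positions, a single misfiring classifier still leaves the correct codeword nearest, which is the error-correcting property the method relies on.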
Further, when the error value is a positive number the gradient decreases, and when the error value is a negative number the gradient increases, so that the learned values can be adaptively adjusted and output until the machine learns the correct data and rejects unsuitable data.
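A minimal sketch of the sign-driven adjustment described above, for a single scalar parameter; the step size and the helper name `adapt` are assumptions for illustration, not taken from the patent.

```python
def adapt(parameter: float, target: float, step: float = 0.5,
          max_iters: int = 100) -> float:
    """Adjust `parameter` toward the prior-set `target` using only the
    sign of the error, as in step S5: a positive error lowers the value,
    a negative error raises it, until the error reaches 0."""
    for _ in range(max_iters):
        error = parameter - target              # compare with the prior value
        if error == 0:
            break
        parameter -= step * (1 if error > 0 else -1)  # gradient from sign only
    return parameter

adjusted = adapt(parameter=3.0, target=1.0)
```

Using only the sign makes each update a fixed-size correction, so the loop converges whenever the initial gap is a multiple of the step size; a production version would shrink the step or add a tolerance.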
Further, in step S2, the semantic class and the panoramic class are specifically as follows:
the semantic class comprises the data in the data set labeled with their categories,
and the panoramic class distinguishes individual instances on the basis of the semantic class.
Further, the content of the values in the data set specifically comprises:
a first channel, giving the class to which the value belongs,
a second channel, giving the instance among the data to which the value belongs and expressing the number of instances within,
and a third channel, usually represented by 0 or 1, where 0 denotes single-bit data and 1 denotes multi-bit data.
Further, the classification function is the train_test_split function; its random_state parameter seeds the random generator, and passing the same seed for several data sets with the same number of rows splits them on the same indices.
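The random_state behaviour described above can be demonstrated with scikit-learn's train_test_split; the toy arrays below are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features: row i = (2i, 2i+1)
y = np.arange(10)                  # matching labels: label i for row i

# Passing both arrays with one random_state shuffles them with the same
# seed, so they are split on the same indices and stay aligned.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Verify each X_train row still corresponds to its y_train label.
aligned = all(tuple(row) == (2 * lbl, 2 * lbl + 1)
              for row, lbl in zip(X_train, y_train))
```

The same effect is obtained for separately stored arrays by calling train_test_split on each with an identical random_state, which is the usage the paragraph above describes.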
Further, a new test set needs to be created each time a new data set is imported in step S1; otherwise each run would produce a different test set.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A machine learning classification method based on adaptive error correction output coding is characterized in that,
comprises the following steps:
s1, loading the data set to be learned by the machine, creating the corresponding test set,
S2, analyzing the data set and, after the analysis, dividing it into a semantic class and a panoramic class by using a classification function,
S3, extracting the divided data set, defining the data set interval as (0, K), and setting the required numbers of rows and columns to form a coding matrix, thereby generating a prior information set,
S4, inputting values within K into the machine, forming corresponding parameters from the data, and comparing them with the predicted values in the prior information set to obtain an error value,
S5, determining the adjustment gradient from the sign of the obtained error value, feeding the gradient back to the parameters, and adjusting the input values,
S6, constructing an M-class classifier from N trained binary classifiers: with the data in the data set denoted A1, A2, A3, ..., An, the first classifier distinguishes A1 from A2, A3, ..., An, the second classifier distinguishes A2 from A1, A3, ..., An, and so on, so that the results obtained after the different data sets are learned are classified,
and S7, deleting the values whose error value is not 0, deleting the parameters and the classifiers in the corresponding data set, and keeping the data set in which the predicted values and the parameter values correspond to each other.
2. The method of claim 1, wherein the gradient decreases when the error value is positive, and the gradient increases when the error value is negative.
3. The method of claim 1, wherein in step S2, the semantic class and the panoramic class are specifically as follows:
the semantic class comprises the data in the data set labeled with their categories,
and the panoramic class distinguishes individual instances on the basis of the semantic class.
4. The method according to claim 1, wherein the content of the values in the data set specifically comprises:
a first channel, giving the class to which the value belongs,
a second channel, giving the instance among the data to which the value belongs and expressing the number of instances within,
and a third channel, usually represented by 0 or 1, where 0 denotes single-bit data and 1 denotes multi-bit data.
5. The adaptive error correction output coding-based machine learning classification method according to claim 1, wherein the classification function is the train_test_split function.
6. The method of claim 1, wherein a new test set is created each time a new data set is imported in step S1.
CN202011433367.4A 2020-12-10 2020-12-10 Machine learning classification method based on self-adaptive error correction output coding Active CN112560920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011433367.4A CN112560920B (en) 2020-12-10 2020-12-10 Machine learning classification method based on self-adaptive error correction output coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011433367.4A CN112560920B (en) 2020-12-10 2020-12-10 Machine learning classification method based on self-adaptive error correction output coding

Publications (2)

Publication Number Publication Date
CN112560920A true CN112560920A (en) 2021-03-26
CN112560920B CN112560920B (en) 2022-09-06

Family

ID=75060260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011433367.4A Active CN112560920B (en) 2020-12-10 2020-12-10 Machine learning classification method based on self-adaptive error correction output coding

Country Status (1)

Country Link
CN (1) CN112560920B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0113263D0 (en) * 2001-05-31 2001-07-25 Univ Surrey Personal identity verification process system
CN104966105A (en) * 2015-07-13 2015-10-07 苏州大学 Robust machine error retrieving method and system
CN105955955A (en) * 2016-05-05 2016-09-21 东南大学 Disambiguation-free unsupervised part-of-speech tagging method based on error-correcting output codes
CN108562853A (en) * 2018-03-29 2018-09-21 上海交通大学 Method of Motor Fault Diagnosis based on error correcting output codes support vector machines and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0113263D0 (en) * 2001-05-31 2001-07-25 Univ Surrey Personal identity verification process system
CN104966105A (en) * 2015-07-13 2015-10-07 苏州大学 Robust machine error retrieving method and system
CN105955955A (en) * 2016-05-05 2016-09-21 东南大学 Disambiguation-free unsupervised part-of-speech tagging method based on error-correcting output codes
CN108562853A (en) * 2018-03-29 2018-09-21 上海交通大学 Method of Motor Fault Diagnosis based on error correcting output codes support vector machines and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
J.ZOU ET AL.: "A Dynamic Ensemble Selection Strategy for Improving Error Correcting Output Codes Algorithm", 《2019 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM)》 *
ZHOU BINBIN ET AL.: "Partial Label Learning Algorithm Based on Ternary Error-Correcting Output Codes", 《Journal of Frontiers of Computer Science and Technology》 *
ZHONG TIANYUN: "Multi-class Classification Method for Microarray Data Based on Iteratively Extended Error-Correcting Output Coding", 《Journal of Xiamen University (Natural Science)》 *
ZHONG TIANYUN: "Research and Application of Error-Correcting Output Coding Methods", 《China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology》 *

Also Published As

Publication number Publication date
CN112560920B (en) 2022-09-06

Similar Documents

Publication Publication Date Title
US11741361B2 (en) Machine learning-based network model building method and apparatus
CN109615014B (en) KL divergence optimization-based 3D object data classification system and method
Martinez-Munoz et al. Using boosting to prune bagging ensembles
CN109948149B (en) Text classification method and device
CN111914085A (en) Text fine-grained emotion classification method, system, device and storage medium
CN112905795A (en) Text intention classification method, device and readable medium
CN105706092A (en) Methods and systems of four-valued simulation
CN110019784B (en) Text classification method and device
CN114564586A (en) Unstructured sensitive data identification method and system
CN112560920B (en) Machine learning classification method based on self-adaptive error correction output coding
CN110032642B (en) Modeling method of manifold topic model based on word embedding
CN116467451A (en) Text classification method and device, storage medium and electronic equipment
Zhao et al. Improving continual relation extraction by distinguishing analogous semantics
CN113641823B (en) Text classification model training, text classification method, device, equipment and medium
CN106776600A (en) The method and device of text cluster
US11640531B2 (en) Method, apparatus and device for updating convolutional neural network using GPU cluster
CN115587231A (en) Data combination processing and rapid storage and retrieval method based on cloud computing platform
Zhou et al. Deep hashing with triplet quantization loss
CN114417095A (en) Data set partitioning method and device
CN111046934A (en) Method and device for identifying soft clauses of SWIFT message
US11699026B2 (en) Systems and methods for explainable and factual multi-document summarization
US20240111814A1 (en) Method and system for selecting samples to represent a cluster
CN116089731B (en) Online hash retrieval method and system for relieving catastrophic forgetting
CN116414974A (en) Short text classification method and device
CN111324737B (en) Bag-of-words model-based distributed text clustering method, storage medium and computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Xiao Ziyang

Inventor after: Jiang Junxian

Inventor before: Xiao Ziyang

Inventor before: Jiang Junxian

GR01 Patent grant