CN112560920A

CN112560920A - Machine learning classification method based on self-adaptive error correction output coding

Info

Publication number: CN112560920A
Application number: CN202011433367.4A
Authority: CN
Inventors: 肖子扬; 江均贤
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2020-12-10
Filing date: 2020-12-10
Publication date: 2021-03-26
Anticipated expiration: 2040-12-10
Also published as: CN112560920B

Abstract

The invention relates to a machine learning classification method based on self-adaptive error correction output coding, which comprises the following steps: s1, loading a data set required by a machine, creating a corresponding test set, S2, analyzing the data set, dividing the data set after analyzing and using a classification function, S3, extracting the divided data set, defining a data set interval to be 0-K, setting the required row number and column number to form a coding matrix, generating and obtaining a prior information set, S4, inputting a numerical value within K into the machine, forming corresponding parameters by the data, comparing the parameters with a predicted value in the prior information set to obtain an error value, S5, determining an adjusted gradient by the positive and negative values of the obtained error value, said gradient being fed back to the parameter, adjusting the input values, S6, classifying the results obtained after learning the different data sets, S7, deleting the values with error value not 0, and deleting the parameters and the classifier in the corresponding data set, and keeping the data set with the predicted value and the parameter value corresponding to each other.

Description

Machine learning classification method based on self-adaptive error correction output coding

Technical Field

The invention relates to a machine learning classification method, in particular to a machine learning classification method based on self-adaptive error correction output coding.

Background

The machine learning algorithm under the big data environment can ignore the importance degree of the learning result according to a certain performance standard. The division strategy is implemented in a distributed and parallel computing mode, so that interference caused by noise data and redundancy can be avoided, storage consumption is reduced, and the operation efficiency of a learning algorithm is improved.

The existing error correction output coding algorithm generates a coding matrix immediately for classification, the data interval is large, more irrelevant and fussy data exist in the data matrix formed after coding, a large amount of time is wasted to verify some data with extremely low probability when a machine learns, and a database generated after classification is too thick and complicated and increases the workload.

Disclosure of Invention

The invention provides a machine learning classification method based on self-adaptive error correction output coding, which can effectively solve the problems.

The invention is realized by the following steps:

a machine learning classification method based on adaptive error correction output coding comprises the following steps:

s1, loading the data set to be learned by the machine, creating the corresponding test set,

s2, analyzing the data set, dividing the data set into semantic class and panoramic class after analyzing and using classification function,

s3, extracting the segmented data set, defining the data set interval as 0, K, setting the needed row number and column number to form a coding matrix, generating to obtain a prior information set,

s4, inputting the value within K into the machine, forming corresponding parameters by the data, comparing with the predicted value in the prior information set to obtain an error value,

s5, determining the adjusted gradient through the positive and negative values of the obtained error value, feeding the gradient back to the parameter, adjusting the input value,

s6, constructing An M-class classifier, training N M classifiers, defining data in a data set as A1, A2 and A3 … … An, distinguishing the A1 from the A2 and A3 … … An by the first classifier, distinguishing the A2 from the A1 and the A3 … … An by the second classifier, classifying results obtained after different data sets are learned,

and S7, deleting the numerical value with the error value not being 0, deleting the parameters and the classifier in the corresponding data set, and keeping the data set with the predicted value and the corresponding parameter value.

As a further improvement, when the error value is a positive number, the gradient decreases, and when the error value is a negative number, the gradient increases.

As a further improvement, in step S2, the semantic class and the panorama class correspond to specifically:

semantic classes are data in the dataset labeled with its category,

the panorama class is to distinguish instance based on the semantic class.

As a further improvement, the values in the data set specifically include:

a first channel, to which the value belongs,

a second channel, the value belonging to the few instances in the data, expressing a number of instances inside,

the third channel, usually represented by 0 or 1, 0 represents single-bit data, and 1 represents multi-bit 1 data.

As a further refinement, the classification function is a train _ test _ split function.

As a further improvement, each time a new data set is imported in step S1, a new test set is created.

The invention has the beneficial effects that:

the invention divides and extracts information from a data set to form prior information, can generate a coding matrix adaptive to the data set according to the data of the prior information and the data with higher probability, forms a self-adaptive error correction output coding algorithm through the self-adaptive coding matrix, and classifies machine learning through the coding algorithm.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

FIG. 1 is a schematic flow diagram of the process.

FIG. 2 is a schematic flow chart of machine learning from a priori information set.

Detailed Description

The embodiments of the present invention are within the scope of the present invention. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating the purposes, technical solutions and advantages of the embodiments of the present invention, which will be described in detail and fully with reference to the accompanying drawings in the embodiments of the present invention. All other implementations that can be derived by one of ordinary skill in the art based on the embodiments of the present invention show or imply relative importance or implicitly indicate the number of technical features indicated, without inventive step. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.

Referring to fig. 1-2, a machine learning classification method based on adaptive error correction output coding,

comprises the following steps:

s1, loading a data set to be learned by the machine, creating a corresponding test set, eliminating the data of the test set before analyzing the data without introducing the rule of the test data, thereby influencing the selection of the algorithm, ensuring that the result of the test by adopting the test set is objective and credible without the problem of data perspective deviation,

s2, analyzing the data set by Numpy, Pandas and Matplotlib, dividing the data set into semantic class and panoramic class after analyzing and using classification function,

s3, extracting the segmented data set, defining the data set interval as 0, K, setting the needed row number and column number in the range of K, forming a coding matrix, generating and obtaining a prior information set, constructing the prior information set to greatly improve the learning and classification of useless information and improve the learning efficiency,

Further, when the error value is a positive number, the gradient is decreased, and when the error value is a negative number, the gradient is increased, so that the learned value can be adaptively adjusted and output until the machine can learn correct data and reject inappropriate data.

Further, in step S2, the semantic class and the panorama class correspond to specifically:

semantic classes are data in the dataset labeled with its category,

the panorama class is to distinguish instance based on the semantic class.

Further, the content of the numerical values in the data set specifically includes:

a first channel, to which the value belongs,

Further, the classification function is a train _ test _ split function, a random _ state parameter can achieve the function of setting a random generator seed, the seed can be transferred to a plurality of data sets with the same row number, and the data sets can be divided on the same index.

Further, each time a new data set is imported in step S1, a new test set needs to be created, otherwise each run will result in a different test set.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A machine learning classification method based on adaptive error correction output coding is characterized in that,

comprises the following steps:

s2, analyzing the data set, dividing the data set after analyzing the data set by using a classification function,

is divided into semantic class and panoramic class,

s3, extracting the segmented data set, defining the data set interval as (0, K), setting the required row number and column number to form a coding matrix, generating to obtain a prior information set,

2. The method of claim 1, wherein the gradient decreases when the error value is positive, and the gradient increases when the error value is negative.

3. The method of claim 1, wherein in step S2, the semantic class and the panorama class are specifically:

semantic classes are data in the dataset labeled with its category,

the panorama class is to distinguish instance based on the semantic class.

4. The method according to claim 1, wherein the values in the data set comprise contents specifically as follows:

a first channel, to which the value belongs,

5. The adaptive error correction output coding-based machine learning classification method according to claim 1, wherein the classification function is a train test split function.

6. The method of claim 1, wherein a new test set is created for each importing of a new data set in step S1.