CN110163252A - Data classification method and device, electronic equipment, storage medium - Google Patents

Data classification method and device, electronic equipment, storage medium Download PDF

Info

Publication number
CN110163252A
CN110163252A (Application CN201910309546.8A)
Authority
CN
China
Prior art keywords
data
prediction model
data classification
sample
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910309546.8A
Other languages
Chinese (zh)
Other versions
CN110163252B (en)
Inventor
李正洋
张亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910309546.8A priority Critical patent/CN110163252B/en
Publication of CN110163252A publication Critical patent/CN110163252A/en
Application granted granted Critical
Publication of CN110163252B publication Critical patent/CN110163252B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a data classification method and device, relating to the field of machine learning. The method includes: based on a provided training set, generating at least two base learners for predicting data class labels, and combining the base learners into a data classification prediction model; training the data classification prediction model on the training set to obtain prediction parameters associated with the data classification prediction model; and, according to the obtained prediction parameters, predicting the class label of a data sample with the data classification prediction model to obtain the class label of the data sample. The method provided by this application enables accurate prediction of the class labels of data samples.

Description

Data classification method and device, electronic equipment, storage medium
Technical field
This application relates to the field of machine learning, and in particular to a data classification method and device, an electronic device, and a computer-readable storage medium.
Background technique
In data classification, machine learning provides important technical support. In supervised machine learning, the goal is to learn a single stable model that performs well in all respects, but in practice one can only obtain multiple models that each perform well in some respects. To address this technical problem, the prior art proposes ensemble learning, which combines multiple machine learning models into a single prediction model, so that even if one machine learning model makes a wrong prediction, the other models can correct the error.
However, the data to be predicted may change continuously, and different machine learning models learn with different effectiveness. Both factors affect the prediction model's accuracy on the data to be predicted, so that the data cannot be classified accurately.
Summary of the invention
In view of the above technical problem, this application provides a data classification method and device, an electronic device, and a computer-readable storage medium.
The technical solutions disclosed in this application include:
A data classification method, the method comprising: based on a provided training set, generating at least two base learners for predicting data class labels, and combining the at least two base learners into a data classification prediction model, each base learner being trained separately on different historical data samples; training the data classification prediction model on the training set to obtain prediction parameters associated with the data classification prediction model, the prediction parameters including meta-feature learning weights and meta-feature learning parameters of the data classification prediction model relative to the training set; and, according to the obtained prediction parameters, predicting the class label of a data sample with the data classification prediction model to obtain the class label of the data sample.
Further, training the data classification prediction model on the training set to obtain the prediction parameters associated with the data classification prediction model comprises: predicting, with the data classification prediction model, the class label of each training sample in the training set to obtain a meta-feature matrix associated with the data classification prediction model; and, from the obtained meta-feature matrix, computing the meta-feature learning weights of the data classification prediction model relative to the training set and obtaining the meta-feature learning parameters of the data classification prediction model relative to the training set.
Further, predicting the class label of each training sample in the training set with the data classification prediction model to obtain the meta-feature matrix associated with the data classification prediction model comprises: constructing the residual space of the training set from the prediction deviations of the data classification prediction model on the class labels of the training samples; and performing soft clustering on the constructed residual space according to the prediction quality on each training sample's class label, to obtain the meta-feature matrix associated with the data classification prediction model.
Further, computing the meta-feature learning weights of the data classification prediction model relative to the training set from the obtained meta-feature matrix comprises: obtaining the prediction function of the data classification prediction model, in which the meta features of the training samples come from the meta-feature matrix; and performing least-squares fitting between the prediction function and the objective function of each training sample to obtain the meta-feature learning weights of the data classification prediction model relative to the training set.
Further, obtaining the meta-feature learning parameters of the data classification prediction model relative to the training set from the obtained meta-feature matrix comprises: replicating the training set according to a specified number of groups to obtain that number of training sets; and, in each copy of the training set, replacing each training sample's predicted class label with the corresponding meta feature from the meta-feature matrix, the resulting datasets being the meta-feature learning parameters.
Further, predicting the class label of a data sample with the data classification prediction model according to the obtained prediction parameters comprises: performing linear regression between the raw feature data of the data sample and the meta-feature learning parameters to obtain the meta features of the data sample relative to the data classification prediction model; computing, from the data sample's meta features and the meta-feature learning weights, the meta-feature weights of the data classification prediction model relative to the data sample; and predicting the class label of the data sample with the data classification prediction model according to the obtained meta-feature weights.
Further, the formula for computing the meta-feature weights of the data classification prediction model relative to the data sample from the sample's meta features and the meta-feature learning weights can be expressed as

    w_i(x*) = Σ_{j=1..T} v_ij · m_j(x*)

and the formula by which the data classification prediction model predicts the data sample's class label according to the meta-feature weights can be expressed as

    ŷ(x*) = argmax_{q ∈ {1, ..., Q}} Σ_{i=1..S} w_i(x*) · 1[ĝ_i(x*) = q]

where x* denotes the data sample, T the number of groups of meta-feature learning parameters, m_j(x*) the meta features of the data sample, v_ij the meta-feature learning weights, Q the number of class labels preset in the data classification prediction model, ĝ_i(x*) the prediction of the i-th base learner on the data sample, and S the number of base learners.
A data classification device, the device comprising: a prediction model construction module, configured to generate, based on a provided training set, at least two base learners for predicting data class labels, and to combine the at least two base learners into a data classification prediction model, each base learner being trained separately on different historical data samples; a prediction model training module, configured to train the data classification prediction model on the training set to obtain prediction parameters associated with the data classification prediction model, the prediction parameters including meta-feature learning weights and meta-feature learning parameters of the data classification prediction model relative to the training set; and a data sample prediction module, configured to predict, according to the obtained prediction parameters, the class label of a data sample with the data classification prediction model, obtaining the class label of the data sample.
An electronic device, the electronic device comprising:
a processor; and
a memory on which computer-readable instructions are stored, the computer-readable instructions, when executed by the processor, implementing the data classification method described above.
A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the data classification method described above.
The technical solutions provided by the embodiments of this application can include the following beneficial effects:
In the above technical solutions, the result of predicting a data sample's class label depends on the prediction parameters of the data classification prediction model, and those prediction parameters depend in turn on training the data classification prediction model on the training samples in the training set. In this application, the prediction parameters of the data classification prediction model include the model's meta-feature learning weights and meta-feature learning parameters relative to the training set. These prediction parameters capture well how much each base learner's predictive ability contributes to prediction accuracy, so that the data classification prediction model can combine the prediction weights of the different base learners to predict the class labels of data samples accurately.
It should be understood that the above general description and the following detailed description are merely exemplary and do not limit this application.
Detailed description of the invention
The accompanying drawings, which are incorporated into and form part of this specification, show embodiments consistent with this application and, together with the specification, serve to explain the principles of this application.
Fig. 1 is a schematic diagram of an implementation environment according to an exemplary embodiment;
Fig. 2 is a hardware block diagram of a server according to an exemplary embodiment;
Fig. 3 is a flowchart of a data classification method according to an exemplary embodiment;
Fig. 4 is a flowchart describing step 330 of the embodiment corresponding to Fig. 3;
Fig. 5 to Fig. 8 are schematic diagrams of the data processing involved in the embodiment shown in Fig. 4;
Fig. 9 is a flowchart describing step 350 of the embodiment corresponding to Fig. 3;
Fig. 10 is a block diagram of a data classification device according to an exemplary embodiment.
The above drawings show specific embodiments of the present invention, which are described in more detail hereinafter. These drawings and the accompanying text are not intended to limit the scope of the inventive concept in any way, but to illustrate the concept of the invention to those skilled in the art by reference to specific embodiments.
Specific embodiment
Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. In the following description, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application; rather, they are merely examples of devices and methods consistent with some aspects of this application, as detailed in the appended claims.
Fig. 1 is a schematic diagram of an implementation environment according to an exemplary embodiment. As shown in Fig. 1, the implementation environment of the present invention includes a terminal 100 and a server 200.
In the present invention, the terminal 100 runs a data classification client, also referred to as a front end, which obtains data samples and displays the predicted class labels of the data samples. The terminal 100 may specifically be a smartphone, tablet computer, laptop, desktop computer, or any other electronic device capable of running the data classification client, without limitation here.
The server 200 stores mass data, responds to service requests initiated by the terminal, and performs data processing according to those requests. The server 200 may be a single server or a server cluster composed of several servers, without limitation here.
It should be noted that a wired or wireless network connection is pre-established between the terminal 100 and the server 200, so that the terminal 100 can exchange data with the server 200.
Fig. 2 is a hardware block diagram of the server 200 of Fig. 1 according to an exemplary embodiment. It should be noted that this server is merely an example adapted to the present invention and must not be taken as imposing any restriction on the scope of use of the present invention.
The hardware configuration of the server may vary considerably with configuration or performance. As shown in Fig. 2, the server 200 includes a power supply 210, an interface 230, at least one memory 250, and at least one central processing unit 270.
The power supply 210 provides the operating voltage for each hardware device on the server 200.
The interface 230 includes at least one wired or wireless network interface 231, at least one serial-to-parallel conversion interface 233, at least one input/output interface 235, and at least one USB interface 237, for communication with external devices.
The memory 250, as a carrier for resource storage, may be a read-only memory, random access memory, magnetic disk, optical disc, or the like. The resources stored on it include an operating system 251, application programs 253, and data 255, and storage may be temporary or permanent. The operating system 251 manages and controls the hardware devices and application programs 253 on the server 200, enabling the central processing unit 270 to compute and process the mass data 255. The application programs 253 are computer programs that perform at least one particular task on top of the operating system 251; they may include at least one module (not shown in Fig. 2), each of which may contain a series of computer-readable instructions for the server 200. The data 255 may be key information stored on disk, and so forth.
The central processing unit 270 may include one or more processors and communicates with the memory 250 through a bus, for computing and processing the mass data 255 in the memory 250.
As described in detail above, a server to which the present invention applies performs the data classification method by having the central processing unit 270 read a series of computer-readable instructions stored in the memory 250.
Fig. 3 is a flowchart of a data classification method according to an exemplary embodiment; the method is applicable to the server 200 shown in Fig. 1. As shown in Fig. 3, the method may include the following steps:
Step 310: based on a provided training set, generate at least two base learners for predicting data class labels, and combine the at least two base learners into a data classification prediction model.
Here, the training set is a set of several training samples, which should be understood as the data samples used to train the data classification prediction model. In one exemplary embodiment, the training samples may specifically be text data.
A base learner is a prediction model for predicting the class label of a data sample and can be understood as a rule function for class-label prediction. Different base learners can be trained specifically on different historical data samples, so that different base learners exhibit different prediction behavior. The historical data samples are the data samples used to train the base learners; this is a common technique in ensemble learning and is not repeated here.
Based on the provided training set, assume S base learners <g1(x), g2(x), ..., gS(x)> are generated and combined to constitute the data classification prediction model. In other words, the data classification prediction model is formed by stacking at least two base learners with different prediction behavior, and should be understood as a kind of ensemble learning model.
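The assembly of such a stack of base learners can be sketched as follows. The estimator types and the bootstrap resampling below are illustrative assumptions, not the patent's prescribed choices; they merely show S = 3 learners fitted on different "historical" samples of one training set.

```python
# Sketch of step 310: build S base learners on different resamples of the
# training set and collect them into one stacked model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

rng = np.random.default_rng(0)
base_learners = []
for model in (LogisticRegression(max_iter=1000),
              DecisionTreeClassifier(random_state=0),
              GaussianNB()):
    # each base learner g_i(x) is fitted on its own bootstrap resample,
    # mimicking "trained on different historical data samples"
    idx = rng.choice(len(X), size=len(X), replace=True)
    base_learners.append(model.fit(X[idx], y[idx]))

# stacked predictions <g_1(x), ..., g_S(x)> for every training sample
preds = np.array([g.predict(X) for g in base_learners])
print(preds.shape)
```

Any classifiers with differing inductive biases would do here; the point is only that the stack holds at least two learners with different prediction behavior.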
Step 330: train the data classification prediction model on the training set to obtain the prediction parameters associated with the data classification prediction model.
The process of training the data classification prediction model on the training set should be understood as follows: each base learner in the data classification prediction model is used to predict the class label of each training sample in the training set, yielding corresponding prediction results; the obtained results are then combined according to a certain rule to finally obtain the class label of each training sample.
For ease of understanding: for a training sample x, each of the S base learners generated above predicts its class label, so that each base learner's prediction result <y1, y2, ..., yS> for sample x is obtained; these results are then combined to obtain the final class label of training sample x.
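The combination rule itself is left open in the text; a plain majority vote is one minimal possibility (an assumption for illustration, not the patent's fixed rule):

```python
# Combine the base learners' results <y1, ..., yS> for one sample by
# majority vote — one possible "certain rule" for integrating predictions.
from collections import Counter

def integrate(predictions):
    """Return the class label output by the most base learners."""
    return Counter(predictions).most_common(1)[0][0]

print(integrate([1, 0, 1]))  # → 1
```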
The prediction parameters associated with the data classification prediction model are the parameters obtained by training the data classification prediction model on the training set. In one exemplary embodiment, the prediction parameters may include meta-feature learning weights and meta-feature learning parameters. The meta-feature learning weights indicate the importance weight of each base learner's prediction of a training sample in the final prediction for that sample; the meta-feature learning parameters are the learning parameters used to extract the meta features of a data sample, the meta features being the input features that describe the data sample within the data classification prediction model.
Alternatively, in other exemplary embodiments, the prediction parameters may also include other parameters; this application does not restrict the concrete type of the prediction parameters.
It should be noted that the prediction parameters of the data classification prediction model are obtained by training the model on the training samples of the training set and then processing the training results accordingly.
The prediction parameters include the meta-feature learning weights and meta-feature learning parameters of the data classification prediction model relative to the training set. As shown in Fig. 4, in one exemplary embodiment, obtaining the prediction parameters associated with the data classification prediction model may include the following steps:
Step 331: predict the class label of each training sample in the training set with the data classification prediction model to obtain the meta-feature matrix associated with the data classification prediction model.
Here, each base learner in the data classification prediction model predicts the class label of each training sample, producing corresponding prediction results; these results are then combined to form the meta-feature matrix associated with the data classification prediction model. The meta-feature matrix describes the input features of each training sample within the data classification prediction model.
Specifically, the residual space of the training set is constructed from the prediction deviations of the base learners in the data classification prediction model on the class labels of the training samples; then, according to the model's prediction quality on each training sample's class label, soft clustering is performed on the constructed residual space to obtain the meta-feature matrix associated with the data classification prediction model.
The prediction deviation of a training sample on the different base learners should be understood as the difference between the sample's true class label and the class label predicted by each base learner. A large difference indicates that the corresponding base learner predicts this training sample poorly; a small difference indicates that it predicts this sample well.
In one exemplary embodiment, as shown in Fig. 5, assume that the residuals (i.e., prediction deviations) of a training sample x on the different base learners are <r1, r2, ..., rS>. Then, for a training set D containing N training samples, the generated residual space is as shown in Fig. 6; the residual space thus represents how well the data classification prediction model predicts the class label of each training sample in the training set.
After the residual space of the training set is constructed, soft clustering can be performed on it by the fuzzy c-means method, so that, according to the prediction behavior of particular base learners on different training samples, the training samples are gathered into several clusters. That is, the training set is partitioned into several clusters based on the similarity between different training samples.
Assume the number of clusters is T. The fuzzy c-means method then yields the degree of membership of each training sample in each cluster; these memberships are the training sample's meta features associated with the data classification prediction model. They can be represented by meta-feature functions, which map each training sample's raw features to its meta features. For example, the memberships of training sample x in the different clusters can be expressed with the meta-feature functions f1(x), f2(x), f3(x), ..., fT(x). The raw features of a training sample should be understood as the data content it contains; for example, when the training set is a text corpus, the raw features of each training sample are the specific text content.
For the entire training set, as shown in Fig. 7, the meta features of all training samples associated with the data classification prediction model are combined to obtain the meta-feature matrix of the training set associated with the data classification prediction model. Each training sample therefore has T meta features fj(x) associated with the data classification prediction model.
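A minimal sketch of this clustering step, under the assumption of a hand-rolled fuzzy c-means (the patent names the method but not an implementation): the rows of a residual matrix R are soft-clustered into T clusters, and the resulting membership matrix plays the role of the meta-feature matrix <f1(x), ..., fT(x)>.

```python
# Minimal fuzzy c-means over a residual matrix R (N x S): each row holds one
# training sample's prediction deviations <r1, ..., rS> on the S base
# learners. Returns the N x T membership (meta-feature) matrix.
import numpy as np

def fuzzy_c_means(R, T, m=2.0, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = R[rng.choice(len(R), size=T, replace=False)]
    for _ in range(n_iter):
        # distance of every sample to every cluster center
        d = np.linalg.norm(R[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        # standard FCM membership update: u ∝ d^(-2/(m-1)), rows sum to 1
        U = d ** (-2.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)
        # weighted center update
        w = U ** m
        centers = (w.T @ R) / w.sum(axis=0)[:, None]
    return U

rng = np.random.default_rng(7)
R = np.vstack([rng.normal(0.0, 0.1, (20, 3)),   # well-predicted samples
               rng.normal(1.0, 0.1, (20, 3))])  # poorly predicted samples
U = fuzzy_c_means(R, T=2)
print(U.shape)  # → (40, 2)
```

Each row of `U` is one sample's degrees of membership in the T clusters, i.e. its T meta features.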
Step 333: from the obtained meta-feature matrix, compute the meta-feature learning weights of the data classification prediction model relative to the training set, and obtain the meta-feature learning parameters of the data classification prediction model relative to the training set.
As mentioned above, the meta-feature learning weights of the data classification prediction model relative to the training set indicate the importance weight of each base learner's prediction of each training sample in the final prediction for that sample. From the meta-feature learning weights it follows that, among the at least two base learners integrated in the data classification prediction model, the base learners that predict the training samples more accurately receive larger weights.
The meta-feature learning parameters are the learning parameters used to extract the meta features of a data sample. The meta-feature learning parameters of the data classification prediction model must therefore be obtained in advance, so that when the model predicts the class label of a data sample, the sample's meta features can be obtained from the meta-feature learning parameters.
In one exemplary embodiment, computing the meta-feature learning weights of the data classification prediction model may include the following steps:
obtaining the prediction function of the data classification prediction model;
performing least-squares fitting between the prediction function and the objective function of each training sample to obtain the meta-feature learning weights of the data classification prediction model relative to the training set.
After the meta-feature matrix of the data classification prediction model is obtained, the FWLS (Feature-Weighted Linear Stacking) algorithm can be used to compute the meta-feature learning weights. First, the prediction function of the data classification prediction model must be obtained; it is the combination function found by the standard linear-regression stacking procedure, as follows:

    b(x) = Σ_{i=1..S} Σ_{j=1..T} v_ij · f_j(x) · g_i(x)

where b(x) denotes the prediction function of the data classification prediction model, vij the meta-feature learning weights of the model relative to the training set, fj(x) the meta features of training sample x, and gi(x) a base learner that predicts the class label of training sample x.
As mentioned above, the meta-feature matrix of the training set associated with the data classification prediction model is the combination of each training sample's meta features; the meta features of training sample x therefore come from this meta-feature matrix and correspond to some of its elements.
Then, using the optimization procedure of the FWLS algorithm, least-squares fitting is performed between the prediction function of the data classification prediction model and the objective function of each training sample, in order to solve for the meta-feature learning weights vij. The obtained weights vij are those that minimize the following expression:

    Σ_{x ∈ D} ( Σ_{i=1..S} Σ_{j=1..T} v_ij · f_j(x) · g_i(x) − y(x) )²

where y(x) denotes the objective function of training sample x. The meta-feature learning weight vij captures information such as the predictive information and confidence of the i-th base learner on the training data. For a trained data classification prediction model, vij is a constant; the weight of the i-th base learner on a given training sample is therefore determined through the meta-feature function fj(x).
It should be understood that the above process of minimizing the expression in the meta-feature learning weights vij is essentially the process of tuning the parameters of each base learner in the data classification prediction model. When the minimizing vij are obtained, the data classification prediction model is trained, and its meta-feature learning weights vij are then constants.
In an exemplary embodiment, obtaining the meta-feature learning parameters of the data classification prediction model may include the following steps:
replicating the training set according to a specified group number to obtain that number of copies of the training set;
in each copy of the training set, replacing the class label predicted for each training sample with the corresponding meta-feature in the meta-feature matrix, and taking the resulting data sets of the specified group number as the meta-feature learning parameters.
The specified group number is preset in the data classification prediction model and is consistent with the number of class clusters obtained by the aforementioned division (i.e. T groups).
By copying the training set into T groups and, in each copy, replacing the class label predicted for each training sample with the corresponding meta-feature function in the meta-feature matrix, as shown in Figure 8, T data sets are obtained; these T data sets are the meta-feature learning parameters of the data classification prediction model relative to the training set.
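The replication-and-replacement procedure above can be sketched as follows; the function name and array shapes are illustrative assumptions:

```python
import numpy as np

def build_meta_datasets(X, meta_matrix):
    """Replicate the training set T times; in the t-th copy, replace each
    sample's predicted class label with its t-th meta-feature.

    X           : (n, d) primitive features of the n training samples
    meta_matrix : (n, T) meta-feature matrix (one column per class cluster)

    Returns a list of T data sets (features, meta-feature target) used as
    the meta-feature learning parameters.
    """
    n, T = meta_matrix.shape
    return [(X.copy(), meta_matrix[:, t].copy()) for t in range(T)]
```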
Step 350: according to the obtained prediction parameters, class label prediction is performed on a data sample through the data classification prediction model, and the class label of the data sample is obtained.
When class label prediction is performed on a data sample using the trained data classification prediction model, the prediction parameters obtained by training the data classification prediction model are used.
As shown in Figure 9, in an exemplary embodiment, performing class label prediction on a data sample through the data classification prediction model may include the following steps:
Step 351: performing linear regression between the primitive features of the data sample and the meta-feature learning parameters to obtain the meta-features of the data sample relative to the data classification prediction model.
As mentioned above, the primitive features of a data sample represent the data content carried by that sample, such as text data. Denoting the data sample as x*, linear regression (LR) is performed between x* and the T data sets obtained during training, yielding the meta-features of the test sample x*, which may be denoted m_j(x*).
The meta-features of the data sample are thus obtained on the basis of the parameters learned by the multiple base learners while predicting the class labels of the training set; they can therefore represent the input features of the data sample in the data classification prediction model more accurately, providing a basis for accurate prediction of the data sample's class label by the data classification prediction model.
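Step 351 can be sketched as follows, assuming each of the T data sets pairs primitive features with stored meta-feature targets: a least-squares linear map is fitted per data set and evaluated at x*. Names and shapes are illustrative, not the patent's exact procedure:

```python
import numpy as np

def sample_meta_features(x_star, meta_datasets):
    """Estimate the meta-features m_j(x*) of a new sample by linear
    regression against each of the T training data sets.

    x_star        : (d,) primitive features of the data sample
    meta_datasets : list of T pairs (X, targets) built during training

    For each data set, a linear map from primitive features to the stored
    meta-feature is fitted; evaluating it at x* gives one m_j(x*).
    """
    m = []
    for X, targets in meta_datasets:
        # Fit X @ w ~= targets in the least-squares sense, then apply to x*.
        w, *_ = np.linalg.lstsq(X, targets, rcond=None)
        m.append(float(x_star @ w))
    return np.array(m)  # shape (T,)
```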
Step 353: computing the meta-feature weights of the data classification prediction model relative to the data sample from the meta-features of the data sample and the meta-feature learning weights.
Here, combined with the meta-feature learning weights of the data classification prediction model obtained in the training stage, the importance weight corresponding to each base learner during class label prediction of the data sample is computed; the result is the meta-feature weight of the data classification prediction model relative to the data sample. The calculation formula is as follows:
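The formula itself is an image not reproduced in this text. One plausible reading, consistent with the symbols listed in claim 7 (the trained weights v_ij and the sample meta-features m_j(x*)), is a weighted combination per base learner, sketched below purely as an assumption:

```python
import numpy as np

def meta_feature_weights(v, m_star):
    """Assumed form of the per-base-learner importance weight:
    w_i(x*) = sum_j v_ij * m_j(x*).
    The patent's exact formula is in an image and is not reproduced here.

    v      : (S, T) trained meta-feature learning weights v_ij
    m_star : (T,)   meta-features m_j(x*) of the data sample
    """
    return v @ m_star  # shape (S,), one weight per base learner
```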
Step 355: performing class label prediction on the data sample through the data classification prediction model according to the obtained meta-feature weights.
After the meta-feature weights of the data classification prediction model relative to the data sample are obtained, the meta-feature weights and the meta-features of the data sample are substituted into the prediction function of the data classification prediction model for calculation, so that class label prediction of the data sample is performed through the data classification prediction model.
Specifically, in an exemplary embodiment, for a data sample x*, the data classification prediction model needs to predict one class label from its preset class label set {l_1, l_2, l_3, ..., l_Q}. The calculation formula is as follows:
Here, the class label needs to be preset as a Q-dimensional vector, whose components denote the output of each base learner g_i on each class label l_q.
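As a hedged sketch of this label-selection step (the exact aggregation formula is in an image not reproduced here), the Q-dimensional outputs of the base learners can be combined using the per-learner weights and the highest-scoring label chosen; names and the summation form are assumptions:

```python
import numpy as np

def predict_label(base_outputs, weights, labels):
    """Pick one class label from the preset set {l_1, ..., l_Q}.

    base_outputs : (S, Q) output of base learner g_i on each label l_q
    weights      : (S,)   meta-feature weight of each base learner for x*
    labels       : list of the Q class labels

    The weighted scores are summed over base learners and the label with
    the highest combined score is returned (an assumed aggregation).
    """
    scores = weights @ base_outputs  # shape (Q,)
    return labels[int(np.argmax(scores))]
```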
It can be seen that the meta-feature weights of the data classification prediction model relative to a data sample are correlated with that data sample's meta-features. In other words, in the data classification method provided herein, the importance weight that each base learner in the data classification prediction model assigns to a data sample depends on the data sample's meta-features, which encode what each base learner learned about prediction accuracy during training, so that the data classification prediction model provided herein can accurately predict the class label of a data sample.
Figure 10 shows a data classification device according to an exemplary embodiment. As shown in Figure 10, the device includes a prediction model construction module 410, a prediction model training module 430 and a data sample prediction module 450.
The prediction model construction module 410 is configured to generate, based on a provided training set, at least two base learners for predicting data class labels, and to combine the at least two base learners to form a data classification prediction model, each base learner being trained on different historical data samples.
The prediction model training module 430 is configured to train the data classification prediction model according to the training set and obtain prediction parameters associated with the data classification prediction model, the prediction parameters including the meta-feature learning weights and meta-feature learning parameters of the data prediction model relative to the training set.
The data sample prediction module 450 is configured to perform class label prediction on a data sample through the data classification prediction model according to the obtained prediction parameters, obtaining the class label of the data sample.
In another embodiment, the prediction model training module 430 specifically includes a training sample prediction unit and a prediction parameter acquisition unit.
The training sample prediction unit is configured to perform class label prediction on each training sample in the training set through the data classification prediction model, obtaining a meta-feature matrix associated with the data classification prediction model.
The prediction parameter acquisition unit is configured to calculate, according to the obtained meta-feature matrix, the meta-feature learning weights of the data classification prediction model relative to the training set, and to obtain the meta-feature learning parameters of the data classification prediction model relative to the training set.
In another embodiment, the training sample prediction unit specifically includes a residual space construction subunit and a residual space processing subunit.
The residual space construction subunit is configured to construct the residual space of the training set according to the prediction deviation of the data classification prediction model on each training sample's class label.
The residual space processing subunit is configured to perform soft clustering on the constructed residual space according to the prediction effect on each training sample's class label, obtaining the meta-feature matrix associated with the data classification prediction model.
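The patent specifies soft clustering of the residual space but not a particular algorithm. Fuzzy c-means-style memberships are one common choice for soft clustering, sketched here with illustrative names and shapes:

```python
import numpy as np

def soft_cluster_memberships(R, centers, m=2.0):
    """Fuzzy membership of each residual vector in each of T clusters,
    using the standard fuzzy c-means membership rule (the patent names
    soft clustering but not this exact algorithm).

    R       : (n, S) residual space, one residual vector per training sample
    centers : (T, S) cluster centres
    m       : fuzzifier (> 1); larger values give softer memberships

    Returns an (n, T) membership matrix whose rows sum to 1, usable as a
    meta-feature matrix associated with the prediction model.
    """
    # Distances from every residual vector to every centre (small epsilon
    # avoids division by zero when a point coincides with a centre).
    d = np.linalg.norm(R[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)
```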
In another embodiment, the prediction parameter acquisition unit specifically includes a function acquisition subunit and a function processing subunit.
The function acquisition subunit is configured to obtain the prediction function of the data classification prediction model, the meta-features of the training samples contained in the prediction function coming from the meta-feature matrix.
The function processing subunit is configured to obtain the meta-feature learning weights of the data classification prediction model relative to the training set by performing a least-squares computation between the prediction function and the objective function of each training sample.
In another embodiment, the prediction parameter acquisition unit specifically includes a data replication subunit and a data replacement subunit.
The data replication subunit is configured to replicate the training set according to a specified group number to obtain that number of copies of the training set.
The data replacement subunit is configured to replace, in each copy of the training set, the class label predicted for each training sample with the corresponding meta-feature in the meta-feature matrix, and to take the resulting data sets of the specified group number as the meta-feature learning parameters.
In another embodiment, the data sample prediction module 450 specifically includes a meta-feature acquisition unit, a meta-feature weight calculation unit and a class label prediction unit.
The meta-feature acquisition unit is configured to perform a linear regression calculation between the primitive feature data of the data sample and the meta-feature learning parameters, obtaining the meta-features of the data sample relative to the data classification prediction model.
The meta-feature weight calculation unit is configured to calculate the meta-feature weights of the data classification prediction model relative to the data sample from the meta-features of the data sample and the meta-feature learning weights.
The class label prediction unit is configured to perform class label prediction on the data sample through the data classification prediction model according to the obtained meta-feature weights.
It should be noted that the device provided by the above embodiment and the method provided by the foregoing embodiments belong to the same concept; the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.
In an exemplary embodiment, the application also provides an electronic device, which includes:
a processor; and
a memory on which computer-readable instructions are stored, the computer-readable instructions, when executed by the processor, implementing the data classification method described above.
In an exemplary embodiment, the application also provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the data classification method described above.
It should be understood that the application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the application is limited only by the appended claims.

Claims (10)

1. A data classification method, characterized in that the method comprises:
generating, based on a provided training set, at least two base learners for predicting data class labels, and combining the at least two base learners to form a data classification prediction model, each base learner being trained on different historical data samples;
training the data classification prediction model according to the training set to obtain prediction parameters associated with the data classification prediction model, the prediction parameters including meta-feature learning weights and meta-feature learning parameters of the data prediction model relative to the training set;
performing class label prediction on a data sample through the data classification prediction model according to the obtained prediction parameters, and obtaining the class label of the data sample.
2. The method according to claim 1, characterized in that training the data classification prediction model according to the training set to obtain prediction parameters associated with the data classification prediction model comprises:
performing class label prediction on each training sample in the training set through the data classification prediction model, and obtaining a meta-feature matrix associated with the data classification prediction model;
calculating, according to the obtained meta-feature matrix, the meta-feature learning weights of the data classification prediction model relative to the training set, and obtaining the meta-feature learning parameters of the data classification prediction model relative to the training set.
3. The method according to claim 2, characterized in that performing class label prediction on each training sample in the training set through the data classification prediction model and obtaining a meta-feature matrix associated with the data classification prediction model comprises:
constructing a residual space of the training set according to the prediction deviation of the data classification prediction model on each training sample's class label;
performing soft clustering on the constructed residual space according to the prediction effect on each training sample's class label, and obtaining the meta-feature matrix associated with the data classification prediction model.
4. The method according to claim 2, characterized in that calculating, according to the obtained meta-feature matrix, the meta-feature learning weights of the data classification prediction model relative to the training set comprises:
obtaining a prediction function of the data classification prediction model, the meta-features of the training samples contained in the prediction function coming from the meta-feature matrix;
obtaining the meta-feature learning weights of the data classification prediction model relative to the training set by performing a least-squares computation between the prediction function and the objective function of each training sample.
5. The method according to claim 2, characterized in that obtaining, according to the obtained meta-feature matrix, the meta-feature learning parameters of the data classification prediction model relative to the training set comprises:
replicating the training set according to a specified group number to obtain that number of copies of the training set;
in each copy of the training set, replacing the class label predicted for each training sample with the corresponding meta-feature in the meta-feature matrix, and taking the resulting data sets of the specified group number as the meta-feature learning parameters.
6. The method according to claim 2, characterized in that performing class label prediction on a data sample through the data classification prediction model according to the obtained prediction parameters and obtaining the class label of the data sample comprises:
performing a linear regression calculation between the primitive feature data of the data sample and the meta-feature learning parameters, and obtaining the meta-features of the data sample relative to the data classification prediction model;
calculating the meta-feature weights of the data classification prediction model relative to the data sample from the meta-features of the data sample and the meta-feature learning weights;
performing class label prediction on the data sample through the data classification prediction model according to the obtained meta-feature weights.
7. The method according to claim 6, characterized in that the formula for calculating the meta-feature weights of the data classification prediction model relative to the data sample from the meta-features of the data sample and the meta-feature learning weights is expressed as
the formula by which the data classification prediction model predicts the class label of the data sample according to the meta-feature weights is expressed as
wherein x* denotes the data sample, T denotes the group number of the meta-feature learning parameters, m_j(x*) denotes a meta-feature of the data sample, v_ij denotes a meta-feature weight of the data sample, Q denotes the number of class labels preset in the data classification prediction model, the remaining symbol denotes the prediction result of each base learner on the data sample, and S denotes the number of base learners.
8. A data classification device, characterized in that the device comprises:
a prediction model construction module, configured to generate, based on a provided training set, at least two base learners for predicting data class labels, and to combine the at least two base learners to form a data classification prediction model, each base learner being trained on different historical data samples;
a prediction model training module, configured to train the data classification prediction model according to the training set and obtain prediction parameters associated with the data classification prediction model, the prediction parameters including meta-feature learning weights and meta-feature learning parameters of the data prediction model relative to the training set;
a data sample prediction module, configured to perform class label prediction on a data sample through the data classification prediction model according to the obtained prediction parameters, obtaining the class label of the data sample.
9. An electronic device, characterized in that the device comprises:
a processor; and
a memory on which computer-readable instructions are stored, the computer-readable instructions, when executed by the processor, implementing the data classification method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the data classification method according to any one of claims 1 to 7.
CN201910309546.8A 2019-04-17 2019-04-17 Data classification method and device, electronic equipment and storage medium Active CN110163252B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910309546.8A CN110163252B (en) 2019-04-17 2019-04-17 Data classification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910309546.8A CN110163252B (en) 2019-04-17 2019-04-17 Data classification method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110163252A true CN110163252A (en) 2019-08-23
CN110163252B CN110163252B (en) 2023-11-24

Family

ID=67638590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910309546.8A Active CN110163252B (en) 2019-04-17 2019-04-17 Data classification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110163252B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929785A (en) * 2019-11-21 2020-03-27 中国科学院深圳先进技术研究院 Data classification method and device, terminal equipment and readable storage medium
CN111126448A (en) * 2019-11-29 2020-05-08 无线生活(北京)信息技术有限公司 Method and device for intelligently identifying fraud users
WO2021059081A1 (en) * 2019-09-25 2021-04-01 International Business Machines Corporation Systems and methods for training a model using a few-shot classification process
CN112825576A (en) * 2019-11-20 2021-05-21 中国电信股份有限公司 Method and device for determining cell capacity expansion and storage medium
WO2022141094A1 (en) * 2020-12-29 2022-07-07 深圳市大疆创新科技有限公司 Model generation method and apparatus, image processing method and apparatus, and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6453307B1 (en) * 1998-03-03 2002-09-17 At&T Corp. Method and apparatus for multi-class, multi-label information categorization
CN104881689A (en) * 2015-06-17 2015-09-02 苏州大学张家港工业技术研究院 Method and system for multi-label active learning classification
CN105354595A (en) * 2015-10-30 2016-02-24 苏州大学 Robust visual image classification method and system
CN106446951A (en) * 2016-09-28 2017-02-22 中科院成都信息技术股份有限公司 Singular value selection-based integrated learning device
CN108710907A (en) * 2018-05-15 2018-10-26 苏州大学 Handwritten form data classification method, model training method, device, equipment and medium
CN109471938A (en) * 2018-10-11 2019-03-15 平安科技(深圳)有限公司 A kind of file classification method and terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
闵帆 等: "SUCE:基于聚类集成的半监督二分类方法", 智能系统学报, vol. 13, no. 06, pages 974 - 980 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021059081A1 (en) * 2019-09-25 2021-04-01 International Business Machines Corporation Systems and methods for training a model using a few-shot classification process
CN112825576A (en) * 2019-11-20 2021-05-21 中国电信股份有限公司 Method and device for determining cell capacity expansion and storage medium
CN112825576B (en) * 2019-11-20 2023-05-05 中国电信股份有限公司 Cell capacity expansion determining method, device and storage medium
CN110929785A (en) * 2019-11-21 2020-03-27 中国科学院深圳先进技术研究院 Data classification method and device, terminal equipment and readable storage medium
CN110929785B (en) * 2019-11-21 2023-12-05 中国科学院深圳先进技术研究院 Data classification method, device, terminal equipment and readable storage medium
CN111126448A (en) * 2019-11-29 2020-05-08 无线生活(北京)信息技术有限公司 Method and device for intelligently identifying fraud users
WO2022141094A1 (en) * 2020-12-29 2022-07-07 深圳市大疆创新科技有限公司 Model generation method and apparatus, image processing method and apparatus, and readable storage medium

Also Published As

Publication number Publication date
CN110163252B (en) 2023-11-24

Similar Documents

Publication Publication Date Title
CN110163252A (en) Data classification method and device, electronic equipment, storage medium
Duan et al. Test‐Sheet Composition Using Analytic Hierarchy Process and Hybrid Metaheuristic Algorithm TS/BBO
CN108171280A (en) A kind of grader construction method and the method for prediction classification
CN103605711B (en) Construction method and device, classification method and device of support vector machine
CN105184368A (en) Distributed extreme learning machine optimization integrated framework system and method
Blecic et al. How much past to see the future: a computational study in calibrating urban cellular automata
CN110222838A (en) Deep neural network and its training method, device, electronic equipment and storage medium
Yu et al. Melo: Enhancing model editing with neuron-indexed dynamic lora
Sun Predictive analysis and simulation of college sports performance fused with adaptive federated deep learning algorithm
He et al. A modified artificial bee colony algorithm based on search space division and disruptive selection strategy
Spychalski et al. Machine learning in multi-agent systems using associative arrays
AlKindy et al. Hybrid genetic algorithm and lasso test approach for inferring well supported phylogenetic trees based on subsets of chloroplastic core genes
CN110135626A (en) Credit management method and device, electronic equipment, storage medium
Wang et al. Hybrid gray wolf optimization and cuckoo search algorithm based on the taguchi theory
Uher et al. Utilization of the discrete differential evolution for optimization in multidimensional point clouds
CN114328221A (en) Cross-project software defect prediction method and system based on feature and instance migration
CN113191527A (en) Prediction method and device for population prediction based on prediction model
Zhao et al. Terminal replacement prediction based on deep belief networks
Ming-Zhu et al. The research and implementation of technology of generating test paper based on genetic algorithm
Wei et al. Multi-strategy synergy-based backtracking search optimization algorithm
Hu et al. An intelligent test paper generation method based on genetic particle swarm optimization
Raška et al. Methodology for evaluating optimization experiments
Luo Simulation Experiment Exploration of Genetic Algorithm’s Convergence over the Relationship Advantage Problem
Hsieh et al. CMAIS‐WOA: An Improved WOA with Chaotic Mapping and Adaptive Iterative Strategy
Xie [Retracted] Dynamic Modeling and Analysis of Innovative Development Model and Ideological and Political Education Based on Big Data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant