CN108960264A - Training method and device for a classification model - Google Patents
- Publication number: CN108960264A
- Application number: CN201710361782.5A
- Authority
- CN
- China
- Prior art keywords
- data
- target
- subset
- complexity
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/29—Graphical models, e.g. Bayesian networks
Abstract
This application discloses a training method and device for a classification model, for improving the efficiency of data analysis. The training method includes: receiving sample data for training the classification model, where the sample data includes multiple sample features; determining a target feature subset from the sample data, and determining high-dimensional sparse features of the target feature subset using a high-dimensional sparsification transformation method; determining a target data complexity corresponding to the high-dimensional sparse features of the target feature subset, where the data complexity includes multiple dimensions for characterizing data features; determining, according to an established mapping between data complexity and classification algorithms, the target classification algorithm corresponding to the target data complexity, and determining, according to an established mapping between data complexity and the hyperparameter set of the target classification algorithm, the target parameters corresponding to the target data complexity; and training the target classification algorithm with the determined target parameters and the high-dimensional sparse features of the target feature subset, to obtain the classification model.
Description
Technical field
This application relates to the field of data processing, and in particular to a training method and device for a classification model.
Background art
With the arrival of the big-data era, the volume of information keeps expanding, and the market demand for efficient, robust, and accurate analysis of massive data grows constantly, for example off-network (churn) prediction in telecommunications, medical diagnosis, credit classification for loan access, image pattern recognition, and network data classification. In this context, machine learning is widely applied, and classification methods in particular are the most widely used.
However, applying classification methods still faces numerous problems, among which feature selection, feature transformation, model selection, and parameter tuning are the most difficult: they require repeated attempts, modification, and further iteration, so the data-analysis cycle is long and costly. Because any single link, such as feature selection, model selection, or parameter tuning, may affect the final result, a data-analysis system needs high overall robustness, so that a slight problem in one link does not severely degrade the final result.
Precisely because so many factors influence data analysis, the cost of diagnosing and debugging an analysis result is very high. In big-data scenarios in particular, each round of analysis takes a long time to compute, so the whole analysis cycle becomes too long and data-analysis efficiency is low.
Summary of the invention
This application provides a training method and device for a classification model, for improving data-analysis efficiency.
A first aspect of this application provides a training method for a classification model, where the classification model is used to classify data.
To extract relevant features from sample data, the method first receives input sample data for training the classification model; the sample data includes multiple sample features.
Then, a target feature subset is determined from the sample data, filtering out the features that actually need to be used, so as to reduce the computation on the data. The target feature subset is the set of features in the sample data whose relevance and redundancy both meet a target condition.
High-dimensional sparse features of the target feature subset are determined using a high-dimensional sparsification transformation method, and these high-dimensional sparse features are linear features; for example, the target feature subset is sparsified using a kernel-function method to obtain its high-dimensional sparse features, improving the precision of data analysis.
Next, the target data complexity corresponding to the high-dimensional sparse features of the target feature subset is determined, where the data complexity includes multiple dimensions for characterizing data features; the data complexity can be used to measure the high-dimensional sparse features of the feature subset.
Then, the target classification algorithm corresponding to the target data complexity is determined according to an established mapping between data complexity and classification algorithms, and the target parameters corresponding to the target data complexity are determined according to an established mapping between data complexity and the hyperparameter set of the target classification algorithm, thereby optimizing the choice of algorithm and shrinking the parameter space. Both mappings, from data complexity to classification algorithm and from data complexity to the algorithm's hyperparameter set, can be obtained by pre-training.
Finally, the target classification algorithm is trained with the determined target parameters and the high-dimensional sparse features of the target feature subset to obtain the classification model. Using this classification model improves data-analysis efficiency.
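The steps above can be sketched as a small pipeline. All component choices (SelectKBest, RBFSampler, LogisticRegression) are illustrative stand-ins: the patent does not name concrete algorithms, and the complexity-based lookup of algorithm and parameters is replaced by a fixed choice here.

```python
# Minimal end-to-end sketch of the described training flow; components
# are assumptions, not the patent's implementation.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Step 1: determine a target feature subset from the sample data.
selector = SelectKBest(mutual_info_classif, k=8)
X_subset = selector.fit_transform(X, y)

# Step 2: map the subset into a higher-dimensional space with a kernel
# approximation, standing in for the high-dimensional sparsification.
X_high = RBFSampler(n_components=100, random_state=0).fit_transform(X_subset)

# Steps 3-5: the patent would now measure data complexity and look up
# the target algorithm and its parameters; here both are fixed.
clf = LogisticRegression(max_iter=1000).fit(X_high, y)
print("train accuracy:", round(clf.score(X_high, y), 2))
```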
In one implementation of the first aspect, determining the target feature subset from the sample data includes: determining, from the sample data, the feature subset with maximum relevance and minimum redundancy; this maximum-relevance, minimum-redundancy feature subset is the target feature subset. By extracting the feature subset that satisfies maximum relevance and minimum redundancy, weakly associated data can be filtered out, reducing the computation on the data.
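This max-relevance min-redundancy (mRMR) criterion can be illustrated by a greedy loop that scores each candidate by its relevance to the target minus its mean redundancy with already-selected features. Mutual-information estimates come from scikit-learn; this is a simplified sketch, not the patent's exact unified objective.

```python
# Greedy mRMR selection in miniature: maximize I(x_i; c) - mean I(x_i; x_j)
# over the already-selected features. Illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
relevance = mutual_info_classif(X, y, random_state=0)   # I(x_i; c)

selected = [int(np.argmax(relevance))]                  # most relevant first
while len(selected) < 4:
    best_i, best_score = -1, -np.inf
    for i in range(X.shape[1]):
        if i in selected:
            continue
        # Redundancy: mean MI between the candidate and selected features.
        red = np.mean([mutual_info_regression(X[:, [j]], X[:, i],
                                              random_state=0)[0]
                       for j in selected])
        if relevance[i] - red > best_score:             # maximize D - R
            best_i, best_score = i, relevance[i] - red
    selected.append(best_i)
print("selected features:", sorted(selected))
```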
In one implementation of the first aspect, determining the high-dimensional sparse features of the target feature subset using the high-dimensional sparsification transformation method includes:
first performing balancing on the target feature subset and then adding random noise;
then splitting the balanced, noise-augmented target feature subset into a first subset and a second subset;
training a feature sparse-coding algorithm with the first subset to obtain a generalized feature sparse-coding model; and
finally inputting the second subset and applying the generalized sparse-coding model to the data of the second subset, thereby determining the high-dimensional sparse features corresponding to the second subset.
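The split-and-generalize step can be sketched as follows; scikit-learn dictionary learning stands in for the patent's unspecified sparse-coding algorithm, and all sizes are arbitrary assumptions.

```python
# Sketch: balance the data, add random noise, fit a sparse-coding model
# on the first split, then encode the second split with it.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))

X_bal = StandardScaler().fit_transform(X)                    # balancing step
X_noisy = X_bal + rng.normal(scale=0.01, size=X_bal.shape)   # random noise

first, second = train_test_split(X_noisy, test_size=0.5, random_state=0)

# Train the generalized feature sparse-coding model on the first subset.
coder = MiniBatchDictionaryLearning(n_components=32, alpha=1.0,
                                    transform_algorithm="omp",
                                    transform_n_nonzero_coefs=3,
                                    random_state=0)
coder.fit(first)

# Apply the model to the second subset: wider, mostly-zero features.
codes = coder.transform(second)
print(codes.shape)   # (100, 32)
```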
In one implementation of the first aspect, before determining the target classification algorithm corresponding to the target data complexity according to the established mapping between data complexity and classification algorithms, the method further includes: training the mapping between data complexity and classification algorithms, and the mapping between data complexity and the hyperparameter sets of the classification algorithms. This implementation obtains both mappings by pre-training.
In one implementation of the first aspect, training the mapping between data complexity and classification algorithms, and the mapping between data complexity and the hyperparameter sets of the classification algorithms, includes:
obtaining multiple input classification algorithms and multiple groups of training data;
determining the classification algorithm corresponding to each group of training data, and the hyperparameter set corresponding to each of the multiple classification algorithms; by training the multiple classification algorithms on the different groups of training data, statistics of each group and the suitability of each algorithm for it are obtained, where the statistics include the category of each classification algorithm and its corresponding parameter value range, that is, its hyperparameter set;
obtaining multiple data complexities, where each is the data complexity of one group of training data; the characteristics of each group are described from multiple dimensions by its data complexity;
establishing the mapping between the data complexities and the multiple classification algorithms: according to the precision of the obtained data metrics, at least one classification algorithm with higher precision is chosen, and a mapping between that data complexity and the chosen algorithm(s) is established; doing this for every data complexity in the training data yields the mapping between the data complexities and the multiple classification algorithms; and
establishing the mapping between the data complexities and the hyperparameter set corresponding to each classification algorithm: according to the precision of the obtained data metrics, the group of parameters with higher precision is chosen from the hyperparameter set as the target parameters, and a mapping between that data complexity and the target parameters in the algorithm's hyperparameter set is established; doing this for every data complexity yields the mapping between the data complexities and the hyperparameter set of each classification algorithm.
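Once learned offline, the two mappings behave like lookup tables. The toy illustration below invents the complexity keys, algorithm names, and parameter values purely for the sketch; the patent does not specify their form.

```python
# Toy illustration of the two learned mappings: complexity -> algorithm,
# and (complexity, algorithm) -> target hyperparameters. All entries
# are invented for this sketch.
complexity_to_algorithm = {
    "low_overlap_linear": "logistic_regression",
    "high_overlap_nonlinear": "rbf_svm",
}
complexity_to_params = {
    ("low_overlap_linear", "logistic_regression"): {"C": 1.0},
    ("high_overlap_nonlinear", "rbf_svm"): {"C": 10.0, "gamma": 0.1},
}

def select(complexity_profile: str):
    """Look up the target algorithm and its reduced hyperparameter set."""
    algo = complexity_to_algorithm[complexity_profile]
    params = complexity_to_params[(complexity_profile, algo)]
    return algo, params

print(select("high_overlap_nonlinear"))  # ('rbf_svm', {'C': 10.0, 'gamma': 0.1})
```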
In one implementation of the first aspect, the data complexity includes at least two of twelve dimensions for characterizing data features. The twelve dimensions include: linear discriminant ratio, target-class range overlap rate, single-feature maximum efficiency, linear classification error rate, linear classification minimum error, linear classification boundary sample proportion, intra-class sample cluster density, inter-class sample cluster density, sample-data nonlinearity, cross-class sample variation, minimum hyper-dimensional closure of each class of samples, and sparsity rate of each dimension's values. At least two of these twelve dimensions can be chosen as the target data complexity characterizing the high-dimensional sparse features of the target feature subset.
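Two of the listed dimensions can be computed straightforwardly; the formulas below follow the common data-complexity literature (Fisher discriminant ratio and linear-classifier error), since the patent does not give exact definitions, so treat them as assumptions.

```python
# Sketch of two complexity dimensions: (Fisher) linear discriminant
# ratio per feature and the linear classification error rate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Fisher discriminant ratio per feature: (mu0 - mu1)^2 / (var0 + var1).
mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
v0, v1 = X[y == 0].var(axis=0), X[y == 1].var(axis=0)
fisher = (mu0 - mu1) ** 2 / (v0 + v1)
print("max Fisher ratio:", round(float(fisher.max()), 3))

# Linear classification error rate on the data itself.
clf = LogisticRegression(max_iter=1000).fit(X, y)
error_rate = 1.0 - clf.score(X, y)
print("linear error rate:", round(error_rate, 3))
```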
A second aspect of this application provides a training device for a classification model, where the classification model is used to classify data. The device includes:
a transceiver unit, configured to receive sample data for training the classification model, where the sample data includes multiple sample features; and
a processing unit, configured to determine a target feature subset from the sample data, where the target feature subset is the set of features in the sample data whose relevance and redundancy both meet a target condition;
determine high-dimensional sparse features of the target feature subset using a high-dimensional sparsification transformation method, where the high-dimensional sparse features are linear features;
determine the target data complexity corresponding to the high-dimensional sparse features of the target feature subset, where the data complexity includes multiple dimensions for characterizing data features;
determine the target classification algorithm corresponding to the target data complexity according to an established mapping between data complexity and classification algorithms, and determine the target parameters corresponding to the target data complexity according to an established mapping between data complexity and the hyperparameter set of the target classification algorithm; and
train the target classification algorithm with the determined target parameters and the high-dimensional sparse features of the target feature subset, to obtain the classification model.
In one implementation of the second aspect, the processing unit is configured to determine the target feature subset from the sample data by determining, from the sample data, the feature subset with maximum relevance and minimum redundancy; this feature subset is the target feature subset.
In one implementation of the second aspect, the processing unit is configured to determine the high-dimensional sparse features of the target feature subset using the high-dimensional sparsification transformation method by: performing balancing on the target feature subset and then adding random noise; splitting the balanced, noise-augmented target feature subset into a first subset and a second subset; training a feature sparse-coding algorithm with the first subset to obtain a generalized feature sparse-coding model; and inputting the second subset and determining, according to the generalized feature sparse-coding model, the high-dimensional sparse features corresponding to the second subset.
In one implementation of the second aspect, the processing unit is further configured to train the mapping between data complexity and classification algorithms, and the mapping between data complexity and the hyperparameter sets of the classification algorithms.
In one implementation of the second aspect, the processing unit trains these mappings by: obtaining multiple input classification algorithms and multiple groups of training data; determining the classification algorithm corresponding to each group of training data and the hyperparameter set corresponding to each of the multiple classification algorithms; obtaining multiple data complexities, where each is the data complexity of one group of training data; establishing the mapping between the data complexities and the multiple classification algorithms; and establishing the mapping between the data complexities and the hyperparameter set corresponding to each classification algorithm.
In one implementation of the second aspect, the data complexity includes at least two of twelve dimensions for characterizing data features. The twelve dimensions include: linear discriminant ratio, target-class range overlap rate, single-feature maximum efficiency, linear classification error rate, linear classification minimum error, linear classification boundary sample proportion, intra-class sample cluster density, inter-class sample cluster density, sample-data nonlinearity, cross-class sample variation, minimum hyper-dimensional closure of each class of samples, and sparsity rate of each dimension's values.
A third aspect of this application provides a computing device, including a memory, a transceiver, a processor, and a bus system, where:
the memory is configured to store programs and instructions;
the transceiver is configured to receive or send information under the control of the processor;
the processor is configured to execute the programs in the memory;
the bus system is configured to connect the memory, the transceiver, and the processor so that they can communicate; and
the processor is configured to call the program instructions in the memory to execute the training method of the classification model provided in the first aspect or any implementation of the first aspect of this application.
A fourth aspect of this application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute the methods described in the above aspects.
A fifth aspect of this application provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the methods described in the above aspects.
As can be seen from the above technical solutions, this application has the following advantages: the amount of data computation is reduced by determining the target feature subset from the sample data; the precision of data analysis is improved by determining the high-dimensional sparse features of the target feature subset with the high-dimensional sparsification transformation method; the target data complexity corresponding to those high-dimensional sparse features is then determined, the target classification algorithm corresponding to the target data complexity is determined according to the established mapping between data complexity and classification algorithms, and the target parameters corresponding to the target data complexity are determined according to the established mapping between data complexity and the hyperparameter set of the target classification algorithm, thereby optimizing the algorithm choice and shrinking the parameter space; finally, the target classification algorithm is trained with the determined target parameters and the high-dimensional sparse features of the target feature subset to obtain the classification model. Using this classification model improves data-analysis efficiency.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the training device for a classification model provided in this application;
Fig. 2 is a schematic structural diagram of a computing device provided in this application;
Fig. 3 is a schematic flowchart of the training method of a classification model provided in this application;
Fig. 4 is another schematic flowchart of the training method of a classification model provided in this application;
Fig. 5 is another schematic flowchart of the training method of a classification model provided in this application;
Fig. 6 is another schematic flowchart of the training method of a classification model provided in this application;
Fig. 7 is another schematic structural diagram of the training device for a classification model provided in this application.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solutions of this application, the embodiments of this application are described below with reference to the accompanying drawings.
The terms "first", "second", "third", "fourth", and so on (if present) in the description, claims, and drawings of this application are used to distinguish similar objects, not to describe a particular order or sequence. It should be understood that data so labeled are interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than the one illustrated or described. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the explicitly listed steps or units, but may include other steps or units that are not explicitly listed or that are inherent to that process, method, product, or device.
Classification can be regarded as a mapping from a data set to a group of predetermined, non-overlapping classes. The construction of this mapping and its application are the main research content of classification methods in data mining. The mapping here is the often-mentioned classification function or classification model (classifier), and applying the mapping corresponds to the process of using the classifier to assign each data item in the data set to one of the given classes.
The mathematical definition of classification: given a data set D = {t1, t2, ..., tn} and a group of classes C = {C1, C2, ..., Cm}, the classification problem is to determine a mapping f: D -> C that assigns each tuple ti to one class. A class Cj contains exactly the tuples mapped to it, that is, Cj = {ti | f(ti) = Cj, 1 ≤ i ≤ n, ti ∈ D}.
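The definition above can be shown in miniature: a classifier f maps each tuple in D to one class in C, which partitions D into the classes Cj. The tuples and mapping below are a toy example.

```python
# Classification as a mapping f: D -> C, and the induced partition
# C_j = { t_i | f(t_i) = C_j }. Toy data for illustration.
D = ["t1", "t2", "t3", "t4"]
C = ["C1", "C2"]
f = {"t1": "C1", "t2": "C2", "t3": "C1", "t4": "C2"}

partition = {c: [t for t in D if f[t] == c] for c in C}
print(partition)  # {'C1': ['t1', 't3'], 'C2': ['t2', 't4']}
```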
It is generally accepted that in classification problems, the four steps of feature selection, feature transformation, model selection, and parameter tuning are the most difficult and the most time-consuming.
It is therefore worthwhile to design and develop a classification learner that reduces the difficulty of data analysis and improves its efficiency. Based on this, this application provides a training device, based on ensemble learning, for a classification model used to classify data. As shown in Fig. 1, the training device for the classification model may be a physical host, a physical host cluster, or a virtual machine configured for training the classification model in this application; whatever its form, as long as it amounts to a computing engine it can realize the training process of the classification model in this application.
The device comprises three core parts. The first part is the feature-selection part, which selects features from the input data, for example extracting from the input sample data the set of features whose relevance and redundancy both meet the target condition. The second part is the feature sparse-coding part, which sparsifies the feature subset of the data, mainly by nonlinear (e.g. kernel-function) methods, to obtain the high-dimensional sparse features of the feature subset. The third part is the model-selection and hyperparameter-space-compression part: since model variants are numerous and each model's hyperparameter space is huge, this part uses the established mapping between data complexity and classification algorithms and the mapping between data complexity and the hyperparameter spaces of the classification algorithms to select a suitable classification algorithm for the input sample data and to shrink the hyperparameter space required for its training. The selected classification algorithm is then trained with the shrunken hyperparameter space and the high-dimensional sparse features of the feature subset to obtain the classification model, improving the efficiency of data analysis.
The training device for the classification model in Fig. 1 may be realized by the computing device 200 in Fig. 2. The schematic structure of the computing device 200 is shown in Fig. 2: it includes a processor 202, a memory 204, and a transceiver 206, and may further include a bus 208.
Communication among the processor 202, the memory 204, and the transceiver 206 may be connected through the bus 208, or realized by other means such as wireless transmission.
The memory 204 may include volatile memory, such as random-access memory (RAM); the memory may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 204 may also include a combination of memories of the above types. When the technical solutions provided by this application are realized by software, the program code for realizing the training method of the classification model provided in Fig. 3 of this application is saved in the memory 204 and executed by the processor 202.
The computing device 200 communicates with other devices through the transceiver 206.
The processor 202 may be a central processing unit (CPU).
In this application, the transceiver 206 is configured to receive sample data for training the classification model, where the sample data includes multiple sample features.
The processor 202 is configured to determine a target feature subset from the sample data, where the target feature subset is the set of features in the sample data whose relevance and redundancy both meet the target condition;
determine high-dimensional sparse features of the target feature subset using the high-dimensional sparsification transformation method, where the high-dimensional sparse features are linear features;
determine the target data complexity corresponding to the high-dimensional sparse features of the target feature subset, where the data complexity includes multiple dimensions for characterizing data features;
determine the target classification algorithm corresponding to the target data complexity according to the established mapping between data complexity and classification algorithms, and determine the target parameters corresponding to the target data complexity according to the established mapping between data complexity and the hyperparameter set of the target classification algorithm; and
train the target classification algorithm with the determined target parameters and the high-dimensional sparse features of the target feature subset, to obtain the classification model.
The processor 202 in this application reduces the amount of data computation by determining the target feature subset from the sample data; improves the precision of data analysis by determining the high-dimensional sparse features of the target feature subset with the high-dimensional sparsification transformation method; then determines the target data complexity corresponding to those high-dimensional sparse features, determines the target classification algorithm corresponding to the target data complexity according to the established mapping between data complexity and classification algorithms, and determines the target parameters corresponding to the target data complexity according to the established mapping between data complexity and the hyperparameter set of the target classification algorithm, thereby optimizing the algorithm choice and shrinking the parameter space; and finally trains the target classification algorithm with the determined target parameters and the high-dimensional sparse features of the target feature subset to obtain the classification model. Using this classification model improves data-analysis efficiency.
Optionally, the processor 202 is configured to determine the target feature subset from the sample data by determining, from the sample data, the feature subset with maximum relevance and minimum redundancy; this feature subset is the target feature subset.
Optionally, the processor 202 is configured to determine the high-dimensional sparse features of the target feature subset using the high-dimensional sparsification transformation method by: performing balancing on the target feature subset and then adding random noise; splitting the balanced, noise-augmented target feature subset into a first subset and a second subset; training a feature sparse-coding algorithm with the first subset to obtain a generalized feature sparse-coding model; and inputting the second subset and determining, according to the generalized feature sparse-coding model, the high-dimensional sparse features corresponding to the second subset.
Optionally, the processor 202 is further configured to train the mapping between data complexity and classification algorithms, and the mapping between data complexity and the hyperparameter sets of the classification algorithms.
Optionally, the processor 202 trains these mappings by: obtaining multiple input classification algorithms and multiple groups of training data; determining the classification algorithm corresponding to each group of training data and the hyperparameter set corresponding to each of the multiple classification algorithms; obtaining multiple data complexities, where each is the data complexity of one group of training data; establishing the mapping between the data complexities and the multiple classification algorithms; and establishing the mapping between the data complexities and the hyperparameter set corresponding to each classification algorithm.
Optionally, the data complexity includes at least two of twelve dimensions for characterizing data features, and the twelve dimensions include: linear discriminant ratio, target-class range overlap rate, single-feature maximum efficiency, linear classification error rate, linear classification minimum error, linear classification boundary sample proportion, intra-class sample cluster density, inter-class sample cluster density, sample-data nonlinearity, cross-class sample variation, minimum hyper-dimensional closure of each class of samples, and sparsity rate of each dimension's values.
This application also provides a training method for a classification model, executed by the computing device 200 in Fig. 2 at runtime; its flowchart is shown in Fig. 3.
301. Receive sample data for training the classification model.
The sample data includes multiple sample features. In this step, input sample data for training the classification model is received, so that relevant features can be extracted from it.
302. Determine a target feature subset from the sample data.
It should be noted that the input sample data includes multiple sample features; for example, the features measuring a person include height, weight, age, and so on. Not all features need to be used, so this step extracts a target feature subset, filtering out the features that actually need to be used, to reduce computation.
The target feature subset is the set of features in the sample data whose relevance and redundancy both meet the target condition. Specifically, the target feature subset is the feature subset in the sample data that satisfies maximum relevance and minimum redundancy.
It should be noted that larger mutual information indicates higher relevance. However, the modeling effect of the top-k features selected by mutual-information size alone is not necessarily optimal, because considering the mutual information of each feature with the target in isolation ignores the associations between features and easily introduces redundant features. The relevance and redundancy of the feature subset must therefore be considered together. Based on the mutual-information principle, a unified objective function of maximum relevance and minimum redundancy can be derived, and the feature subset satisfying maximum relevance and minimum redundancy is determined from the sample data according to this unified objective.
The mutual information is computed as (the original formula images are lost; the standard forms consistent with the surrounding definitions are given here):
I(x; y) = \sum_x \sum_y p(x, y) \log \frac{p(x, y)}{p(x) p(y)},
where I(x; y) denotes the mutual information between feature x and feature y, and p(x, y) denotes their joint probability.
Feature relevance is computed as:
D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c),
where S denotes the candidate feature subset, c denotes the classification target, D denotes the relevance between the features and the target, |S| denotes the number of features in S, I denotes mutual information, and x_i denotes the i-th feature.
Feature redundancy is computed as:
R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j),
where R denotes the redundancy among features and x_j denotes the j-th feature.
From mutual information, feature relevance, and feature redundancy, the unified objective for maximum relevance and minimum redundancy can be written incrementally as:
\max_{x_j \in X - S_{m-1}} \Big[ I(x_j; c) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \Big],
where m-1 is the number of features already selected as satisfying maximum relevance and minimum redundancy, X denotes the set of all features, S_{m-1} denotes the m-1 features already selected, x_i denotes the i-th feature, and x_j denotes the j-th feature.
Through this maximum-relevance minimum-redundancy unified objective, the feature subset of maximum relevance and minimum redundancy can be determined from the sample data. As shown in the feature selection part of Fig. 1, the resulting feature subset serves as input to the subsequent feature sparse coding processing.
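The greedy mRMR-style selection described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes discrete (categorical or binned) features, estimates mutual information from empirical frequencies, and at each step keeps the feature maximizing relevance minus mean redundancy.

```python
import numpy as np
from collections import Counter

def mutual_info(x, y):
    """Empirical I(x;y) for discrete sequences, in nats."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        # p(a,b) * log( p(a,b) / (p(a) p(b)) )
        mi += p_ab * np.log(p_ab * n * n / (px[a] * py[b]))
    return mi

def mrmr_select(X, y, k):
    """Greedy selection: maximize I(f;c) - mean_{s in selected} I(f;s)."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        def score(j):
            rel = mutual_info(X[:, j], y)
            red = (np.mean([mutual_info(X[:, j], X[:, i]) for i in selected])
                   if selected else 0.0)
            return rel - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a feature that duplicates an already-selected one, the redundancy term cancels its relevance, so the duplicate is not preferred — the behavior the text attributes to joint relevance/redundancy treatment.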
303. Determine high-dimensional sparse features of the target feature subset using a high-dimensional sparsification transformation method.
High-dimensional sparsification transformation methods include, but are not limited to, nonlinear methods (such as kernel functions). For example, a kernel method can be used to sparsify the target feature subset, yielding its high-dimensional sparse features.
Optionally, determining the high-dimensional sparse features of the target feature subset using a high-dimensional sparsification transformation method includes:
performing equalization processing on the target feature subset, then adding random noise;
splitting the equalized, noise-added target feature subset into a first subset and a second subset;
training a feature sparse coding algorithm with the first subset, to obtain a feature sparse coding generalization model; the feature sparse coding generalization model is a nonlinear feature transformation model obtained using a nonlinear method;
inputting the second subset, and determining the corresponding high-dimensional sparse features of the second subset according to the feature sparse coding generalization model.
It should be noted that, as shown in Fig. 4, this process can be implemented as follows. First, equalization processing is performed on the target feature subset extracted from the sample data, and random noise is then added. Available equalization methods include random sampling (e.g., the Gibbs sampling algorithm) and stratified sampling; available random noise includes Gaussian white noise, uniform noise, and the like.
The equalized, noise-added target feature subset is then split: the resulting first subset, i.e., sample 1, is used to train the feature sparse coding algorithm, obtaining the feature sparse coding generalization model.
Finally, the feature sparse coding generalization model is applied to the data of the second subset obtained from the split, yielding the high-dimensional sparse features of the second subset, i.e., sample 2.
The target feature subset can be sparsified with a nonlinear method to obtain its high-dimensional sparse features in the following ways.
One possible implementation applies a nonlinear transformation to the target feature subset with a three-layer neural network (the number of layers can be configured flexibly). The main steps are:
First, train on the first subset obtained from the split using the backpropagation algorithm to obtain the connection coefficients between layers; the output between each pair of neural network layers is scaled by a coefficient, which is the connection coefficient.
Then traverse each sample v_i of the second subset obtained from the split, for example by forward propagation, to obtain the output of each sample v_i at the last hidden layer (i.e., the second layer) of the three-layer network.
To obtain the high-dimensional sparse feature representation of the target feature subset, an activation function can be applied to the feature of each of multiple dimensions, where the multiple dimensions characterize the data features of the target feature subset. The activation function determines whether the feature of each dimension is retained, and can be any one of the sigmoid function, the softmax function, the relu function, and the like.
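The neural-network variant above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: it uses scikit-learn's MLPClassifier as the three-layer network, synthetic random data stands in for the two splits, and the last hidden layer's relu activations (recomputed from the learned weights) serve as the sparse features.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Stand-ins for the two splits of the target feature subset:
# the first subset trains the encoder, the second is transformed by it.
X1, y1 = rng.normal(size=(200, 5)), rng.integers(0, 2, 200)
X2 = rng.normal(size=(50, 5))

# Three-layer network (input -> hidden -> output) trained by backpropagation.
net = MLPClassifier(hidden_layer_sizes=(32,), activation='relu',
                    max_iter=500, random_state=0).fit(X1, y1)

# Forward-propagate the second subset and keep the hidden layer's output;
# the relu activation zeroes many units, giving a sparse representation.
hidden = np.maximum(0, X2 @ net.coefs_[0] + net.intercepts_[0])
print(hidden.shape)  # (50, 32)
```

The relu here plays the role of the activation function that decides whether each dimension's feature is retained: units with non-positive pre-activation are set to zero.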
Another possible implementation takes random forests as an example. In machine learning, a random forest is a classifier containing multiple decision trees. The present application uses a random forest model (or a similar model such as gradient boosting decision trees (Gradient Boosting Decision Tree, GBDT)) to sparsify the target feature subset into a high-dimensional representation with a regular distribution. The main steps are:
First, train a random forest model on the first subset obtained from the split (the model contains multiple trees).
Then traverse each sample v_i of the second subset obtained from the split; each sample v_i falls on some leaf node of each tree t_j.
For each sample v_i, construct a sparse feature vector over each tree t_j: set the component of the leaf node where v_i lands to 1 and the remaining components to 0.
Arrange these per-tree vectors in order into the feature vector of a sparse matrix whose length is the total number of leaves over the trees t_j. For example, concatenating vector 1 (0, 1), vector 2 (1, 1), and vector 3 (1, 0) yields the sparse feature vector (0, 1, 1, 1, 1, 0).
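The random-forest leaf encoding above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes scikit-learn, uses synthetic random data for the two splits, and one-hot encodes the leaf index each sample reaches on each tree (in practice the encoder would be fitted on training-split leaves).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X1, y1 = rng.normal(size=(200, 5)), rng.integers(0, 2, 200)  # first subset
X2 = rng.normal(size=(50, 5))                                # second subset

# Train the forest on the first split.
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X1, y1)

# apply() returns, for each sample, the index of the leaf it falls into on
# each tree; one-hot encoding those indices gives the sparse 0/1 vector
# (a single 1 per tree, at the leaf that fired).
leaves = forest.apply(X2)                        # shape (50, 10)
sparse = OneHotEncoder().fit_transform(leaves)   # sparse 0/1 matrix
print(sparse.shape[0])  # 50 rows, each summing to n_estimators
```

Each row sums to the number of trees, matching the "one active leaf per tree, concatenated across trees" construction in the text.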
304. Determine the target data complexity corresponding to the high-dimensional sparse features of the target feature subset.
It should be noted that the differences between data sets are diverse. To find the association between a data set and an algorithm with its parameters, the abstract characteristics of the data must be described with a uniform method. Data complexity can therefore be used to measure the high-dimensional sparse features of the feature subset. The data complexity includes multiple dimensions for characterizing data features; specifically, it may include at least two of 12 dimensions for characterizing data features. The 12 dimensions include: linear discriminant ratio, class range overlap ratio, maximum single-feature efficiency, linear classification error rate, sum of minimum linear classification errors, proportion of samples on the linear classification boundary, intra-class sample aggregation density, inter-class sample aggregation density, sample data nonlinearity, heterogeneous-sample variation, minimum number of hyperspherical closures per class, and value sparsity rate of each dimension. At least two of these 12 dimensions can be chosen as the target data complexity characterizing the high-dimensional sparse features of the target feature subset.
It should be noted that, based on Fisher's linear decision rule, the linear discriminant ratio is used to calculate the discriminating power of a given dimension for the analysis target. It is computed as (the original formula image is lost; the standard form consistent with the stated variables is given):
f = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2},
where \mu_1 and \mu_2 are the means of the two classes and \sigma_1^2 and \sigma_2^2 are their variances.
The class range overlap ratio describes the overlap of the value ranges of different target classes, multiplying the overlap ratios over all features. It can be written as:
F = \prod_i \frac{\mathrm{MIN}(\mathrm{MAX}_i(c_1), \mathrm{MAX}_i(c_2)) - \mathrm{MAX}(\mathrm{MIN}_i(c_1), \mathrm{MIN}_i(c_2))}{\mathrm{MAX}(\mathrm{MAX}_i(c_1), \mathrm{MAX}_i(c_2)) - \mathrm{MIN}(\mathrm{MIN}_i(c_1), \mathrm{MIN}_i(c_2))},
where MAX_i(c) and MIN_i(c) denote the maximum and minimum values of feature dimension i over class c.
The maximum single-feature efficiency is the proportion of samples falling outside the feature overlap region, measured over hyperplanes perpendicular to each feature dimension.
The linear classification error rate measures the degree of linear separability of the data sample.
The sum of minimum linear classification errors is the sum of the minimum errors of all sample points relative to the classification hyperplane, measured with a linear classification boundary. It is obtained by solving: minimize a^T t, subject to Z^T w + t \geq b, t \geq 0, where a, b, and w are the classification hyperplane parameters of an arbitrary linear classifier and t is the error vector.
The linear classification boundary sample proportion refers to the proportion of samples lying exactly on the classification boundary.
The intra-class sample aggregation density measures how tightly samples of the same class cluster. It can be written as:
N = \frac{\sum_i \mathrm{intraDist}(x_i)}{\sum_i \mathrm{interDist}(x_i)},
where intraDist(x_i) denotes the minimum distance between sample x_i and other samples of the same class, and interDist(x_i) denotes the minimum distance between sample x_i and samples of other classes.
The inter-class sample aggregation density describes how tightly samples of different classes cluster relative to one another.
Sample data nonlinearity describes the nonlinearity of the data sample. The basic method is to repeatedly apply random linear partition boundaries to samples of the same class and then compute the misclassification rate.
Heterogeneous-sample variation describes the nonlinearity between a sample and its heterogeneous classes; the basic method is to compute the distance relation between a sample and its nearest heterogeneous convex hull.
The minimum number of hyperspherical closures per class describes, for each class, the number of minimal hyperspheres (each containing only same-class samples and not overlapping other classes) needed to cover it; the per-class ratio is obtained by normalizing over the closure-sphere counts of all classes.
The value sparsity rate of each dimension indicates the average proportion of samples with a value in each dimension. It is computed as:
s = \frac{1}{d} \sum_j \frac{m_j}{n},
where m_j is the number of samples with a value in dimension j, n is the total number of samples, and d is the number of dimensions.
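Two of the simpler measures above can be made concrete as follows. This is an illustrative sketch, not the patent's implementation: it computes Fisher's linear discriminant ratio for a single feature and the value sparsity rate (treating NaN as "no value") on a small handcrafted example.

```python
import numpy as np

def fisher_ratio(x, y):
    """Fisher's discriminant ratio (mu1 - mu2)^2 / (s1^2 + s2^2) of one feature."""
    a, b = x[y == 0], x[y == 1]
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var())

def value_sparsity_rate(X):
    """Average fraction of samples with a value (non-NaN) per dimension."""
    return float(np.mean(np.sum(~np.isnan(X), axis=0) / X.shape[0]))

# Class 0 takes values {0, 2} (mean 1, var 1); class 1 takes {4, 6} (mean 5, var 1).
y = np.array([0, 0, 1, 1] * 10)
x = np.array([0.0, 2.0, 4.0, 6.0] * 10)
print(fisher_ratio(x, y))  # (1 - 5)^2 / (1 + 1) = 8.0
```

A larger ratio indicates a dimension whose class means are well separated relative to the class variances, i.e., stronger discriminating power.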
305. Determine the target classification algorithm corresponding to the target data complexity according to the established mapping relations between data complexities and classification algorithms, and determine the target parameter corresponding to the target data complexity according to the established mapping relations between data complexities and the hyperparameter set of the target classification algorithm.
It should be noted that the mapping relations between data complexities and classification algorithms, and between data complexities and the hyperparameter sets of the classification algorithms, can be established with reference to the following implementation.
Optionally, before determining the target classification algorithm corresponding to the target data complexity according to the established mapping relations between data complexities and classification algorithms, the method further includes:
training the mapping relations between data complexities and classification algorithms, and the mapping relations between data complexities and the hyperparameter sets of the classification algorithms.
Training the mapping relations between data complexities and classification algorithms, and between data complexities and the hyperparameter sets of the classification algorithms, includes:
obtaining multiple input classification algorithms and multiple groups of training data;
determining, for each group of training data in the multiple groups, the corresponding classification algorithm, and the hyperparameter set corresponding to each of the multiple classification algorithms;
obtaining multiple data complexities, where the multiple data complexities are the data complexities of each group of training data in the multiple groups;
establishing mapping relations between the multiple data complexities and the multiple classification algorithms;
establishing mapping relations between the multiple data complexities and the hyperparameter set corresponding to each classification algorithm.
It should be noted that, as shown in Fig. 5, the mapping relations between data complexities and classification algorithms, and between data complexities and the hyperparameter sets of the classification algorithms, can be obtained through pre-learning and training.
To establish the mapping relations between data complexities and classification algorithms, multiple classification algorithms are first used to train multiple different groups of training data, yielding statistical information for each group of training data and the suitability of each classification algorithm for it. The statistical information includes the category of each classification algorithm and each algorithm's corresponding parameter value range, i.e., its hyperparameter set.
Data complexity characterizes each group of training data along multiple dimensions, yielding the data complexity of each group in the multiple groups of training data.
With the data characteristics of each group of training data characterized by one data complexity, each group of training data is input and trained to obtain the data indices between that group's data complexity and the multiple classification algorithms. According to the precision of the obtained indices, at least one classification algorithm with higher index precision is chosen, and the mapping relation between that data complexity and the at least one classification algorithm is established. For the multiple data complexities of the multiple groups of training data, the mapping relations between the multiple data complexities and the multiple classification algorithms are established in the same manner.
Since the hyperparameter set of each classification algorithm is huge, the range of the hyperparameter set must be reduced. Using the same precision measure, the data indices between each group's data complexity and the hyperparameter set of the selected (higher-precision) classification algorithm are trained. According to the precision of the obtained indices, one group of parameters with higher index precision is chosen from the hyperparameter set as the target parameter, and the mapping relation between that data complexity and the target parameter in the algorithm's hyperparameter set is established. For the multiple data complexities of the multiple groups of training data, the mapping relations between the multiple data complexities and the hyperparameter set of each classification algorithm are established in the same manner.
For example, a set of multiple classification algorithms is initialized, e.g., three algorithms such as random forest (random forest, RF), logistic regression (logistic regression, LR), and support vector machine (support vector machine, SVM), numbered 0, 1, and 2 respectively.
Multiple different groups of training data are chosen and trained on each of the three classification algorithms. With the data precision (i.e., accuracy) of the training results as the measurement index, the classification algorithm suited to each group of training data is obtained by dividing according to precision.
According to the data complexity measures, the indices of each group of training data under the 12 dimensions included in the data complexity are obtained. Table 1 below shows, for a certain group of training data represented under the 12 data complexity dimensions, the data sets trained separately on the three classification algorithms.
Table 1
The data sets under the 12 data complexity dimensions of each group of training data in Table 1 are divided according to precision; the classification algorithm suited to that group's data complexity (i.e., with higher corresponding precision) is selected, and the mapping relation between that data complexity and the classification algorithm is established. In this manner, the mapping relations between multiple data complexities and multiple classification algorithms are established.
With the same method, the mapping relations between the hyperparameter set of each classification algorithm and each data complexity are established.
According to the trained mapping relations between the multiple data complexities and the multiple classification algorithms, for an input sample of data, the target data complexity characterizing that sample data is obtained, and the target classification algorithm corresponding to that target data complexity can be determined. Likewise, according to the trained mapping relations between the multiple data complexities and the hyperparameter set of each classification algorithm, the target parameter corresponding to the target data complexity can be determined.
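The accuracy-driven construction of the complexity-to-algorithm mapping can be sketched as follows. This is a simplified illustration under stated assumptions, not the patent's procedure: scikit-learn supplies the three candidate algorithms, synthetic data stands in for the training groups, and a single rounded Fisher-ratio value stands in for the full 12-dimension complexity descriptor.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Candidate classification algorithms, numbered 0, 1, 2 as in the example.
algorithms = {0: RandomForestClassifier(n_estimators=20, random_state=0),
              1: LogisticRegression(max_iter=1000),
              2: SVC()}

def complexity_key(X, y):
    """Stand-in complexity descriptor: best single-feature Fisher ratio,
    rounded so similar datasets share a key (illustrative only)."""
    a, b = X[y == 0], X[y == 1]
    f = ((a.mean(0) - b.mean(0)) ** 2 / (a.var(0) + b.var(0) + 1e-12)).max()
    return round(float(f), 1)

mapping = {}
for seed in range(3):  # several training groups
    X, y = make_classification(n_samples=200, n_features=8, random_state=seed)
    # Accuracy is the measurement index for dividing algorithms by suitability.
    scores = {aid: cross_val_score(alg, X, y, cv=3).mean()
              for aid, alg in algorithms.items()}
    mapping[complexity_key(X, y)] = max(scores, key=scores.get)

print(mapping)  # complexity key -> best-scoring algorithm id
```

At prediction time, a new dataset's complexity key would be looked up in `mapping` to pick the target algorithm; the same loop, run over each algorithm's hyperparameter grid, would yield the complexity-to-parameter mapping.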
306. Train the target classification algorithm according to the determined target parameter and the high-dimensional sparse features of the target feature subset, to obtain the classification model.
As shown in Fig. 6, step 305 applies the pre-learned mapping relations between data complexities and classification algorithms and between data complexities and the hyperparameter sets of the classification algorithms: for the data complexity of a given sample of data, step 305 determines the target classification algorithm and target parameter selected for that sample data. According to the determined target parameter and the high-dimensional sparse features of the target feature subset, the target classification algorithm is trained with an ensemble learning method (e.g., the Bagging (bootstrap aggregating) algorithm or the Boosting (adaptive boosting) algorithm), and the classification model is finally output. Using this classification model improves data analysis efficiency.
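The final ensemble training step can be sketched as follows. This is a hedged illustration, not the patent's implementation: it assumes scikit-learn, pretends step 305 selected an SVM with C=1.0 as the target algorithm and parameter (hypothetical values), and uses a synthetic dataset in place of the high-dimensional sparse features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

# Stand-in for the high-dimensional sparse features of the target subset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Bagging (bootstrap aggregating): train the selected target algorithm on
# bootstrap resamples and aggregate the member predictions by vote.
model = BaggingClassifier(SVC(C=1.0), n_estimators=10,
                          random_state=0).fit(X, y)
print(round(model.score(X, y), 2))
```

Boosting (e.g., AdaBoostClassifier) could be substituted for BaggingClassifier with the same selected base algorithm, matching the alternatives the text names.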
In the present application, a target feature subset is determined from the sample data to reduce the amount of computation; the high-dimensional sparse features of the target feature subset are determined with a high-dimensional sparsification transformation method to improve the precision of the data analysis; the target data complexity corresponding to the high-dimensional sparse features of the target feature subset is then determined; the target classification algorithm corresponding to the target data complexity is determined according to the established mapping relations between data complexities and classification algorithms, and the target parameter corresponding to the target data complexity is determined according to the established mapping relations between data complexities and the hyperparameter set of the target classification algorithm, achieving the goals of optimizing the algorithm and reducing the parameter space; finally, the target classification algorithm is trained according to the determined target parameter and the high-dimensional sparse features of the target feature subset, to obtain the classification model. Using this classification model improves data analysis efficiency.
It should be noted that, unless otherwise specified, the present application neither restricts the order of the steps nor implies dependency relations between them.
The present application also provides a training device 700 for the classification model, which can be implemented by the computing device 200 shown in Fig. 2, by an application-specific integrated circuit (application-specific integrated circuit, ASIC), or by a programmable logic device (programmable logic device, PLD). The PLD may be a complex programmable logic device (complex programmable logic device, CPLD), generic array logic (generic array logic, GAL), or any combination thereof. The training device 700 implements the training method of the classification model shown in Fig. 3. When the training method shown in Fig. 3 is implemented by software, the device 700 may also be a software module.
The organizational structure of the training device 700 is shown in Fig. 7 and includes a transceiver unit 702 and a processing unit 704. When the transceiver unit 702 operates, it executes step 301 of the training method of the classification model shown in Fig. 3 and the optional solutions of step 301. When the processing unit 704 operates, it executes steps 302 to 306 of the training method of the classification model shown in Fig. 3 and the optional solutions of steps 302 to 306. It should be noted that in the present application the processing unit 704 can also be implemented by the processor 202 shown in Fig. 2, and the transceiver unit 702 can also be implemented by the transceiver 206 shown in Fig. 2.
The related description of the above device can be understood with reference to the related description and effects of the method embodiment, and is not repeated here.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be realized wholly or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), a semiconductor medium (e.g., a solid state disk (solid state disk, SSD)), or the like.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments, and details are not described here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a logical function division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through interfaces; the indirect couplings or communication connections between devices or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk.
The above embodiments are only intended to illustrate the technical solution of this application, not to limit it. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or replace some of the technical features with equivalents; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of this application.
Claims (15)
1. A training method for a classification model, wherein the classification model is used to classify data, the method comprising:
receiving sample data for training the classification model, the sample data comprising multiple sample features;
determining a target feature subset from the sample data, the target feature subset being the set of features in the sample data whose relevance and redundancy both meet a target condition;
determining high-dimensional sparse features of the target feature subset using a high-dimensional sparsification transformation method, the high-dimensional sparse features being linear features;
determining a target data complexity corresponding to the high-dimensional sparse features of the target feature subset, the data complexity comprising multiple dimensions for characterizing data features;
determining a target classification algorithm corresponding to the target data complexity according to established mapping relations between data complexities and classification algorithms, and determining a target parameter corresponding to the target data complexity according to established mapping relations between data complexities and the hyperparameter set of the target classification algorithm;
training the target classification algorithm according to the determined target parameter and the high-dimensional sparse features of the target feature subset, to obtain the classification model.
2. The method according to claim 1, wherein determining the target feature subset from the sample data comprises:
determining, from the sample data, the feature subset of maximum relevance and minimum redundancy; the feature subset of maximum relevance and minimum redundancy is the target feature subset.
3. The method according to claim 1, wherein determining the high-dimensional sparse features of the target feature subset using a high-dimensional sparsification transformation method comprises:
performing equalization processing on the target feature subset, then adding random noise;
splitting the equalized, noise-added target feature subset into a first subset and a second subset;
training a feature sparse coding algorithm with the first subset, to obtain a feature sparse coding generalization model;
inputting the second subset, and determining the corresponding high-dimensional sparse features of the second subset according to the feature sparse coding generalization model.
4. method according to any one of claims 1 to 3, which is characterized in that described according to established data complexity
Before determining target classification algorithm corresponding to the target data complexity with the mapping relations of sorting algorithm, the method is also
Include:
The mapping relations of the training data complexity and sorting algorithm, and the training data complexity and sorting algorithm
The mapping relations of hyper parameter set.
5. The method according to claim 4, wherein training the mapping relationship between data complexity and classification algorithms, and the mapping relationship between data complexity and the hyperparameter sets of the classification algorithms, comprises:
obtaining a plurality of input classification algorithms and a plurality of groups of training data;
determining the classification algorithm corresponding to each group of training data in the plurality of groups of training data, and the hyperparameter set corresponding to each classification algorithm in the plurality of classification algorithms;
obtaining a plurality of data complexities, the plurality of data complexities being the data complexities of the respective groups of training data in the plurality of groups of training data;
establishing a mapping relationship between the plurality of data complexities and the plurality of classification algorithms;
establishing a mapping relationship between the plurality of data complexities and the hyperparameter set corresponding to each classification algorithm.
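Claim 5's two mappings amount to a lookup table keyed by measured complexity. A toy sketch (the complexity dimensions, distance function, algorithms, and hyperparameters are all illustrative stand-ins, not the patent's): store one entry per training-data group, then answer queries by nearest stored complexity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Complexity:
    linear_error_rate: float   # two of the patent's 12 dimensions, for brevity
    class_overlap: float

    def distance(self, other):
        # L1 distance between complexity vectors (an illustrative choice)
        return abs(self.linear_error_rate - other.linear_error_rate) + \
               abs(self.class_overlap - other.class_overlap)

# One entry per training-data group: its measured complexity, the algorithm
# that performed best on it, and that algorithm's tuned hyperparameter set.
mapping = [
    (Complexity(0.05, 0.1), "linear_svm", {"C": 1.0}),
    (Complexity(0.30, 0.6), "random_forest", {"n_estimators": 200, "max_depth": 8}),
]

def lookup(target):
    """Return (algorithm, hyperparameters) for the nearest stored complexity."""
    entry = min(mapping, key=lambda e: target.distance(e[0]))
    return entry[1], entry[2]

algo, params = lookup(Complexity(0.28, 0.55))
print(algo, params)   # nearest stored entry is the random_forest one
```

A real system would learn the mapping (e.g. regress algorithm choice on complexity vectors) rather than use nearest-neighbor lookup, but the table structure is the same.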
6. The method according to any one of claims 1 to 3, wherein the data complexity comprises at least two of 12 dimensions for characterizing data features, the 12 dimensions comprising: linear discriminant ratio, target-class range overlap ratio, single-feature maximum efficiency, linear classification error rate, linear classification minimum error, proportion of samples on the linear classification boundary, intra-class sample aggregation density, inter-class sample aggregation density, nonlinearity of the sample data, inter-class sample variation, minimum hyper-dimensional closure of each class of samples, and sparsity ratio of each dimension value.
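Several of these dimensions resemble the classical data complexity measures of Ho and Basu; for instance, the linear discriminant ratio is close to the maximum Fisher's discriminant ratio (F1). A sketch of computing F1 for a two-class problem (an assumption about the patent's exact definition; the data is synthetic):

```python
import numpy as np

def fisher_ratio(X, y):
    """Maximum Fisher's discriminant ratio (F1) over features for two classes:
    max over features of (mu0 - mu1)^2 / (var0 + var1)."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0)
    return float(np.max(num / den))

# Classes well separated on feature 0 -> large F1; overlapping features -> small.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal([5, 0], 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
print(fisher_ratio(X, y))   # roughly 5**2 / (1 + 1) = 12.5 from feature 0
```

A large F1 signals at least one feature that nearly separates the classes linearly, which is exactly the kind of signal the claimed mapping uses to prefer simple linear classifiers.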
7. A training apparatus for a classification model, wherein the classification model is used for classifying data, and the apparatus comprises:
a transceiver unit, configured to receive sample data for training the classification model, the sample data comprising a plurality of sample features;
a processing unit, configured to determine a target feature subset from the sample data, the target feature subset being a feature set in the sample data whose relevance and redundancy both satisfy a target condition;
determine high-dimensional sparse features of the target feature subset using a high-dimensional sparsification conversion method, the high-dimensional sparse features being linear features;
determine a target data complexity corresponding to the high-dimensional sparse features of the target feature subset, the data complexity comprising a plurality of dimensions for characterizing data features;
determine a target classification algorithm corresponding to the target data complexity according to an established mapping relationship between data complexity and classification algorithms, and determine target parameters corresponding to the target data complexity according to an established mapping relationship between data complexity and the hyperparameter set of the target classification algorithm;
train the target classification algorithm according to the determined target parameters and the high-dimensional sparse features of the target feature subset, to obtain the classification model.
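Taken together, the units of claim 7 implement the full pipeline: feature selection, sparsification, complexity measurement, algorithm/hyperparameter lookup, then training. A condensed end-to-end sketch in which every component is a simplified stand-in (the dataset, the relevance-only selector, the dictionary size, the single complexity dimension, and the thresholds are all illustrative, not the patent's):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Sample data with a few informative features (illustrative)
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

# 1. Target feature subset (relevance-only stand-in for max-relevance/min-redundancy)
X_sel = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)

# 2. High-dimensional sparse features via overcomplete dictionary learning
coder = DictionaryLearning(n_components=16, transform_algorithm="lasso_lars",
                           transform_alpha=0.1, max_iter=10, random_state=0)
X_sparse = coder.fit_transform(X_sel)

# 3. One data-complexity dimension: training error of a linear classifier
linear_error = 1.0 - LogisticRegression(max_iter=1000).fit(X_sparse, y).score(X_sparse, y)

# 4. Complexity -> algorithm and hyperparameter lookup (illustrative threshold)
if linear_error < 0.1:
    model = LogisticRegression(C=1.0, max_iter=1000)
else:
    model = RandomForestClassifier(n_estimators=100, random_state=0)

# 5. Train the selected algorithm on the sparse features to obtain the model
model.fit(X_sparse, y)
print(type(model).__name__)
```

The point of the design is step 4: the training data never picks the classifier directly; its measured complexity does, via the pre-established mappings.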
8. The apparatus according to claim 7, wherein the processing unit being configured to determine the target feature subset from the sample data comprises:
the processing unit being configured to determine, from the sample data, a feature subset of maximum relevance and minimum redundancy, the feature subset of maximum relevance and minimum redundancy being the target feature subset.
9. The apparatus according to claim 7, wherein the processing unit being configured to determine the high-dimensional sparse features of the target feature subset using the high-dimensional sparsification conversion method comprises:
the processing unit being configured to perform balancing processing on the target feature subset and then add random noise;
split the target feature subset, after the balancing processing and the addition of random noise, into a first subset and a second subset;
train a feature sparse coding algorithm using the first subset, to obtain a feature sparse coding generalization model;
and input the second subset and determine the high-dimensional sparse features corresponding to the second subset according to the feature sparse coding generalization model.
10. The apparatus according to any one of claims 7 to 9, wherein the processing unit is further configured to:
train the mapping relationship between data complexity and classification algorithms, and the mapping relationship between data complexity and the hyperparameter sets of the classification algorithms.
11. The apparatus according to claim 10, wherein the processing unit being configured to train the mapping relationship between data complexity and classification algorithms, and the mapping relationship between data complexity and the hyperparameter sets of the classification algorithms, comprises:
the processing unit being configured to obtain a plurality of input classification algorithms and a plurality of groups of training data;
determine the classification algorithm corresponding to each group of training data in the plurality of groups of training data, and the hyperparameter set corresponding to each classification algorithm in the plurality of classification algorithms;
obtain a plurality of data complexities, the plurality of data complexities being the data complexities of the respective groups of training data in the plurality of groups of training data;
establish a mapping relationship between the plurality of data complexities and the plurality of classification algorithms;
and establish a mapping relationship between the plurality of data complexities and the hyperparameter set corresponding to each classification algorithm.
12. The apparatus according to any one of claims 7 to 10, wherein the data complexity comprises at least two of 12 dimensions for characterizing data features, the 12 dimensions comprising: linear discriminant ratio, target-class range overlap ratio, single-feature maximum efficiency, linear classification error rate, linear classification minimum error, proportion of samples on the linear classification boundary, intra-class sample aggregation density, inter-class sample aggregation density, nonlinearity of the sample data, inter-class sample variation, minimum hyper-dimensional closure of each class of samples, and sparsity ratio of each dimension value.
13. A computing device, comprising: a memory, a transceiver, a processor, and a bus system;
wherein the memory is configured to store programs and instructions;
the transceiver is configured to receive or send information under control of the processor;
the processor is configured to execute the programs in the memory;
the bus system is configured to connect the memory, the transceiver, and the processor, so that the memory, the transceiver, and the processor communicate with one another;
and the processor is configured to invoke the program instructions in the memory to perform the method according to any one of claims 1 to 6.
14. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 6.
15. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710361782.5A CN108960264A (en) | 2017-05-19 | 2017-05-19 | The training method and device of disaggregated model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108960264A true CN108960264A (en) | 2018-12-07 |
Family
ID=64462257
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710361782.5A Pending CN108960264A (en) | 2017-05-19 | 2017-05-19 | The training method and device of disaggregated model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108960264A (en) |
2017-05-19: CN application CN201710361782.5A filed; published as CN108960264A (status: Pending)
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020118743A1 (en) * | 2018-12-14 | 2020-06-18 | 深圳先进技术研究院 | Data feature extraction method, apparatus and electronic device |
CN111325227B (en) * | 2018-12-14 | 2023-04-07 | 深圳先进技术研究院 | Data feature extraction method and device and electronic equipment |
CN111325227A (en) * | 2018-12-14 | 2020-06-23 | 深圳先进技术研究院 | Data feature extraction method and device and electronic equipment |
CN111382210B (en) * | 2018-12-27 | 2023-11-10 | 中国移动通信集团山西有限公司 | Classification method, device and equipment |
CN111382210A (en) * | 2018-12-27 | 2020-07-07 | 中国移动通信集团山西有限公司 | Classification method, device and equipment |
CN109871809A (en) * | 2019-02-22 | 2019-06-11 | 福州大学 | A kind of machine learning process intelligence assemble method based on semantic net |
CN110163259B (en) * | 2019-04-26 | 2023-12-15 | 创新先进技术有限公司 | Method, system and equipment for generating sample data |
CN110163259A (en) * | 2019-04-26 | 2019-08-23 | 阿里巴巴集团控股有限公司 | A kind of method, system and equipment generating sample data |
CN110210018A (en) * | 2019-05-14 | 2019-09-06 | 北京百度网讯科技有限公司 | It registers the matching process and device of department |
CN110210018B (en) * | 2019-05-14 | 2023-07-11 | 北京百度网讯科技有限公司 | Matching method and device for registration department |
CN110309127A (en) * | 2019-07-02 | 2019-10-08 | 联想(北京)有限公司 | A kind of data processing method, device and electronic equipment |
CN110825966A (en) * | 2019-10-31 | 2020-02-21 | 广州市百果园信息技术有限公司 | Information recommendation method and device, recommendation server and storage medium |
CN110825966B (en) * | 2019-10-31 | 2022-03-04 | 广州市百果园信息技术有限公司 | Information recommendation method and device, recommendation server and storage medium |
CN111143203A (en) * | 2019-12-13 | 2020-05-12 | 支付宝(杭州)信息技术有限公司 | Machine learning method, privacy code determination method, device and electronic equipment |
CN111143203B (en) * | 2019-12-13 | 2022-04-22 | 支付宝(杭州)信息技术有限公司 | Machine learning method, privacy code determination method, device and electronic equipment |
CN113159085A (en) * | 2020-12-30 | 2021-07-23 | 北京爱笔科技有限公司 | Training of classification model, image-based classification method and related device |
CN116028829A (en) * | 2021-01-20 | 2023-04-28 | 国义招标股份有限公司 | Correction clustering processing method, device and storage medium based on transmission step length adjustment |
CN116028829B (en) * | 2021-01-20 | 2023-10-24 | 国义招标股份有限公司 | Correction clustering processing method, device and storage medium based on transmission step length adjustment |
CN112966182A (en) * | 2021-03-09 | 2021-06-15 | 中国民航信息网络股份有限公司 | Project recommendation method and related equipment |
CN112966182B (en) * | 2021-03-09 | 2024-02-09 | 中国民航信息网络股份有限公司 | Project recommendation method and related equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108960264A (en) | The training method and device of disaggregated model | |
US11868856B2 (en) | Systems and methods for topological data analysis using nearest neighbors | |
Yu et al. | Hierarchical deep click feature prediction for fine-grained image recognition | |
US10713597B2 (en) | Systems and methods for preparing data for use by machine learning algorithms | |
Das et al. | Automatic clustering using an improved differential evolution algorithm | |
CN106537422A (en) | Systems and methods for capture of relationships within information | |
WO2019011093A1 (en) | Machine learning model training method and apparatus, and facial expression image classification method and apparatus | |
CN109446517A (en) | Reference resolution method, electronic device and computer readable storage medium | |
EP3268870A1 (en) | Systems and methods for predicting outcomes using a prediction learning model | |
CN108804641A (en) | A kind of computational methods of text similarity, device, equipment and storage medium | |
CN108351985A (en) | Method and apparatus for large-scale machines study | |
CN106951825A (en) | A kind of quality of human face image assessment system and implementation method | |
CN102324038B (en) | Plant species identification method based on digital image | |
CN108509982A (en) | A method of the uneven medical data of two classification of processing | |
CN109934615B (en) | Product marketing method based on deep sparse network | |
CN108985929A (en) | Training method, business datum classification processing method and device, electronic equipment | |
Torrente et al. | Initializing k-means clustering by bootstrap and data depth | |
CN107947921A (en) | Based on recurrent neural network and the password of probability context-free grammar generation system | |
NO319838B1 (en) | Improving knowledge discovery from multiple datasets by using multiple support vector machines | |
CN111368926B (en) | Image screening method, device and computer readable storage medium | |
CN106022359A (en) | Fuzzy entropy space clustering analysis method based on orderly information entropy | |
US10733499B2 (en) | Systems and methods for enhancing computer assisted high throughput screening processes | |
CN110110628A (en) | A kind of detection method and detection device of frequency synthesizer deterioration | |
CN113516019A (en) | Hyperspectral image unmixing method and device and electronic equipment | |
Jaffel et al. | A symbiotic organisms search algorithm for feature selection in satellite image classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2018-12-07 |