CN110008990A - Multi-classification method and apparatus, electronic device, and storage medium - Google Patents

Multi-classification method and apparatus, electronic device, and storage medium Download PDF

Info

Publication number
CN110008990A
CN110008990A
Authority
CN
China
Prior art keywords
training
classification model
sample
classification
training set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910134159.5A
Other languages
Chinese (zh)
Inventor
郁延书
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lazhasi Information Technology Co Ltd
Original Assignee
Shanghai Lazhasi Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Lazhasi Information Technology Co Ltd filed Critical Shanghai Lazhasi Information Technology Co Ltd
Priority to CN201910134159.5A priority Critical patent/CN110008990A/en
Publication of CN110008990A publication Critical patent/CN110008990A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/285Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/12Hotels or restaurants

Abstract

Embodiments of the present invention relate to the field of machine learning, and disclose a multi-classification method and apparatus, an electronic device, and a storage medium. The method comprises: generating an initial training set, and training a first classification model on the initial training set; separating the training samples corresponding to the first classification model from the training set to obtain the training set of a second classification model; training the second classification model on the training set of the second classification model; and repeating the above sample separation and model training to obtain multiple classification models. Embodiments of the present invention increase training speed as more classes finish training, thereby improving overall training efficiency.

Description

Multi-classification method and apparatus, electronic device, and storage medium
Technical field
The present invention relates to the field of machine learning, and in particular to a multi-classification method and apparatus, an electronic device, and a storage medium.
Background technique
Internet services such as takeout platforms have developed rapidly in recent years and have brought great convenience to people's lives. The volume of platform business is now very large and growing fast, and machine learning has become key to continuously improving platform service quality. Platforms generally use multi-classification techniques to classify the main business scope of each merchant on the platform, so as to provide users with better-matched merchants. A merchant's business scope is summarized from the goods and services the merchant offers on the takeout platform, for example "homely side fish", "rice served with meat and vegetables on top", or "spicy soup". Some merchants may have business scopes spanning several categories, such as home-style fish dishes, rice bowls, boxed lunches, and congee. A merchant's main business scope marks one of the merchant's multiple business-scope categories as its most important business scope.
The inventor found that the related art has at least the following problems: existing multi-classification methods for the main business scope of platform merchants need to train one classification model per category. When training the model for a given category, the samples of that category in the training set are used as positive samples and all remaining samples as negative samples, so a k-category multi-classification method needs to build k classification models. Moreover, because the classification model of every category is trained on all samples in the training set, training slows down sharply as the number of training samples keeps growing.
Summary of the invention
Embodiments of the present invention aim to provide a multi-classification method and apparatus, an electronic device, and a storage medium, so that training speed increases as more classes finish training, thereby improving overall training efficiency.
To solve the above technical problem, embodiments of the present invention provide a multi-classification method, comprising:
generating an initial training set, and training a first classification model on the initial training set;
separating the training samples corresponding to the first classification model from the training set to obtain the training set of a second classification model;
training the second classification model on the training set of the second classification model; and
repeating the above sample separation and model training to obtain multiple classification models.
Embodiments of the present invention also provide a multi-classification apparatus, comprising:
a generation module, configured to generate an initial training set;
a first training module, configured to train a first classification model on the initial training set;
a second training module, configured to separate the training samples corresponding to the first classification model from the training set to obtain the training set of a second classification model, and to train the second classification model on that training set; and
a control module, configured to repeatedly invoke the second training module to perform the sample separation and model training, so as to obtain multiple classification models.
Embodiments of the present invention also provide an electronic device, comprising a memory and a processor, the memory storing a computer program and the processor running the computer program to implement:
generating an initial training set, and training a first classification model on the initial training set;
separating the training samples corresponding to the first classification model from the training set to obtain the training set of a second classification model;
training the second classification model on the training set of the second classification model; and
repeating the above sample separation and model training to obtain multiple classification models.
Embodiments of the present invention also provide a storage medium storing a computer-readable program, the computer-readable program being used to cause a computer to execute the multi-classification method described above.
Compared with the prior art, in embodiments of the present invention, during the training of the multiple classification models, after the first classification model is trained on the initial training set, the training samples corresponding to the first classification model are separated from the training set to obtain the training set of the second classification model, the second classification model is trained on that training set, and the sample separation and model training are repeated to finally obtain multiple classification models. Because every classification model after the first is trained on the samples that remain after the samples of the already-trained models have been separated from the training set, the number of training samples in the training set shrinks as the number of trained models grows, so each model trains faster than the last, improving the training efficiency of the multi-classification model as a whole.
In one embodiment, after generating the initial training set and before starting to train the multiple classification models, the method further comprises:
computing the distances between training samples of different categories in the initial training set, and determining the training order of the multiple classification models according to the distances;
in the step of training the multiple classification models, the multiple classification models are trained in the determined training order.
In one embodiment, the distance is a Euclidean distance, and determining the training order of the multiple classification models according to the distance specifically comprises:
computing, for each category in the initial training set, the minimum Euclidean distance between the sample center of that category and the sample centers of the other categories;
taking the descending order of each category's minimum Euclidean distance as the training order of the multiple classification models.
In one embodiment, generating the initial training set specifically comprises:
obtaining an original training set;
constructing the initial training set from the original training set.
In one embodiment, the multi-classification method is used for classifying the main business scope of merchants.
In one embodiment, obtaining the original training set specifically comprises:
collecting merchant information manually labeled with main-business-scope categories;
taking the manually labeled main-business-scope category that covers the most merchants under a brand, provided it covers more than one merchant, as the main business scope of all merchants under that brand.
In one embodiment, constructing the initial training set from the original training set specifically comprises:
oversampling or undersampling the original training set to obtain a balanced training set;
obtaining the initial training set from the balanced training set.
In one embodiment, constructing the initial training set from the original training set specifically comprises:
extracting the order data of the merchants in the original training set;
constructing a merchant sales dictionary from the order data, and converting each merchant's entry in the merchant sales dictionary into a feature vector to obtain the initial training set.
In one embodiment, the classification model is a support vector machine model or a logistic regression model.
In one embodiment, after the multiple classification models are obtained, the method further comprises:
running a prediction sample through the multiple classification models one by one until the category of the prediction sample is identified.
Detailed description of the invention
Fig. 1 is a flowchart of the multi-classification method according to the first embodiment of the present invention;
Fig. 2 is a classification diagram of the multi-classification method according to the first embodiment of the present invention;
Fig. 3 is a flowchart of the multi-classification method according to the second embodiment of the present invention;
Fig. 4 is a flowchart of the multi-classification method according to the third embodiment of the present invention;
Fig. 5 is a structural diagram of the multi-classification apparatus according to the fourth embodiment of the present invention;
Fig. 6 is a structural diagram of the electronic device according to the fifth embodiment of the present invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, each embodiment of the present invention is explained in detail below with reference to the drawings. However, those skilled in the art will understand that many technical details are set forth in each embodiment to help the reader better understand the present invention. Even without these technical details, and with various changes and modifications based on the following embodiments, the claimed technical solutions of the present invention can still be implemented.
The first embodiment of the present invention relates to a multi-classification method. The method comprises: generating an initial training set, and training a first classification model on the initial training set; separating the training samples corresponding to the first classification model from the training set to obtain the training set of a second classification model; training the second classification model on that training set; and repeating the sample separation and model training to obtain multiple classification models. Compared with the prior art, after the first classification model is trained on the initial training set, the training samples corresponding to the first classification model are separated from the training set to obtain the training set of the second classification model, the second classification model is trained on that training set, and the sample separation and model training are repeated to finally obtain multiple classification models. Because every classification model after the first is trained on the samples that remain after the samples of the already-trained models have been separated from the training set, the number of training samples in the training set shrinks as the number of trained models grows, so each model trains faster than the last, improving the training efficiency of the multi-classification model.
The multi-classification method of this embodiment is described in detail below with reference to Fig. 1. The method comprises steps 101 to 105.
Step 101: generate an initial training set.
The initial training set may be, for example, a training set for training the multi-classification model of the main-business-scope categories (catering and the like) of merchants on a takeout platform, but is not limited thereto. The initial training set can be obtained through sample screening and feature extraction.
Step 101 specifically comprises: obtaining an original training set, and constructing the initial training set from the original training set. Specifically, because the accuracy of the manually labeled main business scopes of online merchants is not high, using them directly as the training set would introduce noise into the classification models and cause errors. This embodiment therefore screens samples as follows to obtain the original training set: the manually labeled main-business-scope category that covers the most merchants under a brand, provided it covers more than one merchant, is taken as the main business scope of all merchants under that brand. The screening rule is as follows:
Suppose the manually labeled main-business-scope categories of merchants are C = {c_1, c_2, ..., c_m}, and a brand A comprises merchants S = {s_1, s_2, ..., s_n}. The online manual-label dictionary of all merchants under brand A is D = {c_1: n_1, c_2: n_2, ..., c_m: n_m}, meaning that among the n merchants, n_1 merchants are manually labeled with main business scope c_1, n_2 merchants with c_2, and so on, where n_1 + n_2 + ... + n_m = n. If n_1 > n_2 > ... > n_m and n_1 > 1, then all n merchants under brand A are labeled with c_1, the main-business-scope category of the n_1 majority merchants; that is, this embodiment takes category c_1 as the main business scope of all merchants under brand A. The main business scopes of all brands are screened by this rule to obtain the training samples.
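The screening rule above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name and the input shape (a per-brand mapping from merchant id to manually labeled category) are assumptions for the sketch.

```python
from collections import Counter

def screen_brand_labels(brand_merchants):
    """For each brand, relabel every merchant with the brand's most common
    manually labeled main-business-scope category, provided that category
    covers more than one merchant (the n_1 > 1 condition)."""
    screened = {}
    for brand, labels in brand_merchants.items():
        # labels: {merchant_id: manually_labeled_category}
        counts = Counter(labels.values())
        top_category, top_count = counts.most_common(1)[0]
        if top_count > 1:
            screened[brand] = {wid: top_category for wid in labels}
        # brands whose top label covers only one merchant are discarded
    return screened
```

For example, a brand whose merchants are labeled {congee, congee, noodles} is uniformly relabeled congee, while a single-merchant brand is dropped from the screened set.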
In this embodiment, constructing the initial training set from the original training set specifically comprises: extracting the order data of the merchants in the original training set, constructing a merchant sales dictionary from the order data, and converting each merchant's entry in the merchant sales dictionary into a feature vector to obtain the initial training set. In other words, after sample screening, merchant features are extracted to build a merchant feature matrix, which yields the initial training set. Specifically, the order data of a merchant may be the names of the dishes sold in the merchant's order data over roughly the past month and the number of orders for each dish, although it is not limited thereto. A merchant sales dictionary is constructed from this data, for example:
OrderDict = { wid_1: { dish_11: order_11, dish_12: order_12, ... },
              wid_2: { dish_21: order_21, dish_22: order_22, ... }, ... },
where wid_1: { dish_11: order_11, dish_12: order_12, ... } means that in the past month merchant wid_1 sold order_11 orders of dish dish_11, order_12 orders of dish dish_12, and so on.
Based on the merchant sales dictionary OrderDict, each merchant's per-merchant dictionary is converted into a corresponding feature vector, generating a sparse feature matrix V = [v_1, v_2, ...], where v_1 = [order_11, order_12, ...] is the feature vector converted from merchant wid_1's dictionary wid_1: { dish_11: order_11, dish_12: order_12, ... }, and so on.
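The OrderDict-to-feature-matrix conversion can be sketched like this, with dish names acting as shared feature indices. The dish vocabulary construction is an assumption of the sketch (the patent does not specify how feature positions are aligned across merchants); a library routine such as scikit-learn's DictVectorizer would serve the same purpose.

```python
def build_feature_matrix(order_dict):
    """Convert per-merchant dish -> order-count dictionaries into feature
    vectors over a shared dish vocabulary (a sketch of OrderDict -> V).
    Each vector is mostly zeros, hence the matrix is sparse in spirit."""
    vocab = sorted({dish for dishes in order_dict.values() for dish in dishes})
    index = {dish: i for i, dish in enumerate(vocab)}
    matrix = {}
    for wid, dishes in order_dict.items():
        v = [0] * len(vocab)
        for dish, count in dishes.items():
            v[index[dish]] = count  # order count at the dish's position
        matrix[wid] = v
    return vocab, matrix
```

A merchant that never sold a given dish simply gets a zero in that dish's position, which is what makes the resulting matrix sparse.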
Step 102: train the first classification model on the initial training set.
The classification of merchants' main business scopes is a multi-classification task, and this embodiment decomposes the multi-classification problem into a series of binary classification problems. The classifier between two subsets of classes may be a support vector machine (SVM) model or a logistic regression model, although it is not limited thereto. When step 102 trains the first classification model on the initial training set, the category corresponding to the first classification model is, for example, category 1. To train the classification model for category 1, the samples in the initial training set whose main business scope is labeled category 1 are used as positive samples and the samples of all remaining main-business-scope categories as negative samples, yielding the classification model for category 1, for example an SVM model for category 1.
Step 103: separate the training samples corresponding to the first classification model from the training set to obtain the training set of the second classification model.
The training samples corresponding to a classification model are the training samples in the training set whose category is the same as the model's category. For example, after the classification model for category 1 has been trained, all training samples labeled category 1 are removed from the training set.
Step 104: train the second classification model on the training set of the second classification model.
Because step 103 removes a category's training samples from the training set once that category's classification model has been trained, the number of training samples in the training set of the second classification model in step 104 is reduced. Since in this embodiment the training samples in the training set keep decreasing as the number of trained classification models grows, the training speed of the remaining classification models keeps increasing.
Step 105: determine whether the preset number of classification models has been obtained; if so, end training; if not, return to step 103 and repeat steps 103 to 105.
The preset number is the number of main-business-scope categories minus 1: if there are C categories, the preset number is C − 1. After step 104 has been repeated C − 2 times, C − 1 classification models have been obtained, so whether the preset number of models has been reached can be determined by counting the executions of step 104, although it is not limited thereto.
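Steps 102 to 105 together form a peel-off loop, which can be sketched as below. The `fit_binary` parameter is an assumption of the sketch standing in for any binary trainer (e.g. an SVM); `category_order` lists the C categories in training order, and only the first C − 1 get a model.

```python
def train_cascade(samples, labels, category_order, fit_binary):
    """Train C - 1 binary models, separating each finished category's
    samples from the training set before training the next model."""
    models = []
    X, y = list(samples), list(labels)
    for category in category_order[:-1]:  # C - 1 models in total
        targets = [1 if c == category else 0 for c in y]
        models.append((category, fit_binary(X, targets)))
        # separate this category's samples; the next model trains on the rest
        remaining = [(xi, ci) for xi, ci in zip(X, y) if ci != category]
        X = [xi for xi, _ in remaining]
        y = [ci for _, ci in remaining]
    return models
```

The point of the loop is visible in the shrinking training-set sizes: with 4 samples over categories a, a, b, c, the model for a trains on 4 samples and the model for b on only 2.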
After step 105, the method further comprises: running a prediction sample through the multiple classification models one by one until the category of the prediction sample is identified. Referring to Fig. 2, when classifying a prediction sample, classification model SVM1 first determines whether the sample belongs to category 1; if not, classification model SVM2 then determines whether it belongs to category 2, and so on, until the sample's category is marked. Since there are C − 1 SVM models in total in this embodiment, the category of a prediction sample is identified after at most C − 1 predictions; a sample rejected by all C − 1 models is thereby assigned the one remaining category.
In this embodiment, the training samples corresponding to each trained classification model are continually separated from the training set, and the remaining classification models are trained one by one on the remaining samples. Because the training samples in the training set keep decreasing as more category models finish training, the training speed of the remaining models keeps increasing, which improves overall training efficiency.
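The Fig. 2 prediction cascade can be sketched in a few lines. The `predict_binary` parameter and the explicit `fallback_category` (the category left without a model) are assumptions of the sketch.

```python
def classify(x, models, fallback_category, predict_binary):
    """Run a prediction sample through the binary models in order and
    return the first category whose model accepts it; a sample rejected
    by all C - 1 models gets the one remaining category."""
    for category, model in models:
        if predict_binary(model, x) == 1:
            return category
    return fallback_category
```

With two models for cat1 and cat2 and a trivial equality-based binary predictor, a sample matching neither model falls through to cat3 after the maximum of C − 1 = 2 predictions.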
The second embodiment of the present invention relates to a multi-classification method and improves on the first embodiment. The main improvement is that, in the second embodiment, before training the multiple classification models, their training order is determined in advance from the initial training set, and the determined training order can improve the accuracy of the classification models.
Referring to Fig. 3, the multi-classification method of this embodiment comprises steps 301 to 306.
Step 301: generate an initial training set.
Step 301 is the same as step 101 of the first embodiment and is not repeated here.
Step 302: compute the distances between training samples of different categories in the initial training set, and determine the training order of the multiple classification models according to the distances.
The distance may be a Euclidean distance, but is not limited thereto; in some examples, the training order of the multiple classification models may also be determined from the Manhattan distance, the Minkowski distance, the cosine distance, or the like. Determining the training order according to the distance in step 302 specifically comprises computing, for each category in the initial training set, the minimum Euclidean distance between that category's sample center and the sample centers of the other categories, and taking the descending order of each category's minimum Euclidean distance as the training order of the multiple classification models. The sample center of a category is computed as:
m_i = (1 / n_i) · Σ_{x ∈ X_i} x;
where X_i is the set of feature vectors of all samples of main-business-scope category i, n_i is the number of samples of main-business-scope category i, and m_i denotes the center of main-business-scope category i.
The Euclidean distance between the sample center of one category and the sample center of another category is computed as:
D_ij = ||m_i − m_j||;
where D_ij denotes the Euclidean distance between the sample center of category i and the sample center of category j. l_i denotes the minimum Euclidean distance between the sample center of category i and the sample center of any other category:
l_i = min_{j ≠ i} D_ij.
The C per-category minimum Euclidean distances l_i (i = 1, 2, ..., C) are sorted in descending order, and this order is taken as the training order of the C − 1 classification models. That is, the classification model of the category with the largest minimum Euclidean distance is trained first, and that model's training samples are also separated from the training set first. Since the Euclidean distance reflects the similarity between training samples of different categories, the training order determined in this embodiment can improve model accuracy.
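The ordering rule of step 302 can be sketched directly from the formulas above, assuming the per-category sample centers m_i have already been computed:

```python
import math

def training_order(centers):
    """Sort categories by l_i = min_{j != i} ||m_i - m_j||, largest first.
    `centers` maps category -> precomputed sample-center vector m_i."""
    def min_dist(ci):
        return min(math.dist(centers[ci], centers[cj])
                   for cj in centers if cj != ci)
    return sorted(centers, key=min_dist, reverse=True)
```

For three centers at 0, 1, and 10 on a line, the isolated category at 10 has l_i = 9 versus 1 for the other two, so it is trained (and its samples separated) first.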
Steps 303 to 306 are essentially the same as steps 102 to 105 of the first embodiment. The main difference is that in steps 303 to 306 the multiple classification models are trained in the training order determined in step 302, whereas in steps 102 to 105 the training order of the multiple classification models may be arbitrary.
In this embodiment, the training samples corresponding to each trained classification model are continually separated from the training set, and the next classification model is trained on the remaining samples. Because the training samples in the training set keep decreasing as more category models finish training, the training speed of the remaining models keeps increasing, which improves overall training efficiency. Moreover, the classification models of this embodiment are trained in descending order of the categories' minimum inter-category Euclidean distances, which gives better accuracy than classification models trained in a random order.
The third embodiment of the present invention relates to a multi-classification method and improves on the first or second embodiment. The main improvement is that, in the third embodiment, the obtained original training set is oversampled or undersampled to obtain a balanced training set, and the initial training set is constructed from the balanced training set, so that the training samples of different categories in the initial training set are better balanced, which in turn improves the accuracy of the classification models.
Referring to Fig. 4, the method comprises steps 401 to 406.
Step 401: obtain an original training set.
The original training set of step 401 can be obtained in the same way as the initial training set of step 101 of the first embodiment, which is not repeated here.
Step 402: oversample or undersample the original training set to obtain a balanced training set, and obtain the initial training set from the balanced training set.
Since that there are sample class is unbalanced for the training set that acquires in the way of the step 101 of first embodiment The problem of.Sample class is unbalanced to be referred in classification learning algorithm, and the sample size accounting of different classes of sample differs greatly, this Great interference can be caused to learning process.For example, possessing in the data set of 1000 data samples, the data volume of certain class sample is only There are 10, far fewer than the data volume of the sample of other classifications, such situation is that sample class is unbalanced.Sample class is unbalanced It is very few to will lead to the feature that the few classification of sample size is included, is difficult therefrom to extract rule;Even if obtaining disaggregated model, also it is easy Generation depend on unduly with overfitting problem caused by limited data sample, cause the accuracy of model poor.For another example: former The Main Management range of trade company is labeled as the common label classification such as " homely side fish ", " rice served with meat and vegetables on top ", " Sichuan cuisine " in beginning training set Ratio be far longer than and be labeled as the other ratios of uncommon tag class such as " French food ", " Australian dish ", therefore there are samples This classification is unbalanced, and the accuracy that will cause model is poor.It therefore, can be by carrying out sample to original training set in step 402 This over-sampling or the processing of sample lack sampling, to improve the imbalance problem between sample different classes of in original training set, To obtain the training set more balanced, so that model is more acurrate.
Specifically, sample over-sampling proceeds as follows: for a main-business-scope class whose samples in the original training set number fewer than a preset quantity, perform random over-sampling, i.e., draw samples at random from that class and add the drawn samples back into the original training set until the number of samples of that class reaches the preset quantity. The preset quantity is, for example, 2000, but is not limited thereto. For a main-business-scope class whose samples exceed the preset quantity, perform random under-sampling, i.e., randomly select samples of that class and remove them from the original training set until the number of samples of that class likewise reaches the preset quantity. Applying random over-sampling or random under-sampling to the original training set thus yields the balanced training set. Obtaining the initial training set from the balanced training set can follow step 101 of the first embodiment and is not repeated here.
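As an illustration of step 402, the random over-sampling and under-sampling described above can be sketched as follows. This is a minimal sketch rather than the original implementation: the function name, the (feature, label) sample representation, and the default preset quantity of 2000 are assumptions made for the example.

```python
import random

def balance_training_set(samples, preset_quantity=2000, seed=0):
    """Randomly over-sample minority classes and under-sample majority
    classes so every class ends up with exactly `preset_quantity` samples.

    `samples` is a list of (feature, label) pairs.
    """
    rng = random.Random(seed)
    by_class = {}
    for feature, label in samples:
        by_class.setdefault(label, []).append((feature, label))

    balanced = []
    for label, group in by_class.items():
        if len(group) < preset_quantity:
            # Random over-sampling: draw extra samples (with replacement)
            # from the class until it reaches the preset quantity.
            extra = [rng.choice(group) for _ in range(preset_quantity - len(group))]
            balanced.extend(group + extra)
        else:
            # Random under-sampling: keep only a random subset of the class.
            balanced.extend(rng.sample(group, preset_quantity))
    return balanced
```

With, say, 5 samples of one class and 30 of another and a preset quantity of 10, both classes come out with exactly 10 samples.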
Steps 403 to 406 are identical to steps 102 to 105 of the first embodiment, respectively, and are not repeated here. In one example, between step 402 and step 403, the distances between training samples of different classes in the initial training set may also be computed, and the training order of the multiple classification models determined from those distances; when steps 403 to 406 train the multiple classification models, the models are trained in that order. For details, refer to step 302 of the second embodiment, which is not repeated here.
In this embodiment, the training samples corresponding to each trained classification model are successively separated out of the training set, and the next classification model is trained on the remaining training samples. As more classes finish training, the training set keeps shrinking, so the training speed of the remaining classification models keeps increasing and the overall training efficiency improves. Moreover, by applying sample over-sampling or sample under-sampling to the original training set, this embodiment makes the initial training set better balanced, so that the trained classification models are more accurate.
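The training loop this embodiment describes — train a model, separate its class's samples out of the training set, train the next model on what remains — can be sketched as follows. The `train_binary` callback standing in for the SVM or logistic-regression fit mentioned elsewhere in the text is a placeholder, not part of the original disclosure.

```python
def train_cascade(training_set, class_order, train_binary):
    """Train one binary classifier per class in `class_order`, removing each
    class's samples from the training set once its classifier is trained.

    `training_set` is a list of (feature, label) pairs;
    `train_binary(samples, target_class)` returns a fitted model.
    """
    models = []
    remaining = list(training_set)
    for cls in class_order:
        model = train_binary(remaining, cls)          # class vs. the rest
        models.append((cls, model))
        # Separate this class's samples: each later model trains on fewer
        # samples, which is why overall training keeps speeding up.
        remaining = [(x, y) for x, y in remaining if y != cls]
    return models
```

Because `remaining` shrinks after every class, each successive call to `train_binary` sees a strictly smaller training set.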
A fourth embodiment of the present invention relates to a multi-classification apparatus, which can be used to classify a merchant's main business scope, but is not limited thereto. Referring to Fig. 5, the multi-classification apparatus 500 of this embodiment includes:
a generation module 501 for generating an initial training set;
a first training module 502 for training a first classification model based on the initial training set;
a second training module 503 for separating the training samples corresponding to the first classification model out of the training set to obtain the training set of a second classification model, and training the second classification model based on that training set; and
a control module 504 for repeatedly invoking the second training module 503 to separate training samples and train classification models, so as to obtain multiple classification models.
In one example, the generation module 501 may specifically be used to acquire an original training set and construct the initial training set from it. The generation module 501 extracts the output data of the merchants in the original training set, builds a merchant output dictionary from the output data, and converts each merchant's corresponding entry in the output dictionary into a feature vector, thereby obtaining the initial training set.
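The dictionary-to-vector conversion above can be sketched as a simple bag-of-words encoding. The disclosure does not fix the encoding scheme, so the count-vector representation below is only one plausible reading, and all names in the sketch are illustrative.

```python
def build_feature_vectors(merchant_outputs):
    """Build a vocabulary (the 'output dictionary') from every merchant's
    output items and convert each merchant's items into a count vector.

    `merchant_outputs` maps merchant id -> list of output item names.
    """
    vocab = sorted({item for items in merchant_outputs.values() for item in items})
    index = {item: i for i, item in enumerate(vocab)}
    vectors = {}
    for merchant, items in merchant_outputs.items():
        vec = [0] * len(vocab)          # one dimension per dictionary entry
        for item in items:
            vec[index[item]] += 1       # count occurrences of each item
        vectors[merchant] = vec
    return vocab, vectors
```

Each merchant's vector then serves as a training sample in the initial training set.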
In one example, the generation module 501 may also be used to perform sample over-sampling or sample under-sampling on the original training set to obtain a balanced training set, and to obtain the initial training set based on the balanced training set.
In practical applications, the generation module 501 may also be used to acquire merchant information manually labeled with main-business-scope classes, and to take the main-business-scope class that, among the labeled merchants under a given brand, corresponds to the largest number of merchants (greater than 1) as the main business scope of all merchants under that brand.
In one example, the multi-classification apparatus 500 may further include a determining module (not shown) for computing the distances between training samples of different classes in the initial training set and determining the training order of the multiple classification models according to those distances; the first training module 502 and the second training module 503 then train the models in the order determined by the determining module. For example, the distance may be the Euclidean distance: the determining module computes, for each class, the minimum Euclidean distance between that class's sample center and the sample centers of the other classes, and takes the classes in descending order of this minimum distance as the training order of the multiple classification models.
In one example, the classification model may be a support vector machine model or a logistic regression model.
In one example, the multi-classification apparatus 500 may further include a labeling module (not shown) for passing a prediction sample through the multiple classification models one by one, until one of them identifies the class of the prediction sample.
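The one-by-one identification performed by the labeling module can be sketched as follows. The `match` predicate (how a binary model accepts or rejects a sample) and the fallback to the last model's class when no model fires are assumptions the text leaves open.

```python
def predict(models, sample, match):
    """Pass `sample` through the trained classifiers one by one and return
    the class of the first classifier that recognizes it.

    `models` is a list of (class_label, model) pairs in training order;
    `match(model, sample)` is a yes/no test supplied by the caller.
    """
    for cls, model in models:
        if match(model, sample):
            return cls
    # If no classifier fires, fall back to the last class in the cascade
    # (an assumption: the final model was trained on the final remainder).
    return models[-1][0]
```

Because earlier models were trained on the more separable classes, most samples are identified within the first few steps of the cascade.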
In the multi-classification apparatus of this embodiment, the training samples corresponding to each trained classification model are successively separated out of the training set, and the next classification model is trained on the remaining training samples. As more classes finish training, the training set keeps shrinking, so the training speed of the remaining classification models keeps increasing and the overall training efficiency improves. Moreover, the classification models of this embodiment are trained in descending order of the minimum inter-class Euclidean distance, which gives better accuracy than classification models trained in a random order. In addition, by applying sample over-sampling or sample under-sampling to the original training set, this embodiment makes the initial training set better balanced, so that the trained classification models are more accurate.
A fifth embodiment of the present invention relates to an electronic device. As shown in Fig. 6, the electronic device includes a memory 602 and a processor 601.
The memory 602 stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601 so as to: generate an initial training set, and train a first classification model based on the initial training set; separate the training samples corresponding to the first classification model out of the training set to obtain the training set of a second classification model; train the second classification model based on that training set; and repeat the above separation of training samples and training of classification models to obtain multiple classification models.
There may be one or more processors 601 and memories 602; one processor 601 is taken as an example in Fig. 6. The processor 601 and the memory 602 may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 6. As a non-volatile computer-readable storage medium, the memory 602 can store non-volatile software programs and non-volatile computer programs and modules. By running the non-volatile software programs, instructions, and modules stored in the memory 602, the processor 601 performs the various functional applications and data processing of the device, thereby implementing the multi-classification method described above.
The memory 602 may include a program storage area and a data storage area, where the program storage area can store the operating system and the application program required by at least one function. In addition, the memory 602 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601; such remote memory can be connected to the device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 602 and, when executed by the one or more processors 601, perform the multi-classification method of any of the above method embodiments.
The above device can perform the method provided by the embodiments of the present invention, and has the functional modules and beneficial effects corresponding to the method; technical details not described in this embodiment can be found in the method provided by the embodiments of the present invention.
In this embodiment, the training samples corresponding to each trained classification model are successively separated out of the training set, and the next classification model is trained on the remaining training samples. As more classes finish training, the training set keeps shrinking, so the training speed of the remaining classification models keeps increasing and the overall training efficiency improves. Moreover, the classification models are trained in descending order of the minimum inter-class Euclidean distance, which gives better accuracy than models trained in a random order. In addition, sample over-sampling or sample under-sampling of the original training set yields a better-balanced initial training set, so that the trained classification models are more accurate.
A sixth embodiment of the present invention relates to a non-volatile storage medium for storing a computer-readable program, the computer-readable program being used to cause a computer to execute some or all of the above method embodiments.
That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by instructing the relevant hardware through a program. The program is stored in a storage medium and includes instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Those skilled in the art will understand that the above embodiments are specific embodiments for realizing the present invention, and that in practical applications various changes in form and detail may be made to them without departing from the spirit and scope of the present invention.
Embodiments of the present application disclose A1. A multi-classification method, comprising:
generating an initial training set, and training a first classification model based on the initial training set;
separating the training samples corresponding to the first classification model out of the training set to obtain the training set of a second classification model;
training the second classification model based on the training set of the second classification model; and
repeating the above separation of training samples and training of classification models to obtain multiple classification models.
A2. more classification methods as described in a1 after the generation initial training collection, and are starting described in trained obtain Before multiple disaggregated models, further includes:
The initial training is calculated and concentrates the distance between different classes of training sample, institute is determined according to the distance State the learning sequence of multiple disaggregated models;
In the step of training obtains the multiple disaggregated model, according to the learning sequence training of the multiple disaggregated model Obtain the multiple disaggregated model.
A3. more classification methods, the distance are Euclidean distance as described in A2, described described more according to the distance determination The learning sequence of a disaggregated model, specifically includes:
The initial training is calculated to concentrate between the center of a sample of each classification and the center of a sample of other classifications The minimum value of Euclidean distance;
Using the sequence of the minimum value of the corresponding Euclidean distance of each classification from big to small as the multiple disaggregated model Learning sequence.
A4. more classification methods as described in any one of A1 to A3, the generation initial training collection, specifically include:
Obtain original training set;
It constructs to obtain the initial training collection based on the original training set.
A5. more classification methods as described in A4, more classification methods are used for the classification of trade company's Main Management range.
A6. more classification methods as described in a5, it is described to obtain original training set, it specifically includes:
Acquisition is manually labeled with the merchant information of Main Management range class;
Main Management model by the corresponding trade company's number of the Main Management range class marked under same brand at most and greater than 1 Enclose the Main Management range as all trade companies under the brand.
A7. more classification methods as described in A6, it is described to construct to obtain the initial training collection based on the original training set, It specifically includes:
Sample over-sampling or sample lack sampling are carried out to be balanced training set to the original training set;
The initial training collection is obtained based on the balance training collection.
A8. more classification methods as described in a5, it is described to construct to obtain the initial training collection based on the original training set, It specifically includes:
Extract the output data of the trade company in the original training set;
Trade company's output dictionary is constructed according to the output data, and each trade company in trade company's output dictionary is corresponding Output dictionary is converted into feature vector, to obtain the initial training collection.
A9. more classification methods as described in a1, the disaggregated model are supporting vector machine model or Logic Regression Models.
A10. more classification methods as described in a1, after obtaining multiple disaggregated models, further includes:
Forecast sample is identified by the multiple disaggregated model one by one, until identification obtains the forecast sample Classification.
Embodiments of the present application also disclose B1. A multi-classification apparatus, comprising:
a generation module for generating an initial training set;
a first training module for training a first classification model based on the initial training set;
a second training module for separating the training samples corresponding to the first classification model out of the training set to obtain the training set of a second classification model, and training the second classification model based on that training set; and
a control module for repeatedly invoking the second training module to separate training samples and train classification models, so as to obtain multiple classification models.
Embodiments of the present application also disclose C1. An electronic device, comprising a memory and a processor, the memory storing a computer program, the processor running the computer program to:
generate an initial training set, and train a first classification model based on the initial training set;
separate the training samples corresponding to the first classification model out of the training set to obtain the training set of a second classification model;
train the second classification model based on the training set of the second classification model; and
repeat the above separation of training samples and training of classification models to obtain multiple classification models.
C2. The electronic device as described in C1, wherein the processor is also used to execute the multi-classification method as described in any one of A2 to A10.
Embodiments of the present application also disclose D1. A computer-readable storage medium storing a computer program, the computer program being executed by a processor to perform the multi-classification method as described in any one of A1 to A10.

Claims (10)

1. A multi-classification method, characterized by comprising:
generating an initial training set, and training a first classification model based on the initial training set;
separating the training samples corresponding to the first classification model out of the training set to obtain the training set of a second classification model;
training the second classification model based on the training set of the second classification model; and
repeating the above separation of training samples and training of classification models to obtain multiple classification models.
2. The multi-classification method according to claim 1, characterized by further comprising, after generating the initial training set and before starting to train the multiple classification models:
computing the distances between training samples of different classes in the initial training set, and determining the training order of the multiple classification models according to the distances;
wherein, in the step of training the multiple classification models, the multiple classification models are trained according to the training order of the multiple classification models.
3. The multi-classification method according to claim 2, characterized in that the distance is the Euclidean distance, and determining the training order of the multiple classification models according to the distances specifically includes:
computing, for each class in the initial training set, the minimum Euclidean distance between the sample center of that class and the sample centers of the other classes; and
taking the classes in descending order of the corresponding minimum Euclidean distance as the training order of the multiple classification models.
4. The multi-classification method according to any one of claims 1 to 3, characterized in that generating the initial training set specifically includes:
acquiring an original training set; and
constructing the initial training set based on the original training set.
5. The multi-classification method according to claim 4, characterized in that the multi-classification method is used to classify a merchant's main business scope.
6. The multi-classification method according to claim 5, characterized in that acquiring the original training set specifically includes:
acquiring merchant information manually labeled with main-business-scope classes; and
taking the main-business-scope class that, among the labeled merchants under a given brand, corresponds to the largest number of merchants (greater than 1) as the main business scope of all merchants under that brand.
7. The multi-classification method according to claim 6, characterized in that constructing the initial training set based on the original training set specifically includes:
performing sample over-sampling or sample under-sampling on the original training set to obtain a balanced training set; and
obtaining the initial training set based on the balanced training set.
8. A multi-classification apparatus, characterized by comprising:
a generation module for generating an initial training set;
a first training module for training a first classification model based on the initial training set;
a second training module for separating the training samples corresponding to the first classification model out of the training set to obtain the training set of a second classification model, and training the second classification model based on that training set; and
a control module for repeatedly invoking the second training module to separate training samples and train classification models, so as to obtain multiple classification models.
9. An electronic device, characterized by comprising a memory and a processor, the memory storing a computer program, the processor running the computer program to:
generate an initial training set, and train a first classification model based on the initial training set;
separate the training samples corresponding to the first classification model out of the training set to obtain the training set of a second classification model;
train the second classification model based on the training set of the second classification model; and
repeat the above separation of training samples and training of classification models to obtain multiple classification models.
10. A storage medium, characterized in that it stores a computer-readable program, the computer-readable program being used to cause a computer to execute the multi-classification method according to any one of claims 1 to 7.
CN201910134159.5A 2019-02-22 2019-02-22 More classification methods and device, electronic equipment and storage medium Pending CN110008990A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910134159.5A CN110008990A (en) 2019-02-22 2019-02-22 More classification methods and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110008990A true CN110008990A (en) 2019-07-12

Family

ID=67165871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910134159.5A Pending CN110008990A (en) 2019-02-22 2019-02-22 More classification methods and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110008990A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2539837A1 (en) * 2010-02-24 2013-01-02 Jonathan Edward Bell Ackland Classification system and method
CN105389597A (en) * 2015-12-22 2016-03-09 哈尔滨工业大学 Hyperspectral data multi-classification method based on Chernoff distance and SVM (support vector machines)
CN107133293A (en) * 2017-04-25 2017-09-05 中国科学院计算技术研究所 A kind of ML kNN improved methods and system classified suitable for multi-tag

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113407713A (en) * 2020-10-22 2021-09-17 腾讯科技(深圳)有限公司 Corpus mining method and apparatus based on active learning and electronic device
CN113407713B (en) * 2020-10-22 2024-04-05 腾讯科技(深圳)有限公司 Corpus mining method and device based on active learning and electronic equipment

Similar Documents

Publication Publication Date Title
CN107578060B (en) Method for classifying dish images based on depth neural network capable of distinguishing areas
CN108509413A (en) Digest extraction method, device, computer equipment and storage medium
CN102289522B (en) Method of intelligently classifying texts
US11600067B2 (en) Action recognition with high-order interaction through spatial-temporal object tracking
CN104636429B (en) Trademark class search method and device
CN109344884A (en) The method and device of media information classification method, training picture classification model
CN107067293A (en) Merchant category method, device and electronic equipment
CN107958270A (en) Classification recognition methods, device, electronic equipment and computer-readable recording medium
JP6897749B2 (en) Learning methods, learning systems, and learning programs
CN105045909B (en) The method and apparatus that trade name is identified from text
CN108897798A (en) Electricity consumption customer service work order classification method, device and electronic equipment
CN105654196A (en) Adaptive load prediction selection method based on electric power big data
CN106529854A (en) Express delivery distribution and receiving system and method based on classification algorithm
US20210081672A1 (en) Spatio-temporal interactions for video understanding
CN107958406A (en) Inquire about acquisition methods, device and the terminal of data
CN109766935A (en) A kind of semisupervised classification method based on hypergraph p-Laplacian figure convolutional neural networks
CN108846695A (en) The prediction technique and device of terminal replacement cycle
CN110457677A (en) Entity-relationship recognition method and device, storage medium, computer equipment
CN111832590B (en) Article identification method and system
CN104091038A (en) Method for weighting multiple example studying features based on master space classifying criterion
CN109492093A (en) File classification method and electronic device based on gauss hybrid models and EM algorithm
CN105117740A (en) Font identification method and device
CN108228684A (en) Training method, device, electronic equipment and the computer storage media of Clustering Model
JPH02238588A (en) Recognizing device
CN108876452A (en) Electricity customers demand information acquisition methods, device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190712