CN105069470A - Classification model training method and device - Google Patents


Info

Publication number
CN105069470A
Authority
CN
China
Prior art keywords
sample
current round
positive example
classification model
negative example
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510456761.2A
Other languages
Chinese (zh)
Inventor
叶幸春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201510456761.2A priority Critical patent/CN105069470A/en
Publication of CN105069470A publication Critical patent/CN105069470A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a classification model training method and device, belonging to the technical field of data processing. The method comprises: carrying out model training according to the current round of positive example samples and the current round of negative example samples to obtain the current round's classification model; if the current round's classification model does not satisfy a specified condition, using it to classify all samples and selecting specified samples from all samples according to the classification result; taking the current round of positive example samples together with the specified samples as the next round of positive example samples, and determining the next round of negative example samples according to the next round of positive example samples; and continuing to execute the above model training and sample processing according to the next round of positive and negative example samples until a classification model satisfying the specified condition is obtained. As the number of positive example samples increases, the potential positive examples hidden among the negative example samples decrease, which effectively improves the purity of the negative example samples; the model trained over multiple rounds on the progressively enlarged positive and negative example sets is therefore more stable and classifies with higher accuracy.

Description

Classification model training method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a classification model training method and device.
Background technology
With the development of information technology, we have entered the era of big data. For example, the various service platforms provided by merchants and enterprises collect massive amounts of user data. Much useful information is hidden in such mass data; this information can be of great help in business management, production control, market analysis, engineering design, scientific exploration and other fields, so data mining technology has attracted great attention across many domains. A basic task of data mining is to classify mass data, and data classification is usually realized on the basis of a trained classification model.
In the prior art, when training a classification model, positive example samples and negative example samples are first chosen for model training. A positive example sample is a sample that has been labeled among all the samples used to train the model; for example, the positive example samples may be a group of people sharing the same requirements or interests. The negative example samples are chosen from the unlabeled portion of all the samples, in a quantity equal to the number of positive example samples. A single round of model training is then carried out on these positive and negative example samples to obtain a classification model.
In the course of realizing the present invention, the inventor found that the prior art has at least the following problem:
When the total number of samples is small, the numbers of positive and negative example samples shrink accordingly, and some positive examples may be hidden inside the negative example set, so the purity of the negative example samples is low. Because positive and negative example samples are then poorly separated, the classification model obtained after a single round of training on such samples is unstable and of low classification accuracy, and may even underfit.
Summary of the invention
In order to solve the problems of the prior art, embodiments of the present invention provide a classification model training method and device. The technical scheme is as follows:
In one aspect, a classification model training method is provided, the method comprising:
carrying out model training according to the current round of positive example samples and the current round of negative example samples to obtain the current round's classification model;
if the current round's classification model does not satisfy a specified condition, using the current round's classification model to classify all samples, and choosing specified samples from all samples according to the classification result, the specified samples being samples that the current round's classification model predicts to be positive examples;
taking the current round of positive example samples together with the specified samples as the next round of positive example samples, and determining the next round of negative example samples according to the next round of positive example samples;
continuing to execute the above model training and sample processing according to the next round of positive example samples and the next round of negative example samples, until a classification model satisfying the specified condition is obtained.
In another aspect, a classification model training device is provided, the device comprising:
a model training module, configured to carry out model training according to the current round of positive example samples and the current round of negative example samples to obtain the current round's classification model;
a sample processing module, configured to: if the current round's classification model does not satisfy a specified condition, use the current round's classification model to classify all samples and choose specified samples from all samples according to the classification result, the specified samples being samples that the current round's classification model predicts to be positive examples; and take the current round of positive example samples together with the specified samples as the next round of positive example samples, and determine the next round of negative example samples according to the next round of positive example samples;
the model training module being further configured to continue the above model training process according to the next round of positive example samples and the next round of negative example samples;
and the sample processing module being further configured to continue the above sample processing according to each new round's classification model, until a classification model satisfying the specified condition is obtained.
The beneficial effects brought by the technical scheme provided by the embodiments of the present invention are:
During the model training process, model training is carried out according to the current round of positive example samples and the current round of negative example samples to obtain the current round's classification model. If this classification model does not satisfy the specified condition, it is used to classify all samples, and specified samples are chosen from all samples according to the classification result. The current round of positive example samples and the specified samples together become the next round of positive example samples, according to which the next round of negative example samples is determined; model training and sample processing then continue on the next round's samples until a classification model satisfying the specified condition is obtained. That is, after each round of training, if the resulting classification model does not satisfy the specified condition, specified samples chosen on the basis of that model are added to the positive example set, and training proceeds through further iterations. As the number of positive example samples keeps growing, the potential positive examples hidden among the negative example samples decline accordingly, effectively improving the purity of the negative example set. Because positive and negative examples are then better separated, the classification model obtained after multiple rounds of training on these progressively enlarged positive and negative example sets is more stable and classifies with higher accuracy.
Brief description of the drawings
In order to illustrate the technical schemes in the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort.
Fig. 1 is a flowchart of a classification model training method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a classification model training method provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of a classification model training device provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of a server provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical schemes and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a classification model training method provided by an embodiment of the present invention. Referring to Fig. 1, the method flow provided by the embodiment of the present invention comprises:
101. Carry out model training according to the current round of positive example samples and the current round of negative example samples to obtain the current round's classification model.
102. If the current round's classification model does not satisfy a specified condition, use it to classify all samples and choose specified samples from all samples according to the classification result, the specified samples being samples the model predicts to be positive examples.
103. Take the current round of positive example samples together with the specified samples as the next round of positive example samples, and determine the next round of negative example samples according to the next round of positive example samples.
104. Continue to execute the above model training and sample processing according to the next round of positive example samples and the next round of negative example samples, until a classification model satisfying the specified condition is obtained.
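As an illustration only (not part of the patent text), the iterative procedure of steps 101 to 104 can be sketched in Python. The round of "model training" here is a deliberately simplified stand-in that scores a sample by closeness to the positive mean; all function and variable names are hypothetical:

```python
import random

def train_round(pos, neg):
    # Hypothetical stand-in for one round of model training: the returned
    # "model" scores a sample by similarity to the mean of the positives.
    mu = sum(pos) / len(pos)
    return lambda x: -abs(x - mu)  # higher score = more positive-like

def iterative_training(all_samples, seed_pos, rounds=3, top_k=2):
    # Round 1 positive examples: the manually labelled seed samples.
    pos = list(seed_pos)
    model = None
    for _ in range(rounds):
        # Negative examples: drawn from the remaining, unlabelled samples,
        # matched in size to the current positive set (step 202 below).
        remaining = [s for s in all_samples if s not in pos]
        neg = random.sample(remaining, min(len(pos), len(remaining)))
        model = train_round(pos, neg)
        # Classify every sample; add the top_k unlabelled samples the model
        # is most confident are positive to the next round's positive set.
        scored = sorted(remaining, key=model, reverse=True)
        pos.extend(scored[:top_k])
    return model, pos
```

In a real implementation the loop would additionally stop as soon as the evaluated model satisfies the specified condition, rather than running a fixed number of rounds.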
In the method provided by the embodiment of the present invention, during the model training process, model training is carried out according to the current round of positive example samples and the current round of negative example samples to obtain the current round's classification model. If this classification model does not satisfy the specified condition, it is used to classify all samples, and specified samples are chosen from all samples according to the classification result. The current round of positive example samples and the specified samples together become the next round of positive example samples, according to which the next round of negative example samples is determined; model training and sample processing then continue on the next round's samples until a classification model satisfying the specified condition is obtained. That is, after each round of training, if the resulting classification model does not satisfy the specified condition, specified samples chosen on the basis of that model are added to the positive example set, and training proceeds through further iterations. As the number of positive example samples keeps growing, the potential positive examples hidden among the negative example samples decline accordingly, effectively improving the purity of the negative example set. Because positive and negative examples are then better separated, the classification model obtained after multiple rounds of training on these progressively enlarged positive and negative example sets is more stable and classifies with higher accuracy.
Optionally, before carrying out model training according to the current round of positive example samples and the current round of negative example samples, the method further comprises:
choosing the current round of negative example samples, based on the quantity of the current round of positive example samples, from the samples remaining after the positive example samples are removed from all samples;
choosing first samples from the current round of positive example samples and second samples from the current round of negative example samples, the first samples and the second samples being equal in number;
wherein carrying out model training according to the current round of positive example samples and the current round of negative example samples comprises:
carrying out model training according to the positive example samples remaining after the first samples are removed and the negative example samples remaining after the second samples are removed.
Optionally, before using the current round's classification model to classify all samples, the method further comprises:
evaluating the current round's classification model according to the first samples and the second samples to obtain an evaluation result;
judging, according to the evaluation result, whether the current round's classification model satisfies the specified condition;
when the evaluation result is better than the configured classification performance indexes, determining that the current round's classification model satisfies the specified condition.
Optionally, choosing, according to the classification result, the specified samples classified as positive examples from all samples comprises:
determining predicted positive example samples among all samples according to the classification result;
for each of the predicted positive example samples, determining from the classification result the probability of the sample being classified as a positive example;
choosing, among the predicted positive example samples, the preset number of samples with the highest probability of being classified as positive examples;
determining the preset number of samples as the specified samples.
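A minimal sketch of this selection step (illustrative only, not from the patent). The 0.5 probability cutoff used to decide which samples count as "predicted positive" is an assumption; the patent only says the samples are those the model classifies as positive:

```python
def choose_specified_samples(probs, preset_number):
    # probs: {sample_id: probability of being classified as positive}.
    # Keep only samples predicted positive (assumed cutoff p > 0.5), then
    # take the preset_number of samples with the highest probability.
    predicted_pos = [(sid, p) for sid, p in probs.items() if p > 0.5]
    predicted_pos.sort(key=lambda t: t[1], reverse=True)
    return [sid for sid, _ in predicted_pos[:preset_number]]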
Optionally, carrying out model training according to the current round of positive example samples and the current round of negative example samples comprises:
calculating, on the basis of the model to be trained, the feature vectors of the current round of positive example samples and negative example samples, the model to be trained being the classification model obtained in the previous round of training, and the classification categories of the model to be trained being determined according to configured sample feature data;
classifying the current round of positive example samples and negative example samples according to the feature vector of each sample therein;
optimizing the parameters of the model to be trained according to the sample classification result and the labeling result of the current round of positive example samples, to obtain the current round's classification model.
All the above optional schemes can be combined in any manner to form optional embodiments of the present invention, which will not be described one by one here.
Fig. 2 is a flowchart of a classification model training method provided by an embodiment of the present invention. Referring to Fig. 2, the method flow provided by the embodiment of the present invention comprises:
201. Choose the first round of positive example samples from all samples.
In the embodiments of the present invention, a sample is an individual actually observed or surveyed in a study. For example, all the samples may be all the registered users of a certain application, or all the users belonging to a certain region, which the embodiments of the present invention do not limit. A positive example sample is a labeled sample among the samples used to train a binary classification model; that is, positive example samples are labeled by hand and their category is known. Such a model is called a binary classification model because its classification result takes only two values, "yes" or "no"; binary classification models include logistic regression models, decision tree models, support vector machine models and the like. A seed crowd is a set of positive example samples labeled offline. A seed crowd is normally collected under a specific business scenario and refers to people with the same requirements for and interest in a product or service; the size of a seed crowd is small, usually below 100,000. For example, among all the registered users of a certain application, users who like the same brand of automobile can belong to the same seed crowd.
It should be noted that this step refers to the positive example samples as the first round of positive example samples because the subsequent process may carry out many iterative rounds of model training. The number of positive example samples differs from round to round, and each round of training uses a different positive example set, so to distinguish the positive example samples of each round they are called the first round of positive example samples, the next round of positive example samples, and so on. The same applies to the negative example samples.
When choosing the first round of positive example samples from all samples, since the quality of the sample data used for model training is crucial, labeled data such as a seed crowd is generally used as the positive example samples. For example, a seed crowd sharing the same interest characteristic among all registered users may be used as the first round of positive example samples, or a seed crowd that has used a newly added service or product may be used; the embodiments of the present invention do not limit this, and different seed crowds can be chosen as the first round of positive example samples according to different classification requirements. The samples in each seed crowd are chosen and labeled manually in advance, which the embodiments of the present invention likewise do not limit.
202. Based on the quantity of the first round of positive example samples, choose the first round of negative example samples from the samples remaining after the first round of positive example samples is removed from all samples.
A negative example sample is an unlabeled sample among the samples used to train a binary classification model. That is, the samples in the positive example set are labeled and their category is definite, while the samples in the negative example set are unlabeled and their category is unknown. As a simple example, if the positive example samples are the students in a class who have been marked as girls, the negative example samples are the unmarked students of that class, among whom there may be both girls and boys.
In the embodiments of the present invention, after the first round of positive example samples is chosen, the first round of negative example samples for model training also needs to be chosen from all samples, the quantity of negative examples matching the quantity of the first round of positive example samples. To choose the first round of negative example samples, the first round of positive example samples is first rejected from all samples to obtain the remaining samples; then samples are drawn at random from the remaining samples until their number equals that of the first round of positive example samples, and these samples serve as the first round of negative example samples.
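The negative sampling of step 202 can be sketched as follows (illustrative only; the fixed random seed is an assumption added for reproducibility):

```python
import random

def choose_negatives(all_samples, positives, rng=random.Random(42)):
    # Reject the positive examples from all samples, then draw, uniformly
    # at random, as many negative examples as there are positives.
    pos_set = set(positives)
    remaining = [s for s in all_samples if s not in pos_set]
    return rng.sample(remaining, len(positives))
```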
203. Choose held-out samples from the first round of positive example samples and the first round of negative example samples.
A held-out sample is a sample used in the subsequent process to test and evaluate the trained classification model.
In the embodiments of the present invention, the held-out samples can be chosen from the first round of positive and negative example samples as follows: choose first samples from the first round of positive example samples and second samples from the first round of negative example samples, the first samples and the second samples being equal in number. That is, equal numbers of samples are chosen from the first round of positive example samples and the first round of negative example samples to serve together as the held-out samples. The number of first samples and second samples is generally 30% of the size of the first round of positive and negative example sets respectively: 30% of the first round of positive example samples is chosen as the first samples, 30% of the first round of negative example samples is chosen as the second samples, and the first samples together with the second samples form the held-out samples.
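A sketch of the 30% hold-out split described above (illustrative only; the helper name and the fixed random seed are assumptions):

```python
import random

def split_holdout(pos, neg, frac=0.3, rng=random.Random(7)):
    # Take an equal-sized slice (30% by default) from each of the positive
    # and negative sets; these become the held-out evaluation samples.
    k = int(len(pos) * frac)        # same count from each side
    first = rng.sample(pos, k)      # held-out positives ("first samples")
    second = rng.sample(neg, k)     # held-out negatives ("second samples")
    train_pos = [s for s in pos if s not in set(first)]
    train_neg = [s for s in neg if s not in set(second)]
    return (train_pos, train_neg), (first, second)
```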
204. Carry out model training according to the samples of the first round of positive and negative example samples remaining after the held-out samples are removed, to obtain the first round's classification model.
In the embodiments of the present invention, since held-out samples have been chosen from the first round of positive and negative example samples, these held-out samples must also be rejected when training on the first round of positive and negative example samples. That is, model training is carried out only on the positive example samples remaining after the first samples are removed and on the negative example samples remaining after the second samples are removed.
Model training can be realized with reference to the following steps:
First step: initialize the parameters of the first round's classification model.
Since this is the first time model training is carried out, the parameters of the classification model must first be initialized. The model training mentioned in the embodiments of the present invention may be an iterative process; in any round other than the first, this step need not be performed, and the following second step can be executed directly on the basis of the classification model obtained in the previous round. This step applies only to the first round of model training.
In essence, a classification model is a mapping from inputs to outputs. It can learn the mapping relations between a large number of inputs and outputs without requiring any exact mathematical expression between them; merely by training a preliminary classification model on known patterns, the resulting classification model acquires the mapping ability between input-output pairs. Before training begins, all parameters should be initialized with different small random numbers. During training, stochastic gradient descent or back-propagation can be used to optimize the parameters of the classification model, thereby minimizing the classification error as far as possible; the embodiments of the present invention do not limit this.
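As a concrete instance of the initialization and stochastic-gradient-descent optimization just described (illustrative only, not the patent's specific model), here is a one-feature logistic regression trained by SGD; the learning rate, epoch count and function name are assumptions:

```python
import math
import random

def train_logistic_sgd(samples, labels, epochs=200, lr=0.1, seed=0):
    # Weights start as small random numbers; each example then nudges them
    # against the classification error (the gradient of the log-loss).
    rng = random.Random(seed)
    w = rng.uniform(-0.01, 0.01)
    b = rng.uniform(-0.01, 0.01)
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted P(positive)
            w -= lr * (p - y) * x                     # SGD parameter update
            b -= lr * (p - y)
    return lambda x: 1.0 / (1.0 + math.exp(-(w * x + b)))
```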
Second step: based on the initialized classification model, calculate the feature vectors of the first round of positive example samples and negative example samples.
When carrying out model training, in order to make the classification categories of the model definite, sample feature data configured in advance can be obtained, and the classification feature of the model to be trained is determined according to this feature data. The sample feature data specifies which classification feature the classifier trained on the positive and negative example samples should have. For example, suppose the positive example samples are the users aged 20 to 30 among the registered users of a certain social application. Because the users in the positive example set are e-commerce users of relatively young age, predicting from these positive examples whether young people aged 20 to 30 like a certain online game is certain to be more accurate than predicting from them whether young people aged 20 to 30 favor a certain fund-type wealth management service. The classification feature of the model can thus be specified through the sample feature data.
In the embodiments of the present invention, after the parameters of the classification model are initialized, since a classification model is in essence a mapping from inputs to outputs, inputting a training sample into the classification model allows it to calculate the feature vector of that training sample. It should be noted that in any round other than the first, the feature vectors of the current round of positive and negative example samples are calculated directly on the basis of the previous round's classification model.
Third step: classify the current round of positive example samples and negative example samples according to the feature vector of each sample therein.
For this step, the closer the feature vectors of any two training samples are in feature space, the more similar the two samples are and the higher the probability that they belong to the same category. A feature vector may have tens or hundreds of dimensions, which the embodiments of the present invention do not limit. Classifying all samples according to their feature vectors can be realized on the basis of the distances between feature vectors; the embodiments of the present invention do not limit this either.
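One simple way to classify by feature-vector distance, consistent with the principle above, is a nearest-centroid rule (a sketch under assumptions; the patent does not prescribe this particular distance-based classifier):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(vectors):
    # Component-wise mean of a list of equal-length feature vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify_by_distance(sample_vec, pos_vecs, neg_vecs):
    # A sample is assigned to whichever class's centroid its feature
    # vector lies closer to in feature space.
    return ("positive"
            if euclidean(sample_vec, centroid(pos_vecs))
            <= euclidean(sample_vec, centroid(neg_vecs))
            else "negative")
```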
Fourth step: optimize the parameters of the model to be trained according to the sample classification result and the labeling result of the first round of positive example samples, to obtain the current round's classification model.
For this step, the training of a classification model is a process of successive parameter optimization. After the samples of the first round of positive and negative example samples remaining outside the held-out set have been classified on the basis of their feature vectors, whether the initially trained classification model classifies samples correctly can be judged against the first round of positive example samples. That is, the parameters of the classification model are continuously adjusted according to the gap between the actual category of a sample and its predicted category, and the parameters are optimized step by step to obtain the classification model.
205. Evaluate the first round's classification model according to the held-out samples.
After the first round's classification model is obtained, in order to examine its classification performance, it must be evaluated on the held-out samples, which comprise the first samples drawn from the positive example set and the second samples drawn from the negative example set.
When evaluating the first round's classification model according to the first samples and the second samples, indexes such as its classification accuracy, recall rate and AUC (Area Under ROC Curve, the area under the receiver operating characteristic curve) can be assessed. Classification accuracy refers to the proportion of correctly classified samples among the samples assigned to a category: for a given category, the numerator is the number of samples correctly predicted as that category and the denominator is the number of all samples predicted as that category. It evaluates how accurately the classification model predicts samples as a certain category; the larger the value, the more accurate the model's predictions. The recall rate, also called the recall ratio, refers to the proportion of samples of a category that are correctly classified among all the held-out samples actually belonging to that category.
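The two per-category indexes just defined can be computed as follows (an illustrative sketch; function name and the binary 0/1 label encoding are assumptions):

```python
def precision_recall(y_true, y_pred):
    # Precision (the "classification accuracy" above): of the samples
    # predicted positive, the fraction that really are positive.
    # Recall: of the truly positive samples, the fraction the model found.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```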
AUC is a standard measure of the quality of a classification model. ROC (Receiver Operating Characteristic) analysis is chiefly carried out through a curve drawn on a two-dimensional plane whose horizontal axis is the FPR (false positive rate) and whose vertical axis is the TPR (true positive rate). For a classification model, an (FPR, TPR) point pair can be obtained from its performance on the held-out samples, so the classifier can be mapped to a point in the ROC plane. By adjusting the threshold the classifier uses, a curve passing through (0, 0) and (1, 1) is obtained; this is the classification model's ROC curve. In general, this curve should lie above the line from (0, 0) to (1, 1), because the ROC curve formed by that line in fact represents a random classifier. Although presenting a classifier's performance with the ROC curve is intuitive and handy, a single numerical value is often desired to grade classifiers, and AUC serves this purpose: the value of AUC is the area under the ROC curve. Usually the AUC lies between 0.5 and 1.0, and a larger AUC represents better performance; that is, the larger the value, the better the model distinguishes between different samples and the more accurately it discriminates them.
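AUC also has an equivalent probabilistic form that avoids drawing the curve: it equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. A minimal O(n*m) sketch of that computation (illustrative only; ties count half):

```python
def auc(scores_pos, scores_neg):
    # Fraction of (positive, negative) pairs the model orders correctly;
    # tied scores contribute half a win.
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```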
The classification performance indexes of the classification model can be set in advance, for example, an accuracy rate greater than 90%, a recall rate greater than 97%, and an AUC value greater than 0.8; the embodiment of the present invention does not specifically limit this. After the first-round model is evaluated, if every performance index in the obtained evaluation result is better than the classification performance index set in advance, it is determined that the first-round classification model satisfies the specified condition. If at least one performance index in the obtained evaluation result is lower than the corresponding classification performance index set in advance, it is determined that the first-round classification model does not satisfy the specified condition. In addition, when judging whether the obtained evaluation result satisfies the specified condition, it may also be judged whether the performance indexes of the evaluation result no longer improve, that is, whether the classification accuracy, recall rate, AUC, and the like remain at a constant value no matter how many more rounds of iteration are performed. The embodiment of the present invention does not specifically limit the type of the specified condition.
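The judgment against preset performance indexes can be sketched in one line; the threshold values in the usage below mirror the example indexes above and are illustrative only:

```python
def meets_specified_condition(metrics, thresholds):
    """True only when every evaluated index is better than its preset
    classification performance index."""
    return all(metrics[name] > bar for name, bar in thresholds.items())
```

For example, `meets_specified_condition(result, {"accuracy": 0.90, "recall": 0.97, "auc": 0.80})` implements the "all indexes better than the set values" branch described above.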
206. If the evaluation result of the first-round classification model does not satisfy the specified condition, classify all samples using the first-round classification model, and select specific samples from all samples according to the classification result.
In the disclosed embodiments, when the evaluation result of the first-round classification model does not satisfy the specified condition, model training needs to be carried out again. Before the next round of model training, next-round positive example samples and next-round negative example samples are first selected from all samples based on the first-round training model. The next-round positive example samples are the superposition of the first-round positive example samples and the specific samples; that is, the number of positive example samples is expanded in the next round of iteration.
Specifically, when selecting specific samples from all samples according to the classification result, the following manner may be adopted:
According to the classification result, determine predicted positive example samples among all samples; for each sample in the predicted positive example samples, determine, according to the classification result, the probability that the sample is classified as a positive example sample; select, from the predicted positive example samples, the preset number of samples with the highest probability of being classified as positive example samples; and determine the preset number of samples as the specific samples.
The predicted positive example samples are the samples selected from all samples according to the first-round classification model. For each of these samples, the first-round classification model predicts that it has features similar or identical to those of the first-round positive example samples. However, the samples in the predicted positive example samples differ in their degree of similarity to the first-round positive example samples. Each sample in the predicted positive example samples corresponds to a probability value of being similar to the first-round positive example samples, and this probability value is included in the classification result output by the first-round classification model. Taking the convention that a value of 1 represents two samples being completely consistent and a value of 0 represents two samples being completely inconsistent as an example, the probability values corresponding to different predicted positive example samples may be 0.6, 0.8, 0.87, 0.95, and so on. The larger the probability value, the closer the features of the predicted positive example sample are to those of the first-round positive example samples. Generally, the number of predicted positive example samples can be several times the number of first-round positive example samples. This also illustrates that the size of the seed crowd is still extremely small compared with the total sample size, so it is still necessary to mine, from the massive crowd, the crowd sharing the same features with the seed crowd, according to the seed crowd and the classification model.
In order to select, from the predicted positive example samples, the specific samples used to expand the first-round positive example samples, the predicted positive example samples also need to be sorted by probability value, for example, arranged in descending order of probability value. After the predicted positive example samples are sorted, the top N samples ranked highest by probability value can be selected, and these top N samples are used as the specific samples.
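The sort-and-take-top-N selection can be sketched as follows; `predictions` is assumed to hold (sample id, probability-of-positive) pairs as output by the current-round model for the predicted positive example samples:

```python
def select_specific_samples(predictions, n):
    """Sort the predicted positives by probability of being a positive example,
    descending, and return the ids of the top-N as the specific samples."""
    ranked = sorted(predictions, key=lambda kv: kv[1], reverse=True)
    return [sample_id for sample_id, _ in ranked[:n]]
```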
207. Take the first-round positive example samples and the specific samples as the next-round positive example samples.
After the specific samples are selected according to the above step 206, they are added to the first-round positive example samples to obtain the next-round positive example samples, so as to expand the number of positive example samples and achieve the purpose of collecting more samples that are highly similar in features to the first-round positive example samples.
208. Repeat the above steps 202 to 207 until the evaluation result of the obtained classification model satisfies the specified condition.
After the next-round positive example samples are obtained, the next-round negative example samples continue to be selected according to the method of the above step 202. Since the number of positive example samples has been expanded, the potential positive example samples among the negative example samples decrease, which effectively improves the purity of the negative example samples. Afterwards, model training continues according to the next-round positive example samples and the next-round negative example samples to obtain a next-round classification model, and the next-round classification model continues to be evaluated. If the evaluation result of the next-round classification model does not satisfy the specified condition, specific samples continue to be selected from all samples and added to the next-round positive example samples of the classification model to obtain the positive example samples of the round after that. The above steps 202 to 207 are repeated until the obtained classification model satisfies the specified condition.
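The iterative procedure of steps 202 to 208 can be sketched as a single loop; here `train`, `evaluate`, `predict_proba`, and `meets_condition` stand in for the concrete model operations and stopping test, and are assumptions of this sketch rather than part of the patent disclosure:

```python
import random

def iterative_training(all_samples, seed_positives, top_n, max_rounds,
                       train, evaluate, predict_proba, meets_condition):
    """Sketch of steps 202-208: each round trains on the current positives and
    size-matched random negatives, then expands the positives with the top-N
    most confidently predicted positives until the condition is satisfied."""
    positives = set(seed_positives)
    model = None
    for _ in range(max_rounds):
        remainder = [s for s in all_samples if s not in positives]
        # step 202: negatives drawn from the remaining samples, matched in size
        negatives = set(random.sample(remainder, min(len(positives), len(remainder))))
        model = train(positives, negatives)                  # steps 203-204
        if meets_condition(evaluate(model)):                 # step 205
            break
        # steps 206-207: add the top-N most probable predicted positives
        scored = sorted(((s, predict_proba(model, s)) for s in remainder),
                        key=lambda kv: kv[1], reverse=True)
        positives |= {s for s, _ in scored[:top_n]}
    return model
```

Because each pass enlarges `positives`, the pool from which negatives are drawn contains fewer potential positives, which is exactly the purity argument made above.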
For example, consider determining, among all users registered with a certain social application, all the users interested in a certain mobile game on behalf of a certain e-commerce merchant. Under normal circumstances, the merchant can only learn, through manual labeling and similar means, a very small portion of the users interested in the mobile game, i.e., the seed crowd. Since there are a massive number of registered users, possibly tens of millions or even hundreds of millions, manual selection and labeling are obviously unrealistic, so it is also necessary to carry out data mining among the massive users according to the seed crowd, to excavate the potential crowd having similar features to the seed crowd. The classification model training method provided by the embodiment of the present invention can solve this problem well. Moreover, since the classification model is trained through a multi-round iterative process and the positive example samples are expanded in number in every round, the classification effect is better, and the positive example samples sorted out have high similarity to the seed crowd. Continuing with the above example, the classification method provided by the embodiment of the present invention can accurately determine, among the massive users, the other users interested in the mobile game based on the seed crowd. After this crowd with the same interest feature is excavated, game advertisements can be delivered to this part of the crowd, game products can be recommended to them, and so on. In addition, the crowd interested in mobile games usually consists of young males, so related products such as automobiles and ball sports products can also be recommended accordingly.
According to the method provided by the embodiment of the present invention, in the model training process, model training is carried out according to current-round positive example samples and current-round negative example samples to obtain a current-round classification model. If the current-round classification model does not satisfy a specified condition, the current-round classification model is used to classify all samples, and the specific samples with the highest probability of being predicted as positive example samples are selected from all samples according to the classification result. The current-round positive example samples and the specific samples are taken as next-round positive example samples, and next-round negative example samples are determined according to the next-round positive example samples. Afterwards, the above model training and sample processing processes continue to be performed according to the next-round positive example samples and the next-round negative example samples until a classification model that satisfies the specified condition is obtained. In the present invention, after one round of model training, if the obtained classification model does not satisfy the specified condition, specific samples are selected based on this classification model and added to the positive example samples, and model training is carried out through multiple rounds of iteration. As the number of positive example samples keeps increasing, the potential positive example samples contained in the negative example samples decline accordingly, which effectively improves the sample purity of the negative example samples. Because the discrimination between the positive example samples and the negative example samples is better, the classification model obtained after multiple rounds of model training, based on positive example samples and negative example samples whose quantity has been repeatedly expanded, has better stability and higher classification accuracy.
Fig. 3 shows a classification model training apparatus provided by the embodiment of the present invention. Referring to Fig. 3, the apparatus comprises: a model training module 301 and a sample processing module 302.
The model training module 301 is connected with the sample processing module 302 and is configured to carry out model training according to current-round positive example samples and current-round negative example samples to obtain a current-round classification model. The sample processing module 302 is configured to: if the current-round classification model does not satisfy a specified condition, classify all samples using the current-round classification model, and select specific samples from all samples according to the classification result, the specific samples being predicted as positive example samples by the current-round classification model; and take the current-round positive example samples and the specific samples as next-round positive example samples, and determine next-round negative example samples according to the next-round positive example samples. The model training module 301 is configured to continue to perform the above model training process according to the next-round positive example samples and the next-round negative example samples. The sample processing module 302 is configured to continue to perform the above sample processing process according to the next-round classification model until a classification model that satisfies the specified condition is obtained.
Optionally, the apparatus further comprises:
a first sample selection module, configured to select, based on the number of the current-round positive example samples, the current-round negative example samples from the remaining samples in all samples other than the positive example samples;
a second sample selection module, configured to select a first sample from the current-round positive example samples and select a second sample from the current-round negative example samples, the first sample and the second sample containing a consistent number of samples;
wherein the model training module is configured to carry out model training according to the remaining samples in the positive example samples other than the first sample and the remaining samples in the negative example samples other than the second sample.
Optionally, the apparatus further comprises:
a model evaluation module, configured to: evaluate the current-round classification model according to the first sample and the second sample to obtain an evaluation result; judge, according to the evaluation result, whether the current-round classification model satisfies the specified condition; and when the evaluation result is better than the set classification performance index, determine that the current-round classification model satisfies the specified condition.
Optionally, the sample processing module is configured to: determine predicted positive example samples in all samples according to the classification result; for each sample in the predicted positive example samples, determine, according to the classification result, the probability that the sample is classified as a positive example sample; select, from the predicted positive example samples, the preset number of samples with the highest probability of being classified as positive example samples; and determine the preset number of samples as the specific samples.
Optionally, the model training module is configured to: calculate, based on a model to be trained, the feature vectors of the current-round positive example samples and the current-round negative example samples, the model to be trained being the classification model obtained by the previous round of training, and the classification categories of the model to be trained being determined according to configured sample feature data; classify the current-round positive example samples and the current-round negative example samples according to the feature vector of each sample in the current-round positive example samples and the current-round negative example samples; and optimize the parameters of the model to be trained according to the sample classification result and the labeling result of the current-round positive example samples, to obtain the current-round classification model.
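The per-round training step described above — start from the previous round's model, classify by feature vector, and optimize the parameters against the labeled samples — can be sketched as a plain logistic-regression update. The gradient-descent details (learning rate, epoch count) are assumptions of this sketch, not prescribed by the embodiment:

```python
import math

def train_round(pos_vecs, neg_vecs, init_w=None, lr=0.1, epochs=200):
    """One round of model training: fit a logistic-regression classifier on the
    current-round positive (label 1) and negative (label 0) feature vectors,
    starting from the previous round's weights when available."""
    dim = len(pos_vecs[0])
    w = list(init_w) if init_w else [0.0] * (dim + 1)  # last entry is the bias
    data = [(v, 1.0) for v in pos_vecs] + [(v, 0.0) for v in neg_vecs]
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):      # optimize each parameter toward the label
                w[i] += lr * (y - p) * xi
            w[-1] += lr * (y - p)
    return w

def predict_proba(w, x):
    """Probability that feature vector x is a positive example under weights w."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))
```

Passing the returned weights back in as `init_w` on the next call gives the "model to be trained is the classification model obtained by the previous round" behavior.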
To sum up, according to the apparatus provided by the embodiment of the present invention, in the model training process, model training is carried out according to current-round positive example samples and current-round negative example samples to obtain a current-round classification model. If the current-round classification model does not satisfy a specified condition, the current-round classification model is used to classify all samples, and specific samples are selected from all samples according to the classification result. The current-round positive example samples and the specific samples are taken as next-round positive example samples, and next-round negative example samples are determined according to the next-round positive example samples. Afterwards, the above model training and sample processing processes continue to be performed according to the next-round positive example samples and the next-round negative example samples until a classification model that satisfies the specified condition is obtained. In the present invention, after one round of model training, if the obtained classification model does not satisfy the specified condition, specific samples are selected based on this classification model and added to the positive example samples, and model training is carried out through multiple rounds of iteration. As the number of positive example samples keeps increasing, the potential positive example samples contained in the negative example samples decline accordingly, which effectively improves the sample purity of the negative example samples. Because the discrimination between the positive example samples and the negative example samples is better, the classification model obtained after multiple rounds of model training, based on positive example samples and negative example samples whose quantity has been repeatedly expanded, has better stability and higher classification accuracy.
It should be noted that the classification model training apparatus provided by the above embodiment is illustrated, when training a classification model, only by the division of the above functional modules. In practical applications, the above functions can be assigned to different functional modules as required; that is, the internal structure of the apparatus can be divided into different functional modules to complete all or part of the functions described above. In addition, the classification model training apparatus provided by the above embodiment and the embodiments of the classification model training method belong to the same concept; for its specific implementation process, refer to the method embodiments, which are not repeated here.
Fig. 4 shows a server according to an exemplary embodiment. The server may be used to implement the classification model training method shown in any of the above exemplary embodiments. Specifically, referring to Fig. 4, the server 400 may vary considerably with configuration or performance, and may comprise one or more central processing units (CPUs) 422 (for example, one or more processors), a memory 432, and one or more storage media 430 (for example, one or more mass storage devices) storing application programs 442 or data 444. The memory 432 and the storage medium 430 may provide transient or persistent storage. The programs stored in the storage medium 430 may comprise one or more modules (not shown in the figure).
The server 400 may also comprise one or more power supplies 426, one or more wired or wireless network interfaces 440, one or more input/output interfaces 448, and/or one or more operating systems 441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
One or more programs are stored in the memory and are configured to be executed by the one or more processors, the one or more programs containing instructions for carrying out the following operations:
carrying out model training according to current-round positive example samples and current-round negative example samples to obtain a current-round classification model;
if the current-round classification model does not satisfy a specified condition, classifying all samples using the current-round classification model, and selecting specific samples from all samples according to the classification result, the specific samples being predicted as positive example samples by the current-round classification model;
taking the current-round positive example samples and the specific samples as next-round positive example samples, and determining next-round negative example samples according to the next-round positive example samples;
continuing to perform the above model training and sample processing processes according to the next-round positive example samples and the next-round negative example samples until a classification model that satisfies the specified condition is obtained.
Optionally, before the model training is carried out according to the current-round positive example samples and the current-round negative example samples, the method further comprises:
selecting, based on the number of the current-round positive example samples, the current-round negative example samples from the remaining samples in all samples other than the positive example samples;
selecting a first sample from the current-round positive example samples, and selecting a second sample from the current-round negative example samples, the first sample and the second sample containing a consistent number of samples;
wherein carrying out model training according to the current-round positive example samples and the current-round negative example samples comprises:
carrying out model training according to the remaining samples in the positive example samples other than the first sample and the remaining samples in the negative example samples other than the second sample.
Optionally, before all samples are classified using the current-round classification model, the method further comprises:
evaluating the current-round classification model according to the first sample and the second sample to obtain an evaluation result;
judging, according to the evaluation result, whether the current-round classification model satisfies the specified condition; and
when the evaluation result is better than the set classification performance index, determining that the current-round classification model satisfies the specified condition.
Optionally, selecting, from all samples according to the classification result, the specific samples classified as positive example samples comprises:
determining predicted positive example samples in all samples according to the classification result;
for each sample in the predicted positive example samples, determining, according to the classification result, the probability that the sample is classified as a positive example sample;
selecting, from the predicted positive example samples, the preset number of samples with the highest probability of being classified as positive example samples; and
determining the preset number of samples as the specific samples.
Optionally, carrying out model training according to the current-round positive example samples and the current-round negative example samples comprises:
calculating, based on a model to be trained, the feature vectors of the current-round positive example samples and the current-round negative example samples, the model to be trained being the classification model obtained by the previous round of training, and the classification categories of the model to be trained being determined according to configured sample feature data;
classifying the current-round positive example samples and the current-round negative example samples according to the feature vector of each sample in the current-round positive example samples and the current-round negative example samples; and
optimizing the parameters of the model to be trained according to the sample classification result and the labeling result of the current-round positive example samples, to obtain the current-round classification model.
According to the server provided by the embodiment of the present invention, in the model training process, model training is carried out according to current-round positive example samples and current-round negative example samples to obtain a current-round classification model. If the current-round classification model does not satisfy a specified condition, the current-round classification model is used to classify all samples, and specific samples are selected from all samples according to the classification result. The current-round positive example samples and the specific samples are taken as next-round positive example samples, and next-round negative example samples are determined according to the next-round positive example samples. Afterwards, the above model training and sample processing processes continue to be performed according to the next-round positive example samples and the next-round negative example samples until a classification model that satisfies the specified condition is obtained. In the present invention, after one round of model training, if the obtained classification model does not satisfy the specified condition, specific samples are selected based on this classification model and added to the positive example samples, and model training is carried out through multiple rounds of iteration. As the number of positive example samples keeps increasing, the potential positive example samples contained in the negative example samples decline accordingly, which effectively improves the sample purity of the negative example samples. Because the discrimination between the positive example samples and the negative example samples is better, the classification model obtained after multiple rounds of model training, based on positive example samples and negative example samples whose quantity has been repeatedly expanded, has better stability and higher classification accuracy.
Those of ordinary skill in the art will appreciate that all or part of the steps implementing the above embodiments can be completed by hardware, or by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A classification model training method, characterized in that the method comprises:
carrying out model training according to current-round positive example samples and current-round negative example samples to obtain a current-round classification model;
if the current-round classification model does not satisfy a specified condition, classifying all samples using the current-round classification model, and selecting specific samples from all samples according to the classification result, the specific samples being predicted as positive example samples by the current-round classification model;
taking the current-round positive example samples and the specific samples as next-round positive example samples, and determining the next-round negative example samples according to the next-round positive example samples; and
continuing to perform the above model training and sample processing processes according to the next-round positive example samples and the next-round negative example samples until a classification model that satisfies the specified condition is obtained.
2. The method according to claim 1, characterized in that before the model training is carried out according to the current-round positive example samples and the current-round negative example samples, the method further comprises:
selecting, based on the number of the current-round positive example samples, the current-round negative example samples from the remaining samples in all samples other than the positive example samples; and
selecting a first sample from the current-round positive example samples, and selecting a second sample from the current-round negative example samples, the first sample and the second sample containing a consistent number of samples;
wherein the carrying out model training according to the current-round positive example samples and the current-round negative example samples comprises:
carrying out model training according to the remaining samples in the positive example samples other than the first sample and the remaining samples in the negative example samples other than the second sample.
3. The method according to claim 2, characterized in that before all samples are classified using the current-round classification model, the method further comprises:
evaluating the current-round classification model according to the first sample and the second sample to obtain an evaluation result;
judging, according to the evaluation result, whether the current-round classification model satisfies the specified condition; and
when the evaluation result is better than the set classification performance index, determining that the current-round classification model satisfies the specified condition.
4. The method according to claim 1, characterized in that the selecting, from all samples according to the classification result, the specific samples classified as positive example samples comprises:
determining predicted positive example samples in all samples according to the classification result;
for each sample in the predicted positive example samples, determining, according to the classification result, the probability that the sample is classified as a positive example sample;
selecting, from the predicted positive example samples, the preset number of samples with the highest probability of being classified as positive example samples; and
determining the preset number of samples as the specific samples.
5. The method according to claim 1, characterized in that the carrying out model training according to the current-round positive example samples and the current-round negative example samples comprises:
calculating, based on a model to be trained, the feature vectors of the current-round positive example samples and the current-round negative example samples, the model to be trained being the classification model obtained by the previous round of training, and the classification categories of the model to be trained being determined according to configured sample feature data;
classifying the current-round positive example samples and the current-round negative example samples according to the feature vector of each sample in the current-round positive example samples and the current-round negative example samples; and
optimizing the parameters of the model to be trained according to the sample classification result and the labeling result of the current-round positive example samples, to obtain the current-round classification model.
6. A classification model training apparatus, characterized in that the apparatus comprises:
a model training module, configured to carry out model training according to current-round positive example samples and current-round negative example samples to obtain a current-round classification model; and
a sample processing module, configured to: if the current-round classification model does not satisfy a specified condition, classify all samples using the current-round classification model, and select specific samples from all samples according to the classification result, the specific samples being predicted as positive example samples by the current-round classification model; and take the current-round positive example samples and the specific samples as next-round positive example samples, and determine the next-round negative example samples according to the next-round positive example samples;
wherein the model training module is configured to continue to perform the above model training process according to the next-round positive example samples and the next-round negative example samples; and
the sample processing module is configured to continue to perform the above sample processing process according to the next-round classification model until a classification model that satisfies the specified condition is obtained.
7. The apparatus according to claim 6, characterized in that the apparatus further comprises:
a first sample selection module, configured to select, based on the number of the current-round positive example samples, the current-round negative example samples from the remaining samples in all samples other than the positive example samples; and
a second sample selection module, configured to select a first sample from the current-round positive example samples, and select a second sample from the current-round negative example samples, the first sample and the second sample containing a consistent number of samples;
wherein the model training module is configured to carry out model training according to the remaining samples in the positive example samples other than the first sample and the remaining samples in the negative example samples other than the second sample.
8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
a model evaluation module, configured to: evaluate the current-round classification model according to the first sample and the second sample to obtain an evaluation result; judge, according to the evaluation result, whether the current-round classification model satisfies the specified condition; and when the evaluation result is better than the set classification performance index, determine that the current-round classification model satisfies the specified condition.
9. The apparatus according to claim 6, characterized in that the sample processing module is configured to: determine predicted positive example samples in all samples according to the classification result; for each sample in the predicted positive example samples, determine, according to the classification result, the probability that the sample is classified as a positive example sample; select, from the predicted positive example samples, the preset number of samples with the highest probability of being classified as positive example samples; and determine the preset number of samples as the specific samples.
10. The apparatus according to claim 6, characterized in that the model training module is configured to: calculate, based on a model to be trained, the feature vectors of the current-round positive example samples and the current-round negative example samples, the model to be trained being the classification model obtained by the previous round of training, and the classification categories of the model to be trained being determined according to configured sample feature data; classify the current-round positive example samples and the current-round negative example samples according to the feature vector of each sample in the current-round positive example samples and the current-round negative example samples; and optimize the parameters of the model to be trained according to the sample classification result and the labeling result of the current-round positive example samples, to obtain the current-round classification model.
CN201510456761.2A 2015-07-29 2015-07-29 Classification model training method and device Pending CN105069470A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510456761.2A CN105069470A (en) 2015-07-29 2015-07-29 Classification model training method and device


Publications (1)

Publication Number Publication Date
CN105069470A true CN105069470A (en) 2015-11-18

Family

ID=54498831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510456761.2A Pending CN105069470A (en) 2015-07-29 2015-07-29 Classification model training method and device

Country Status (1)

Country Link
CN (1) CN105069470A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101071439A (en) * 2007-05-24 2007-11-14 北京交通大学 Interactive video searching method based on multi-view angle
CN102663264A (en) * 2012-04-28 2012-09-12 北京工商大学 Semi-supervised synergistic evaluation method for static parameter of health monitoring of bridge structure
CN103105924A (en) * 2011-11-15 2013-05-15 中国科学院深圳先进技术研究院 Man-machine interaction method and device
CN103150578A (en) * 2013-04-09 2013-06-12 山东师范大学 Training method of SVM (Support Vector Machine) classifier based on semi-supervised learning
CN104408475A (en) * 2014-12-08 2015-03-11 深圳市捷顺科技实业股份有限公司 Vehicle license plate identification method and vehicle license plate identification equipment
CN104537383A (en) * 2015-01-20 2015-04-22 全国组织机构代码管理中心 Massive organizational structure data classification method and system based on particle swarm

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934413A (en) * 2015-12-31 2017-07-07 阿里巴巴集团控股有限公司 Model training method, apparatus and system and sample set optimization method, device
CN106934413B (en) * 2015-12-31 2020-10-13 阿里巴巴集团控股有限公司 Model training method, device and system and sample set optimization method and device
WO2017133569A1 (en) * 2016-02-05 2017-08-10 阿里巴巴集团控股有限公司 Evaluation index obtaining method and device
CN105812174A (en) * 2016-03-06 2016-07-27 刘健文 Network data determining model training method and apparatus
CN108011740A (en) * 2016-10-28 2018-05-08 腾讯科技(深圳)有限公司 Media flow data processing method and device
CN108011740B (en) * 2016-10-28 2021-04-30 腾讯科技(深圳)有限公司 Media flow data processing method and device
CN108229517B (en) * 2017-01-24 2020-08-04 北京市商汤科技开发有限公司 Neural network training and hyperspectral image interpretation method and device and electronic equipment
CN108229517A (en) * 2017-01-24 2018-06-29 北京市商汤科技开发有限公司 Neural network training and hyperspectral image interpretation method, device and electronic equipment
CN108427690A (en) * 2017-02-15 2018-08-21 腾讯科技(深圳)有限公司 Information distribution method and device
CN106897746A (en) * 2017-02-28 2017-06-27 北京京东尚科信息技术有限公司 Data classification model training method and device
CN106897746B (en) * 2017-02-28 2020-03-03 北京京东尚科信息技术有限公司 Data classification model training method and device
CN109101507B (en) * 2017-06-20 2023-09-26 腾讯科技(深圳)有限公司 Data processing method, device, computer equipment and storage medium
CN109101507A (en) * 2017-06-20 2018-12-28 腾讯科技(深圳)有限公司 Data processing method, device, computer equipment and storage medium
US11151182B2 (en) 2017-07-24 2021-10-19 Huawei Technologies Co., Ltd. Classification model training method and apparatus
CN107517251B (en) * 2017-08-16 2020-12-15 北京星选科技有限公司 Information pushing method and device
CN107517251A (en) * 2017-08-16 2017-12-26 北京小度信息科技有限公司 Information-pushing method and device
CN108269012A (en) * 2018-01-12 2018-07-10 中国平安人寿保险股份有限公司 Construction method, device, storage medium and the terminal of risk score model
CN109447125A (en) * 2018-09-28 2019-03-08 北京达佳互联信息技术有限公司 Processing method, device, electronic equipment and the storage medium of disaggregated model
CN109344904A (en) * 2018-10-16 2019-02-15 杭州睿琪软件有限公司 Generate method, system and the storage medium of training sample
CN109522304B (en) * 2018-11-23 2021-05-18 中国联合网络通信集团有限公司 Abnormal object identification method and device and storage medium
CN109522304A (en) * 2018-11-23 2019-03-26 中国联合网络通信集团有限公司 Exception object recognition methods and device, storage medium
CN109685555A (en) * 2018-12-13 2019-04-26 拉扎斯网络科技(上海)有限公司 Merchant screening method and device, electronic equipment and storage medium
CN109739982A (en) * 2018-12-20 2019-05-10 中国科学院软件研究所 Event detecting method
CN109903166B (en) * 2018-12-25 2024-01-30 创新先进技术有限公司 Data risk prediction method, device and equipment
CN109903166A (en) * 2018-12-25 2019-06-18 阿里巴巴集团控股有限公司 Data risk prediction method, device and equipment
CN109785850A (en) * 2019-01-18 2019-05-21 腾讯音乐娱乐科技(深圳)有限公司 Noise detection method, device and storage medium
CN112308099A (en) * 2019-07-29 2021-02-02 腾讯科技(深圳)有限公司 Sample feature importance determination method, and classification model training method and device
CN111477219A (en) * 2020-05-08 2020-07-31 合肥讯飞数码科技有限公司 Keyword distinguishing method and device, electronic equipment and readable storage medium
CN113240438A (en) * 2021-05-11 2021-08-10 京东数字科技控股股份有限公司 Intention recognition method, device, storage medium and program product
CN117314909A (en) * 2023-11-29 2023-12-29 无棣源通电子科技有限公司 Circuit board defect detection method, device, equipment and medium based on artificial intelligence
CN117314909B (en) * 2023-11-29 2024-02-09 无棣源通电子科技有限公司 Circuit board defect detection method, device, equipment and medium based on artificial intelligence
CN117909494A (en) * 2024-03-20 2024-04-19 北京建筑大学 Abstract consistency assessment model training method and device
CN117909494B (en) * 2024-03-20 2024-06-07 北京建筑大学 Abstract consistency assessment model training method and device

Similar Documents

Publication Publication Date Title
CN105069470A (en) Classification model training method and device
CN106201871B Cost-sensitive semi-supervised software defect prediction method
CN107067025B (en) Text data automatic labeling method based on active learning
AU2018101946A4 (en) Geographical multivariate flow data spatio-temporal autocorrelation analysis method based on cellular automaton
CN107657267B (en) Product potential user mining method and device
CN103617435B (en) Image sorting method and system for active learning
CN105389480B (en) Multiclass imbalance genomics data iteration Ensemble feature selection method and system
CN110610193A (en) Method and device for processing labeled data
CN106651574A (en) Personal credit assessment method and apparatus
WO2024067387A1 (en) User portrait generation method based on characteristic variable scoring, device, vehicle, and storage medium
CN107016416B (en) Data classification prediction method based on neighborhood rough set and PCA fusion
CN104778186A Method and system for attaching commodity objects to a standard product unit (SPU)
CN107545038A Text classification method and device
CN111984873A (en) Service recommendation system and method
CN116485020B (en) Supply chain risk identification early warning method, system and medium based on big data
CN114139634A (en) Multi-label feature selection method based on paired label weights
CN109615421B (en) Personalized commodity recommendation method based on multi-objective evolutionary algorithm
CN113837266B (en) Software defect prediction method based on feature extraction and Stacking ensemble learning
CN104615910A Method for predicting helix interaction relationships of alpha transmembrane proteins based on random forest
CN104572623B (en) A kind of efficient data analysis and summary method of online LDA models
CN113360392A (en) Cross-project software defect prediction method and device
CN108764296A Multi-label classification method based on K-means and multi-task association learning
Shaji et al. Weather Prediction Using Machine Learning Algorithms
CN111221915B (en) Online learning resource quality analysis method based on CWK-means
CN107291722B (en) Descriptor classification method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151118