CN107220217A - Feature coefficient training method and device based on logistic regression - Google Patents
- Publication number
- CN107220217A (application number CN201710398250.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- service feature
- regression models
- characteristic coefficient
- logic regression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Abstract
Embodiments of the present invention provide a feature coefficient training method and device based on logistic regression. By combining a logistic regression model with an iterative optimization algorithm, the method is better suited to the big-data setting of e-commerce recommendation and can calculate a feature coefficient for each service feature quickly and accurately. The feature coefficient training method based on logistic regression comprises: obtaining label data and service feature data, and normalizing the service feature data; determining a logistic regression model, and inputting the label data and the normalized service feature data into the logistic regression model; and determining the feature coefficients of the logistic regression model by an iterative optimization algorithm.
Description
Technical field
The present invention relates to the computer field, and in particular to a feature coefficient training method and device based on logistic regression.
Background art
Online shopping has become a very important part of people's daily lives. While shopping online, users leave a variety of structured and unstructured behavioral data on e-commerce websites, such as ordering, browsing, following, and clicking behavior. These behavioral data can serve as the service features of a training model used to build machine learning algorithms that predict a user's personalized product preferences. Accurate personalized recommendation, "a thousand faces for a thousand people", has always been the goal of the personalized recommendation systems of the major e-commerce companies. To achieve the desired recommendation quality, a reasonable feature coefficient must be assigned to each service feature when a user's personalized preference score is calculated. An accurate and effective algorithm for automatically updating feature coefficients is therefore of great significance for personalized product recommendation, better user experience, and more intelligent platform services.
In the prior art, there are two main ways of determining a feature coefficient for each service feature of a training model. The first is the A/B testing (ABtest) coefficient adjustment method, in which a data analyst adjusts the feature coefficients with A/B tests based on business experience. The second is the method of calculating coefficients with statistical software on a random sample: a small data sample is drawn at random, and the feature coefficients of the service features are then calculated with the statistical software SPSS (Statistical Product and Service Solutions) or R.
In the course of realizing the present invention, the inventors found at least the following problems in the prior art:
(1) The A/B testing coefficient adjustment method depends heavily on the business experience of the data analyst. When a new line of business has to be taken over, many groups of data may have to be tested before reasonably good feature coefficients are found, which wastes resources.
(2) The statistical software method draws a random sample from a large data set, because the number of records that SPSS or R can process offline for logistic regression is limited. The method estimates the whole from a sample and therefore carries random error. Moreover, the data analyst typically runs one logistic regression and then fixes the feature coefficients, so the coefficients are not updated dynamically in time and the personalized recommendations are neither timely nor accurate.
Summary of the invention
In view of this, embodiments of the present invention provide a feature coefficient training method and device based on logistic regression. By combining a logistic regression model with an iterative optimization algorithm, they are better suited to the big-data setting of e-commerce recommendation and can calculate a feature coefficient for each service feature quickly and accurately.
To achieve the above object, according to one aspect of the embodiments of the present invention, a feature coefficient training method based on logistic regression is provided.
A feature coefficient training method based on logistic regression according to an embodiment of the present invention comprises: obtaining label data and service feature data, and normalizing the service feature data; determining a logistic regression model, and inputting the label data and the normalized service feature data into the logistic regression model; and determining the feature coefficients of the logistic regression model by an iterative optimization algorithm.
Optionally, the iterative optimization algorithm is the L-BFGS algorithm or stochastic gradient descent.
Optionally, normalizing the service feature data comprises: rejecting outlier data from the service feature data, and normalizing the service feature data that remain after the outliers are rejected.
Optionally, the label data and the normalized service feature data are input into the logistic regression model in a preset format, the format comprising: a label, and at least one group consisting of a service feature ID and the feature value corresponding to that service feature ID, where the feature value is the normalized service feature data.
Optionally, the method further comprises: applying a regularization constraint to the logistic regression model.
Optionally, the method further comprises: setting evaluation metrics for the logistic regression model, and selecting one group of feature coefficients by cross-validation as the final feature coefficients of the logistic regression model.
Optionally, selecting one group of feature coefficients by cross-validation as the final feature coefficients of the logistic regression model comprises: grouping the label data and the normalized service feature data to obtain multiple validation subsets; inputting each validation subset into the logistic regression model to obtain multiple groups of feature coefficients and predicted score probabilities; calculating the evaluation metrics of each validation subset from the predicted score probabilities and the label data; averaging the evaluation metrics over all groups to obtain the assessed performance of the logistic regression model; and selecting the final feature coefficients according to the assessed performance. The evaluation metrics include accuracy, recall, precision, F-score, and AUC.
Optionally, after the step of obtaining label data and service feature data and normalizing the service feature data, the method further comprises: removing noise data from the label data and the normalized service feature data by undersampling, oversampling, or threshold moving.
Optionally, the method further comprises: subscribing the logistic regression model whose feature coefficients have been determined by the iterative optimization algorithm as an offline task, so that the feature coefficients are updated automatically.
To achieve the above object, according to another aspect of the embodiments of the present invention, a feature coefficient training device based on logistic regression is provided.
A feature coefficient training device based on logistic regression according to an embodiment of the present invention comprises: a data preprocessing module for obtaining label data and service feature data and normalizing the service feature data; a logistic regression model module for determining a logistic regression model and inputting the label data and the normalized service feature data into the logistic regression model; and a feature coefficient determining module for determining the feature coefficients of the logistic regression model by an iterative optimization algorithm.
Optionally, the iterative optimization algorithm is the L-BFGS algorithm or stochastic gradient descent.
Optionally, the data preprocessing module is further used for: rejecting outlier data from the service feature data, and normalizing the service feature data that remain after the outliers are rejected.
Optionally, the logistic regression model module is further used for: inputting the label data and the normalized service feature data into the logistic regression model in a preset format, the format comprising: a label, and at least one group consisting of a service feature ID and the feature value corresponding to that service feature ID, where the feature value is the normalized service feature data.
Optionally, the device further comprises: a regularization constraint module for applying a regularization constraint to the logistic regression model.
Optionally, the device further comprises: a feature coefficient selecting module for setting evaluation metrics for the logistic regression model and selecting one group of feature coefficients by cross-validation as the final feature coefficients of the logistic regression model.
Optionally, the feature coefficient selecting module is further used for: grouping the label data and the normalized service feature data to obtain multiple validation subsets; inputting each validation subset into the logistic regression model to obtain multiple groups of feature coefficients and predicted score probabilities; calculating the evaluation metrics of each validation subset from the predicted score probabilities and the label data; averaging the evaluation metrics over all groups to obtain the assessed performance of the logistic regression model; and selecting the final feature coefficients according to the assessed performance. The evaluation metrics include accuracy, recall, precision, F-score, and AUC.
Optionally, the device further comprises: a noise data removing module for removing noise data from the label data and the normalized service feature data by undersampling, oversampling, or threshold moving.
Optionally, the device further comprises: an automatic coefficient update module for subscribing the logistic regression model whose feature coefficients have been determined by the iterative optimization algorithm as an offline task, so that the feature coefficients are updated automatically.
To achieve the above object, according to a further aspect of the embodiments of the present invention, an electronic device is provided.
An electronic device according to an embodiment of the present invention comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the feature coefficient training method based on logistic regression of the embodiments of the present invention.
To achieve the above object, according to yet another aspect of the embodiments of the present invention, a computer-readable medium is provided.
A computer-readable medium according to an embodiment of the present invention stores a computer program which, when executed by a processor, implements the feature coefficient training method based on logistic regression of the embodiments of the present invention.
According to the technical solution of the present invention, one embodiment of the above invention has the following advantages or beneficial effects. Combining a logistic regression model with an iterative optimization algorithm on large data sets makes use of more valid data and calculates a feature coefficient for each service feature accurately and quickly. Calling the L-BFGS algorithm on the Spark platform lets the program run on large data sets, in line with the current trend of big-data machine learning; the iterative optimization algorithm is more robust and reliable, and the iterative optimization of the feature coefficients converges faster. Subscribing the logistic regression model as an offline task automates the updating of the feature coefficients, which frees data analysts from part of their work and saves resources. It also discovers users' personalized preferences promptly and accurately and automatically updates the corresponding feature coefficients according to users' latest shopping behavior, which improves user experience and is significant for e-commerce platforms building more intelligent personalized recommendation systems.
Further effects of the above non-conventional optional implementations are described below in conjunction with the specific embodiments.
Brief description of the drawings
The accompanying drawings are provided for a better understanding of the present invention and do not constitute an undue limitation of it. In the drawings:
Fig. 1 is a schematic diagram of the main steps of the feature coefficient training method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of the execution of the feature coefficient training method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the data that the feature coefficient training method of an embodiment of the present invention inputs into the logistic regression model;
Fig. 4 is a schematic diagram of the feature coefficient training result of the feature coefficient training method of an embodiment of the present invention;
Fig. 5 is a schematic diagram of the composition of the feature coefficient training device according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are explained below with reference to the accompanying drawings. They include various details of the embodiments to aid understanding, and these details should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
The embodiments of the present invention improve on earlier feature coefficient training methods and provide a solution for training feature coefficients based on logistic regression and an iterative optimization algorithm. In the embodiments, the task is to predict a user's preference score for a third-level product category. The approach is as follows. The label data and service feature data of order records, follow records, browsing records, and click records are obtained from a Hive table (Hive is a Hadoop-based data warehouse tool that can map a structured data file to a database table). All of the label data and service feature data are divided into three parts, for example at a ratio of 8:1:1, where 8 parts form the training set, 1 part forms the validation set, and 1 part forms the test set. First, the logistic regression model is trained on the training set; during training, an iterative optimization algorithm is called from the MLlib library (Machine Learning Library) of the Spark platform (a cluster computing environment) to obtain locally optimal parameters. Then the validation set is input into the trained logistic regression model, and the optimal feature coefficients are selected according to the evaluation metrics. Finally, the test set is input into the logistic regression model trained on the training set to obtain the user's probabilistic preference score for the third-level category. For the test set, only the service feature data need to be input; the label data of the test set are predicted. The validation set has both predicted label data and real label data.
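As a rough illustration of the 8:1:1 split described above, the following Python sketch shuffles a list of records and cuts it into training, validation, and test sets. The function name `split_8_1_1`, the fixed seed, and the use of an in-memory list instead of a Hive table are assumptions made only for this example.

```python
import random


def split_8_1_1(rows, seed=0):
    """Shuffle rows and split them into train/validation/test at 8:1:1."""
    shuffled = list(rows)
    # A fixed seed (an assumption here) keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10
    n_valid = n // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains goes to the test set
    return train, valid, test


train, valid, test = split_8_1_1(range(100))
```

With 100 records this gives 80 training, 10 validation, and 10 test records, and every record lands in exactly one set.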
Fig. 1 is a schematic diagram of the main steps of the feature coefficient training method of an embodiment of the present invention.
As shown in Fig. 1, the feature coefficient training method of the embodiment mainly comprises the following steps:
Step S11: obtain label data and service feature data, and normalize the service feature data. The label data and service feature data are obtained from a Hive table. To bring the feature data to the same scale, the obtained service feature data must be normalized. Before normalization, outlier data in the Hive table can first be rejected, for example users with more than 100 million orders. The label is the training target of the logistic regression model, and the way labels are constructed differs between businesses. For example, when predicting a user's preference for a third-level category, the label can be 1 or 0, where 1 means the user prefers the category and 0 means the user does not; other ways of constructing labels are also possible.
After the data preprocessing of step S11 is complete, the design of the logistic regression model and the determination of the feature coefficients begin at step S12.
Step S12: determine the logistic regression model, and input the label data and the normalized service feature data into it. Logistic regression is a generalized linear regression that adds a Sigmoid function to linear regression as a nonlinear mapping; this function maps continuous values into the range between 0 and 1. The dependent variable of a logistic regression can be binary or multi-class; binary logistic regression is the more common in practice. Determining the logistic regression model means choosing logistic regression as the machine learning classification model. The label data and the normalized service feature data must be input into the logistic regression model in the data format supported by the iterative optimization algorithm.
Step S13: determine the feature coefficients of the logistic regression model by an iterative optimization algorithm. The iterative optimization algorithm can be the L-BFGS algorithm or the SGD algorithm (stochastic gradient descent). BFGS is a common method for solving unconstrained nonlinear optimization problems with a fairly complete local convergence theory, and it is named after the initials of its inventors Broyden, Fletcher, Goldfarb, and Shanno. The L stands for "limited": L-BFGS is an approximation of the BFGS algorithm that works in limited memory, which is an advantage on large data sets. L-BFGS saves resources and also converges faster than stochastic gradient descent, so it is preferred for feature coefficient training; stochastic gradient descent can also be used when the data volume is small.
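To make the idea of iteratively fitting feature coefficients concrete, here is a minimal Python sketch. It deliberately uses plain batch gradient descent, not L-BFGS or the Spark MLlib implementation the text refers to, because that keeps the example short and dependency-free. The function names, step size, iteration limit, and toy data are all assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, n_features, step=0.5, max_iter=500, tol=1e-6):
    """Fit coefficients theta by batch gradient descent on the log loss.

    rows is a list of (feature_vector, label) pairs with label 0 or 1.
    """
    theta = [0.0] * n_features
    for _ in range(max_iter):
        grad = [0.0] * n_features
        for x, y in rows:
            # Prediction error for this sample: g(theta . x) - y
            err = sigmoid(sum(t * xi for t, xi in zip(theta, x))) - y
            for k in range(n_features):
                grad[k] += err * x[k] / len(rows)
        theta = [t - step * g for t, g in zip(theta, grad)]
        if max(abs(g) for g in grad) < tol:  # stop once the gradient is tiny
            break
    return theta


# Toy data: first component is a constant bias term, second is one feature.
rows = [([1.0, 0.9], 1), ([1.0, 0.8], 1), ([1.0, 0.6], 1),
        ([1.0, 0.4], 0), ([1.0, 0.2], 0), ([1.0, 0.1], 0)]
theta = train_logistic(rows, n_features=2)
```

A quasi-Newton method such as L-BFGS replaces the fixed step along the negative gradient with a direction scaled by an approximate inverse Hessian, which is why it typically needs far fewer iterations.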
Fig. 2 is a schematic flow chart of the execution of the feature coefficient training method of an embodiment of the present invention.
As shown in Fig. 2, the specific execution flow of the feature coefficient training method in the embodiment is as follows.
Obtain from the Hive table the label data and service feature data of order records, follow records, browsing records, and click records; reject the outlier data in those records; and normalize the service feature data that remain. This step requires the user's label data, the service feature data of the user's orders over the past year, of the user's follows over the past three months, of the user's browsing over the past month, and of the user's clicks over the past week. To bring the feature data to the same scale, the feature data must be normalized.
Determine the logistic regression model, and input the label data and the normalized service feature data into it. The loss function of the logistic regression is the log-likelihood loss function, and the outer squashing function is the Sigmoid function. The Sigmoid function compresses the output range of linear regression from (negative infinity, positive infinity) to between 0 and 1. Compressing a wide range of values into this interval removes the influence of particularly extreme variables, that is, it suppresses outliers in the data such as exceptionally large or exceptionally small points.
Determine the feature coefficients of the logistic regression model by an iterative optimization algorithm. The L-BFGS algorithm is suited to distributed big-data computation. It stores the vector sequences s_j and y_j produced during the computation, calculates an approximation of the inverse Hessian matrix from s_j and y_j, and keeps only the most recent m pairs s_j, y_j. Here the vector sequence s_j corresponds to the service feature data of the present invention, y_j corresponds to the label data, and j denotes the j-th sample. The L-BFGS algorithm therefore saves resources, and because it converges faster than stochastic gradient descent, the L-BFGS quasi-Newton method is preferred in the feature coefficient training method. This step is implemented by calling the L-BFGS algorithm in the MLlib library of the Spark platform and obtaining the parameters of the logistic regression model through repeated learning iterations. The parameters obtained include the feature coefficients, but these are only the locally optimal feature coefficients of one particular logistic regression model. The globally optimal feature coefficients are obtained with the help of the evaluation metrics and cross-validation described later.
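Several of the evaluation metrics named in this document (precision, recall, F-score) can be computed directly from predicted probabilities and true labels. The Python sketch below is one way to do that; the function names, the 0.5 decision threshold, and the sample values are assumptions for illustration and are not taken from the patent.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Count true/false positives and negatives at a given threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(labels, scores, threshold=0.5):
    tp, fp, fn, _ = confusion_counts(labels, scores, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.6, 0.1, 0.8]
precision, recall, f1 = precision_recall_f1(labels, scores)
```

In a cross-validation setting, these values would be computed once per validation subset and then averaged, as the summary section describes.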
The normalization method is as follows:
In the formula, score is the value of the service feature data after normalization and ranges from 0 to 1; i is a positive integer denoting the dimension of the acquired data; and x_i denotes, for the different values of i, the service feature data of order records, follow records, browsing records, and click records respectively.
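The normalization formula itself is not reproduced in this text, so the exact method is unknown. As a hedged illustration only, the Python sketch below rejects outliers above a fixed bound and then applies min-max scaling, which is one common way to obtain a score between 0 and 1. The function names, the choice of min-max scaling, and the sample values are assumptions and should not be read as the patent's formula.

```python
def reject_outliers(values, upper_bound):
    """Drop values above upper_bound (e.g. implausibly large order counts)."""
    return [v for v in values if v <= upper_bound]


def min_max_normalize(values):
    """Scale values linearly into [0, 1]; an assumed normalization choice."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


orders = [3, 10, 7, 250_000_000, 1]
clean = reject_outliers(orders, upper_bound=100_000_000)
scores = min_max_normalize(clean)
```

Rejecting the outlier first matters here: if the 250,000,000 value were kept, min-max scaling would squeeze all ordinary values close to 0.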
In addition, the obtained service feature data and label data may be imbalanced, that is, there are few samples with label 1 and many with label 0. If such data are fed directly into the logistic regression model for training, the feature coefficient training device will tend to predict label 0 for new data, whereas the goal is to predict more 1s so that users can be given rich personalized recommendations. This sample imbalance problem can be addressed by undersampling, oversampling, or threshold moving, all of which are prior-art techniques. Taking undersampling as an example: some of the label-0 data in the training set are removed so that the numbers of label-0 and label-1 records are close. The present invention controls the ratio of label-0 to label-1 data with stratified random sampling, that is, it uses an ensemble learning mechanism in which the label-0 data are divided into several different sets that are learned by different learners. Each individual learner thus sees undersampled data, but no important information is lost overall. For example, the data can be divided into two parts: the majority samples with label 0 and the minority samples with label 1. The majority label-0 samples are sampled with replacement n times to generate n subsets, and the minority label-1 samples are merged with each of these n subsets to train a new model each time. This yields n logistic regression models, and the final model is the average of the predictions of these n models. The idea is similar to building a random forest from decision trees.
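The undersampling ensemble described above can be sketched in a few lines of Python. This is only an outline of the data handling: it draws n label-0 subsets with replacement, merges each with all label-1 samples, and averages the predictions of n models. The function names, the subset size (equal to the number of label-1 samples), and the toy data are assumptions; the models are represented as plain callables because the training step is outside the scope of this sketch.

```python
import random


def build_balanced_subsets(majority, minority, n, seed=0):
    """Create n training sets: a with-replacement sample of the majority
    class (same size as the minority class) merged with all minority rows."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n):
        sampled = [rng.choice(majority) for _ in range(len(minority))]
        subsets.append(sampled + list(minority))
    return subsets


def average_prediction(models, x):
    """Final score is the mean of the individual models' predictions."""
    return sum(model(x) for model in models) / len(models)


# Toy data: 90 label-0 rows and 10 label-1 rows, each (features, label).
negatives = [([i / 100.0], 0) for i in range(90)]
positives = [([0.9 + i / 100.0], 1) for i in range(10)]
subsets = build_balanced_subsets(negatives, positives, n=3)
```

Each subset would then be used to train one logistic regression model, and `average_prediction` combines the resulting models at prediction time.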
The label data and the normalized service feature data must be input into the logistic regression model in a predetermined format. The format comprises: a label, and at least one group consisting of a service feature ID and the feature value corresponding to that service feature ID, where the feature value is the normalized service feature data. The embodiments of the present invention provide one way of expressing this format:
label  service feature ID:feature value  service feature ID:feature value ...
In this format, the number of "service feature ID:feature value" pairs is not limited and equals the number of data dimensions acquired, which is 4 in the embodiments of the present invention. The feature value equals the normalized service feature data. This is the format supported by the algorithm tool used in the present invention, and it can be extended to other formats.
Fig. 3 is a schematic diagram of the data input to the logistic regression model in the characteristic coefficient training method of an embodiment of the invention.
As shown in Fig. 3, the embodiment of the invention obtains data in 4 dimensions from a Hive table. The first column is the label. Because the embodiment predicts whether a user has a preference or not, which is a binary classification problem, the label takes only two values, 0 and 1. In every row the service feature IDs are 1, 2, 3, and 4, which represent the order dimension, the follow dimension, the browse dimension, and the click dimension, respectively. Each characteristic value is the normalized service feature data score.
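To make the input format concrete, the sketch below writes and reads one line of the form "label ID:value ID:value ...". The layout resembles the sparse text format used by LIBSVM-style tools; the helper names `format_line` and `parse_line` and the sample values are hypothetical and serve only as illustration.

```python
def format_line(label, features):
    """Build 'label id:value id:value ...' from a label and an {id: value} dict."""
    pairs = " ".join(f"{fid}:{value}" for fid, value in sorted(features.items()))
    return f"{label} {pairs}"

def parse_line(line):
    """Split a formatted line back into (label, {id: value})."""
    label, *pairs = line.split()
    features = {}
    for pair in pairs:
        fid, value = pair.split(":")
        features[int(fid)] = float(value)
    return int(label), features

# One user row: label 1 (preference) and four normalized feature values,
# keyed by service feature IDs 1 to 4.
line = format_line(1, {1: 0.8, 2: 0.25, 3: 0.5, 4: 0.1})
parsed = parse_line(line)
```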
In addition, the expression formula of Sigmoid functions is:
In formula, x represents feature value vector, is order feature here, browses feature, pays close attention to feature, click feature this four
The characteristic value of dimension;θ is the parameter vector of each service feature, corresponds to order feature respectively here, browses feature, concern spy
Levy, the characteristic coefficient of click feature;θTX represents two multiplication of vectors, and T represents transposition;G () represents nonlinear mapping function.
In addition, the expression formula of log-likelihood loss function is:
In formula, y represents label, and what y was that 1 expression label beats is that 1, y is that 0 represent that label beats is 0.
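The Sigmoid mapping and the log-likelihood loss can be written out in a few lines of code. The sketch below is only an illustration: the function names are hypothetical, and averaging the loss over all samples is an assumption about how the per-sample terms are combined.

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(theta, x):
    """g(theta^T x): the predicted probability that the label is 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def log_loss(theta, X, y):
    """Mean negative log-likelihood over all samples (assumed averaging)."""
    total = 0.0
    for xi, yi in zip(X, y):
        # Clamp the probability so that log() stays finite.
        p = min(max(predict(theta, xi), 1e-12), 1.0 - 1e-12)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X)
```

With all coefficients at zero, every prediction is 0.5, so the loss equals log 2 regardless of the labels, which is a convenient sanity check.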
The process of determining the characteristic coefficients by the L-BFGS algorithm is as follows. Step 1: for the first iteration, preset the initial value of the characteristic coefficients (numCorrection=10), the threshold for stopping iteration (convergencetol=1e-4), the number of iterations (maxNumIterations=20), the initial search direction (the negative gradient direction), and the initial step factor (regParam=0.1). Step 2: substitute the parameter values into the objective function to obtain the characteristic coefficients for the next iteration, computing the search direction and the iteration step factor during the iterative process. Step 3: judge whether the difference between the characteristic coefficients of two successive iterations satisfies the optimization convergence condition (that is, whether it is less than the threshold for stopping iteration); if it does not, continue iterating until the difference between two successively computed sets of characteristic coefficients satisfies the convergence condition. If the number of iterations is reached and the optimization convergence condition is still not satisfied, iteration stops. The objective is to minimize the loss function.
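The three steps above can be sketched as a small, self-contained L-BFGS loop. This is an illustration under stated assumptions rather than the implementation used by the invention: `num_corrections` is treated as the number of stored correction pairs and `reg` as an L2 regularization weight, which is how Spark MLlib's `LBFGS` optimizer uses its similarly named parameters; the zero starting point, the backtracking line search, and the toy data are also assumptions.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss_and_grad(theta, X, y, reg):
    """Mean log loss plus an L2 term, and its gradient."""
    m = len(X)
    loss, grad = 0.0, [0.0] * len(theta)
    for xi, yi in zip(X, y):
        p = sigmoid(dot(theta, xi))
        q = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        loss -= yi * math.log(q) + (1 - yi) * math.log(1.0 - q)
        for j, xij in enumerate(xi):
            grad[j] += (p - yi) * xij
    loss = loss / m + 0.5 * reg * dot(theta, theta)
    grad = [g / m + reg * t for g, t in zip(grad, theta)]
    return loss, grad

def lbfgs(X, y, num_corrections=10, tol=1e-4, max_iter=20, reg=0.1):
    theta = [0.0] * len(X[0])                 # Step 1: assumed starting point
    loss, grad = loss_and_grad(theta, X, y, reg)
    history = []                              # (s, y_vec, rho), oldest first
    for _ in range(max_iter):
        # Step 2: two-loop recursion; with no history this is the negative gradient.
        q, alphas = grad[:], []
        for s, yv, rho in reversed(history):
            a = rho * dot(s, q)
            alphas.append(a)
            q = [qi - a * yi for qi, yi in zip(q, yv)]
        if history:
            s, yv, _ = history[-1]
            gamma = dot(s, yv) / dot(yv, yv)
            q = [gamma * qi for qi in q]
        for (s, yv, rho), a in zip(history, reversed(alphas)):
            b = rho * dot(yv, q)
            q = [qi + (a - b) * si for qi, si in zip(q, s)]
        direction = [-qi for qi in q]
        # Backtracking line search chooses the step factor for this iteration.
        step, slope = 1.0, dot(grad, direction)
        while True:
            new_theta = [t + step * d for t, d in zip(theta, direction)]
            new_loss, new_grad = loss_and_grad(new_theta, X, y, reg)
            if new_loss <= loss + 1e-4 * step * slope or step < 1e-10:
                break
            step *= 0.5
        s = [a - b for a, b in zip(new_theta, theta)]
        yv = [a - b for a, b in zip(new_grad, grad)]
        if dot(s, yv) > 1e-12:
            history = (history + [(s, yv, 1.0 / dot(s, yv))])[-num_corrections:]
        theta, loss, grad = new_theta, new_loss, new_grad
        # Step 3: stop once the coefficients change by less than the threshold.
        if max(abs(v) for v in s) < tol:
            break
    return theta

# Toy data: a constant bias column plus one normalized feature.
X = [[1.0, 0.1], [1.0, 0.2], [1.0, 0.8], [1.0, 0.9]]
y = [0, 0, 1, 1]
theta = lbfgs(X, y)
final_loss, _ = loss_and_grad(theta, X, y, 0.1)
```

The loop ends either when the coefficient change falls below the threshold or when the iteration budget is used up, matching the two stopping rules in the text.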
In addition, the method of the embodiment of the invention further includes a step of applying a regularization constraint to the logistic regression model. To prevent overfitting and to increase the robustness and generalization ability of the logistic regression model, the embodiment of the invention adds an L2 regularization term to the objective function, which measures the error between the model's predictions for the samples and the true labels. The optimization objective is to minimize both the training error and the test error. Because the logistic regression model must fit the service feature data and label data in the training set, it fits the training data as closely as possible; however, a small training error is not enough, and the test error of the model should also be small. Constraining the model to be as simple as possible keeps the test error small, which is why regularization is added. Regularization constraint algorithms include the L1 regularization algorithm and the L2 regularization algorithm. L1 regularization corresponds to a Laplace prior on the coefficients and L2 regularization to a Gaussian prior, and the user can select either one according to actual needs.
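The two penalties can be written as small functions added to the loss. This is a sketch under assumptions: the regularization strength `lam` and the factor 1/2 in the L2 term are common conventions, not values taken from the text, and the function names are hypothetical.

```python
def l1_penalty(theta, lam):
    """L1 term: lam * sum(|theta_j|), which tends to push coefficients to zero."""
    return lam * sum(abs(t) for t in theta)

def l2_penalty(theta, lam):
    """L2 term: (lam / 2) * sum(theta_j^2), which shrinks coefficients smoothly."""
    return 0.5 * lam * sum(t * t for t in theta)

def regularized_objective(loss, theta, lam, kind="l2"):
    """Objective to minimize: the data loss plus the chosen penalty."""
    penalty = l2_penalty(theta, lam) if kind == "l2" else l1_penalty(theta, lam)
    return loss + penalty

# Example: a loss of 0.5 with coefficients (1, -2) and lam = 0.1.
with_l2 = regularized_objective(0.5, [1.0, -2.0], 0.1, kind="l2")
with_l1 = regularized_objective(0.5, [1.0, -2.0], 0.1, kind="l1")
```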
In addition, the method of the embodiment of the invention further includes setting evaluation indexes for the logistic regression model and selecting, by cross validation, one group of characteristic coefficients as the final characteristic coefficients of the logistic regression model. The evaluation indexes used by the invention are accuracy, recall, precision, F value, and AUC (area under the ROC curve, a standard measure of model quality). The larger the AUC, the better the training effect of the logistic regression model. Product ranking generally uses the AUC index; classification problems generally use the F value; accuracy, recall, or precision can also be selected according to different business evaluation criteria. In a binary logistic regression problem, four cases arise between the predicted value and the actual value. In the training set, if a sample labeled 1 is also predicted as 1, it is counted as TP (True Positive); if a sample labeled 1 is predicted as 0, it is counted as FN (False Negative); if a sample labeled 0 is also predicted as 0, it is counted as TN (True Negative); if a sample labeled 0 is predicted as 1, it is counted as FP (False Positive). The evaluation indexes above are then, respectively:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}$$
$$F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
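The evaluation indexes can be computed directly from the four counts. The sketch below uses the standard convention in which a false positive is a label-0 sample predicted as 1 and a false negative is a label-1 sample predicted as 0; the function names, the pairwise AUC computation, and the sample labels are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_value = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f_value": f_value,
    }

def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

result = metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
area = auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
```

The pairwise AUC is quadratic in the number of samples, so it suits small validation sets; a rank-based computation would be the usual choice on large data.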
Selecting one group of characteristic coefficients by cross validation as the optimal characteristic coefficients of the logistic regression model includes: joining the label data and the normalized service feature data into one table, so that the joined table contains both label data and service feature data; dividing the data into K groups; and forming training sets and validation sets from different subsets, with each subset serving once as the validation set. This yields K logistic regression models optimized by the L-BFGS algorithm, and the average of the evaluation indexes of these K models on their validation sets is taken as the final performance evaluation index of the logistic regression model. For example, if the number of groups K is 10 and the evaluation indexes are the 5 listed above, 10 validation subsets are obtained, each serving once as the validation set; 10 rounds of cross validation are performed and 10 groups of characteristic coefficients are obtained. Finally, according to the averaged assessment performance, the best of the 10 groups of characteristic coefficients is selected as the final characteristic coefficients of the logistic regression model. The average is taken over the 10 values of each evaluation index, so 5 evaluation indexes are still obtained in the end, and these 5 averaged indexes are more representative.
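The K-fold procedure can be sketched as follows. The helper names and the interleaved way of assigning rows to folds are assumptions made for illustration; `train_and_score` is a hypothetical callback that would train a model on the training rows and return one evaluation index on the validation rows.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) so that each fold validates exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        valid = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, valid

def cross_validate(n_samples, k, train_and_score):
    """Run K rounds and return (mean score, per-fold scores)."""
    scores = [train_and_score(train, valid)
              for train, valid in k_fold_indices(n_samples, k)]
    return sum(scores) / k, scores

splits = list(k_fold_indices(10, 5))

# Stand-in scorer: a real one would fit the model and compute, say, AUC.
mean_score, fold_scores = cross_validate(10, 5, lambda train, valid: len(train) / 10)
```

Calling `cross_validate` once per evaluation index gives the averaged indexes described in the text.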
Fig. 4 is a schematic diagram of the characteristic coefficient training result of the characteristic coefficient training method of an embodiment of the invention.
As shown in Fig. 4, in predicting users' preference scores for third-level categories, the training result for the characteristic coefficients of the four service features used in the embodiment of the invention (browse feature, follow feature, order feature, and click feature) is as shown in the figure; the four numbers in the figure are the characteristic coefficients assigned to the four service features.
In addition, the method for the embodiment of the present invention also includes:The logic after characteristic coefficient will be determined by iteration optimization algorithms
Regression model subscribes into offline task to automate renewal characteristic coefficient.The Spark programmed logics finished writing are placed on big data to put down
On platform, according to business demand can start by set date task daily, a subtask is performed daily, such characteristic coefficient is all new everyday
, the automation training of Logic Regression Models characteristic coefficient is realized, with flexibility and operability.
It can be seen from the characteristic coefficient training method of the embodiment of the invention that combining a logistic regression model with an iteration optimization algorithm on a large data set makes it possible to use more valid data and to calculate a characteristic coefficient for each service feature quickly and accurately. Calling the L-BFGS algorithm on the Spark platform lets the program run on large data sets, in line with the current trend toward big-data machine learning; the iterative optimization algorithm is more robust and reliable, and the characteristic coefficients converge faster. Registering the logistic regression model as an offline task automates the updating of the characteristic coefficients, which both frees data analysts from part of their work and saves resources, and also allows users' personalized preferences to be discovered promptly and accurately: the corresponding characteristic coefficients are updated automatically according to users' latest shopping behavior, improving the user experience. This is significant for building a more intelligent personalized recommendation system on an e-commerce platform.
Fig. 5 is a schematic diagram of the composition of the characteristic coefficient training device of an embodiment of the invention.
As shown in Fig. 5, the training device 50 of the embodiment of the invention mainly includes:
A data preprocessing module 501, configured to obtain label data and service feature data and to normalize the service feature data. The label data and service feature data are obtained from Hive tables; so that the feature data are on the same scale, the obtained service feature data need to be normalized. Before normalization, abnormal data in the Hive tables can first be rejected, for example users with more than 100 million orders. The label is the target of logistic regression model training, and the way labels are constructed differs between businesses. For example, when predicting users' preferences for third-level categories, the label can be set to 1 or 0, where 1 indicates that the user prefers the third-level category and 0 indicates that the user does not; other ways of constructing labels are also possible.
A logistic regression model module 502, configured to determine the logistic regression model and to input the label data and the normalized service feature data to the logistic regression model. Logistic regression is a generalized linear regression: it adds a Sigmoid function on top of linear regression to perform a nonlinear mapping, and this function maps continuous values into the range between 0 and 1. The dependent variable of logistic regression can be binary or multi-class; in practice, binary logistic regression is the most common. Determining the logistic regression model means selecting logistic regression as the machine learning classification model. The label data and the normalized service feature data must be input to the logistic regression model in the data format supported by the iteration optimization algorithm.
A characteristic coefficient determining module 503, configured to determine the characteristic coefficients of the logistic regression model by an iteration optimization algorithm. The iteration optimization algorithm can be the L-BFGS algorithm or the SGD algorithm (stochastic gradient descent). L-BFGS is a commonly used method for solving unconstrained nonlinear programming problems and has a fairly complete local convergence theory. BFGS is named after the initials of its inventors Broyden, Fletcher, Goldfarb, and Shanno, and L stands for Limited (limited memory): L-BFGS approximates the BFGS algorithm under limited memory, which is an advantage on large data sets. L-BFGS also converges faster than stochastic gradient descent and saves resources, so it is preferred in characteristic coefficient training; when the data volume is small, stochastic gradient descent can also be used.
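The preprocessing performed by the data preprocessing module 501 (rejecting abnormal rows, then normalizing) can be sketched as follows. Min-max scaling is an assumption, since this passage does not name the normalization formula; the helper names, the column order, and the sample rows are likewise illustrative.

```python
def drop_outliers(rows, column, upper):
    """Reject rows whose value in `column` exceeds `upper`."""
    return [r for r in rows if r[column] <= upper]

def min_max_normalize(rows):
    """Scale every column to the range [0, 1] (assumed normalization)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(r, lows, highs)]
        for r in rows
    ]

# Columns (assumed order): orders, follows, browses, clicks.
raw = [
    [2, 1, 10, 4],
    [6, 3, 30, 8],
    [200_000_000, 0, 5, 1],  # abnormal: more than 100 million orders
    [4, 5, 20, 6],
]
kept = drop_outliers(raw, column=0, upper=100_000_000)
normalized = min_max_normalize(kept)
```

Rejecting the abnormal row first matters here: leaving it in would compress every other order count to nearly zero after scaling.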
In addition, the characteristic coefficient training device of the embodiment of the invention can further include a regularization constraint module, a characteristic coefficient selecting module, a noise data removing module, and a coefficient automatic update module. The regularization constraint module is configured to apply a regularization constraint to the logistic regression model, using either the L1 regularization algorithm or the L2 regularization algorithm. The characteristic coefficient selecting module is configured to set the evaluation indexes of the logistic regression model and to select, by cross validation, one group of characteristic coefficients as the final characteristic coefficients of the logistic regression model. The noise data removing module is configured to remove noise data from the label data and the normalized service feature data by undersampling, oversampling, or threshold moving, making the characteristic coefficient training result more accurate. The coefficient automatic update module is configured to register the logistic regression model whose characteristic coefficients have been determined by the iteration optimization algorithm as an offline task, so that the characteristic coefficients are updated automatically.
As can be seen from the above, the invention combines a logistic regression model with an iteration optimization algorithm on a large data set, which makes it possible to use more valid data and to calculate a characteristic coefficient for each service feature quickly and accurately. Calling the L-BFGS algorithm on the Spark platform lets the program run on large data sets, in line with the current trend toward big-data machine learning; the iterative optimization algorithm is more robust and reliable, and the characteristic coefficients converge faster. Registering the logistic regression model as an offline task automates the updating of the characteristic coefficients, which both frees data analysts from part of their work and saves resources, and also allows users' personalized preferences to be discovered promptly and accurately: the corresponding characteristic coefficients are updated automatically according to users' latest shopping behavior, improving the user experience. This is significant for building a more intelligent personalized recommendation system on an e-commerce platform.
According to embodiments of the invention, the invention also provides an electronic device and a computer-readable medium.
The electronic device of the invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the characteristic coefficient training method of the embodiment of the invention.
The computer-readable medium of the invention stores a computer program which, when executed by a processor, implements the characteristic coefficient training method of the embodiment of the invention.
Referring now to Fig. 6, it shows a schematic structural diagram of a computer system 600 suitable for implementing the electronic device of the embodiment of the invention. The electronic device shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the invention.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores the various programs and data required for the operation of the computer system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another by a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to the embodiments disclosed by the invention, the process described above in the main step diagram may be implemented as a computer software program. For example, an embodiment of the disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the main step diagram. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 609 and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the system of the invention are performed.
It should be noted that the computer-readable medium of the invention can be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media can include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the invention, a computer-readable storage medium can be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus, or device. Also in the invention, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium can be transmitted over any appropriate medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the invention. In this regard, each block in a flowchart or block diagram can represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks can occur in an order different from that marked in the drawings. For example, two blocks shown in succession can in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and any combination of blocks in a block diagram or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the invention can be implemented in software or in hardware. The described modules can also be provided in a processor; for example, it can be described as: a processor includes a data preprocessing module, a logistic regression model module, and a characteristic coefficient determining module. The names of these modules do not, in some cases, limit the modules themselves; for example, the data preprocessing module can also be described as "a module that obtains label data and service feature data and normalizes the service feature data".
In another aspect, the invention also provides a computer-readable medium, which can be included in the device described in the above embodiments or can exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: obtain label data and service feature data, and normalize the service feature data; determine a logistic regression model, and input the label data and the normalized service feature data to the logistic regression model; and determine the characteristic coefficients of the logistic regression model by an iteration optimization algorithm.
According to the technical solution of the invention, a logistic regression model is combined with an iteration optimization algorithm on a large data set, which makes it possible to use more valid data and to calculate a characteristic coefficient for each service feature quickly and accurately. Calling the L-BFGS algorithm on the Spark platform lets the program run on large data sets, in line with the current trend toward big-data machine learning; the iterative optimization algorithm is more robust and reliable, and the characteristic coefficients converge faster. Registering the logistic regression model as an offline task automates the updating of the characteristic coefficients, which both frees data analysts from part of their work and saves resources, and also allows users' personalized preferences to be discovered promptly and accurately: the corresponding characteristic coefficients are updated automatically according to users' latest shopping behavior, improving the user experience. This is significant for building a more intelligent personalized recommendation system on an e-commerce platform.
The above product can perform the method provided by the embodiments of the invention and has the functional modules and beneficial effects corresponding to the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the invention.
The above embodiments do not limit the scope of protection of the invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations, and substitutions can occur. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (20)
1. A characteristic coefficient training method based on logistic regression, characterized by comprising:
obtaining label data and service feature data, and normalizing the service feature data;
determining a logistic regression model, and inputting the label data and the normalized service feature data to the logistic regression model; and
determining the characteristic coefficients of the logistic regression model by an iteration optimization algorithm.
2. The method according to claim 1, characterized in that the iteration optimization algorithm is the L-BFGS algorithm or the stochastic gradient descent method.
3. The method according to claim 1, characterized in that normalizing the service feature data comprises: rejecting abnormal data from the service feature data, and normalizing the service feature data remaining after the abnormal data are rejected.
4. The method according to claim 1, characterized in that the label data and the normalized service feature data are input to the logistic regression model in a preset format, the format comprising: a label, and at least one group consisting of a service feature ID and the characteristic value corresponding to the service feature ID, the characteristic value being the normalized service feature data.
5. The method according to any one of claims 1-4, characterized in that the method further comprises: applying a regularization constraint to the logistic regression model.
6. The method according to any one of claims 1-4, characterized in that the method further comprises: setting evaluation indexes for the logistic regression model, and selecting, by cross validation, one group of characteristic coefficients as the final characteristic coefficients of the logistic regression model.
7. The method according to claim 6, characterized in that selecting, by cross validation, one group of characteristic coefficients as the optimal characteristic coefficients of the logistic regression model comprises: grouping the label data and the normalized service feature data to obtain multiple groups of validation subset data; inputting each group of validation subset data separately to the logistic regression model to obtain multiple groups of characteristic coefficients and predicted score probabilities; calculating the evaluation indexes of each group of validation subset data from the predicted score probabilities and the label data; averaging the evaluation indexes over all groups to obtain the assessment performance of the logistic regression model; and selecting the final characteristic coefficients according to the assessment performance; wherein the evaluation indexes comprise accuracy, recall, precision, F value, and AUC.
8. The method according to any one of claims 1-4, characterized in that, after the step of obtaining label data and service feature data and normalizing the service feature data, the method further comprises: removing noise data from the label data and the normalized service feature data by undersampling, oversampling, or threshold moving.
9. The method according to any one of claims 1-4, characterized in that the method further comprises: registering the logistic regression model whose characteristic coefficients have been determined by the iteration optimization algorithm as an offline task, so as to update the characteristic coefficients automatically.
10. A characteristic coefficient training device based on logistic regression, characterized by comprising:
a data preprocessing module, configured to obtain label data and service feature data and to normalize the service feature data;
a logistic regression model module, configured to determine a logistic regression model and to input the label data and the normalized service feature data to the logistic regression model; and
a characteristic coefficient determining module, configured to determine the characteristic coefficients of the logistic regression model by an iteration optimization algorithm.
11. The device according to claim 10, characterized in that the iteration optimization algorithm is the L-BFGS algorithm or the stochastic gradient descent method.
12. The device according to claim 10, characterized in that the data preprocessing module is further configured to: reject abnormal data from the service feature data, and normalize the service feature data remaining after the abnormal data are rejected.
13. The device according to claim 10, characterized in that the logistic regression model module is further configured to: input the label data and the normalized service feature data to the logistic regression model in a preset format, the format comprising: a label, and at least one group consisting of a service feature ID and the characteristic value corresponding to the service feature ID, the characteristic value being the normalized service feature data.
14. The device according to any one of claims 10-13, characterized in that the device further comprises: a regularization constraint module, configured to apply a regularization constraint to the logistic regression model.
15. The device according to any one of claims 10-13, characterized in that the device further comprises: a characteristic coefficient selecting module, configured to set evaluation indexes for the logistic regression model and to select, by cross validation, one group of characteristic coefficients as the final characteristic coefficients of the logistic regression model.
16. The device according to claim 15, characterized in that the characteristic coefficient selecting module is further configured to: group the label data and the normalized service feature data to obtain multiple groups of validation subset data; input each group of validation subset data separately to the logistic regression model to obtain multiple groups of characteristic coefficients and predicted score probabilities; calculate the evaluation indexes of each group of validation subset data from the predicted score probabilities and the label data; average the evaluation indexes over all groups to obtain the assessment performance of the logistic regression model; and select the final characteristic coefficients according to the assessment performance; wherein the evaluation indexes comprise accuracy, recall, precision, F value, and AUC.
17. The device according to any one of claims 10-13, characterized in that the device further comprises: a noise data removing module, configured to remove noise data from the label data and the normalized service feature data by undersampling, oversampling, or threshold moving.
18. The device according to any one of claims 10-13, characterized in that the device further comprises: a coefficient automatic update module, configured to register the logistic regression model whose characteristic coefficients have been determined by the iteration optimization algorithm as an offline task, so as to update the characteristic coefficients automatically.
19. An electronic device, characterized by comprising:
one or more processors;
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-9.
20. A computer-readable medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-9.
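The cross-validation procedure of claims 15 and 16 can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: it assumes standard k-fold validation (train on the remaining groups, score the held-out group), plain batch gradient descent as the optimizer, synthetic data already scaled to [0, 1], and a selection rule that keeps the coefficients of the group with the highest AUC. The function names (`train`, `metrics`, `cross_validate`) and all parameter values are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    # w = [bias, w1, ..., wn]; returns the predicted scoring probability.
    return sigmoid(w[0] + sum(wj * v for wj, v in zip(w[1:], x)))

def train(X, y, lr=0.5, epochs=300):
    # Batch gradient descent on the log-loss (an assumed optimizer).
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def metrics(y_true, probs, threshold=0.5):
    # Evaluation indexes named in claim 16: accuracy, precision, recall, F-value, AUC.
    pred = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC as the chance that a random positive outscores a random negative (ties count 0.5).
    pos = [p for t, p in zip(y_true, probs) if t == 1]
    neg = [p for t, p in zip(y_true, probs) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}

def cross_validate(X, y, k=5, seed=0):
    # Split the samples into k validation groups, fit one coefficient set per group.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    results = []
    for fold in folds:
        held = set(fold)
        rest = [i for i in idx if i not in held]
        w = train([X[i] for i in rest], [y[i] for i in rest])
        probs = [predict(w, X[i]) for i in fold]
        results.append((w, metrics([y[i] for i in fold], probs)))
    # Average every evaluation index over all groups.
    mean = {name: sum(m[name] for _, m in results) / k for name in results[0][1]}
    # Assumed selection rule: keep the coefficients from the group with the best AUC.
    best_w = max(results, key=lambda r: r[1]["auc"])[0]
    return best_w, mean

# Synthetic, already normalized data: label is 1 when the two features sum above 1.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if a + b > 1.0 else 0 for a, b in X]
best_w, mean_scores = cross_validate(X, y, k=5)
print(mean_scores)
```

Selecting by AUC is only one option; the claim leaves open which evaluation index drives the final choice, so the `key` passed to `max` could equally be the F-value or a weighted combination.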
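The three techniques named in claim 17 (under-sampling, over-sampling and threshold moving) are usually described as ways to handle class imbalance, and can be sketched as below. This is an illustrative sketch under assumptions, not the patent's module: it uses random resampling to equal class sizes and sets the moved threshold to the positive-class fraction, and the helper names (`split_by_label`, `under_sample`, `over_sample`, `moved_threshold`) are hypothetical. Both classes are assumed to be non-empty.

```python
import random

def split_by_label(X, y):
    # Separate feature rows into positive (label 1) and negative (label 0) groups.
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return pos, neg

def under_sample(X, y, seed=0):
    # Randomly drop majority-class rows until both classes have the same size.
    rng = random.Random(seed)
    pos, neg = split_by_label(X, y)
    n = min(len(pos), len(neg))
    pos, neg = rng.sample(pos, n), rng.sample(neg, n)
    return pos + neg, [1] * n + [0] * n

def over_sample(X, y, seed=0):
    # Randomly duplicate minority-class rows until both classes have the same size.
    rng = random.Random(seed)
    pos, neg = split_by_label(X, y)
    n = max(len(pos), len(neg))
    pos = pos + [rng.choice(pos) for _ in range(n - len(pos))]
    neg = neg + [rng.choice(neg) for _ in range(n - len(neg))]
    return pos + neg, [1] * n + [0] * n

def moved_threshold(y):
    # Threshold moving: classify as positive when the predicted probability
    # exceeds the positive-class fraction instead of the default 0.5.
    return sum(y) / len(y)

# Toy data: 2 positive rows and 8 negative rows.
X = [[i / 10] for i in range(10)]
y = [1, 1] + [0] * 8
X_under, y_under = under_sample(X, y)
X_over, y_over = over_sample(X, y)
threshold = moved_threshold(y)
print(len(X_under), len(X_over), threshold)
```

Under-sampling discards data and over-sampling repeats it, whereas threshold moving leaves the training set unchanged and only shifts the decision cut-off applied to the predicted scoring probability.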
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710398250.9A CN107220217A (en) | 2017-05-31 | 2017-05-31 | Logistic regression-based characteristic coefficient training method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710398250.9A CN107220217A (en) | 2017-05-31 | 2017-05-31 | Logistic regression-based characteristic coefficient training method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107220217A (en) | 2017-09-29 |
Family
ID=59946924
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710398250.9A Pending CN107220217A (en) | 2017-05-31 | 2017-05-31 | Logistic regression-based characteristic coefficient training method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107220217A (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108108912A (en) * | 2018-01-10 | 2018-06-01 | 百度在线网络技术(北京)有限公司 | Method, device, server and storage medium for identifying low-quality interactive users |
CN108932530A (en) * | 2018-06-29 | 2018-12-04 | 新华三大数据技术有限公司 | Method and device for constructing a label system |
CN110019163A (en) * | 2017-12-05 | 2019-07-16 | 北京京东尚科信息技术有限公司 | Method, system, device and storage medium for predicting and recommending object features |
CN110428270A (en) * | 2019-08-07 | 2019-11-08 | 佰聆数据股份有限公司 | Method for identifying customers with potential channel preference based on a logistic regression algorithm |
CN110709863A (en) * | 2019-01-11 | 2020-01-17 | 阿里巴巴集团控股有限公司 | Logistic regression modeling scheme using secret sharing |
CN110706822A (en) * | 2019-09-20 | 2020-01-17 | 上海派拉软件股份有限公司 | Health management method based on logistic regression model and decision tree model |
CN110888857A (en) * | 2019-10-14 | 2020-03-17 | 平安科技(深圳)有限公司 | Data label generation method, device, terminal and medium based on neural network |
CN111008321A (en) * | 2019-11-18 | 2020-04-14 | 广东技术师范大学 | Recommendation method and device based on logistic regression, computing equipment and readable storage medium |
CN111191789A (en) * | 2020-01-20 | 2020-05-22 | 上海依图网络科技有限公司 | Model training method, system, chip, electronic device and medium |
CN111325280A (en) * | 2020-02-27 | 2020-06-23 | 苏宁云计算有限公司 | Label generation method and system |
CN111405583A (en) * | 2019-01-02 | 2020-07-10 | 中国移动通信有限公司研究院 | Tidal effect avoiding method and device and computer readable storage medium |
CN111753386A (en) * | 2019-03-11 | 2020-10-09 | 北京嘀嘀无限科技发展有限公司 | Data processing method and device |
CN111914995A (en) * | 2020-06-18 | 2020-11-10 | 北京百度网讯科技有限公司 | Regularized linear regression generation method and device, electronic equipment and storage medium |
CN111966473A (en) * | 2020-07-24 | 2020-11-20 | 支付宝(杭州)信息技术有限公司 | Operation method and device of linear regression task and electronic equipment |
CN112508044A (en) * | 2019-09-16 | 2021-03-16 | 华为技术有限公司 | Artificial intelligence AI model evaluation method, system and equipment |
CN112785256A (en) * | 2021-01-14 | 2021-05-11 | 田进伟 | Real-time assessment method and system for clinical endpoint events in clinical trials |
CN112835910A (en) * | 2021-03-05 | 2021-05-25 | 天九共享网络科技集团有限公司 | Enterprise information and policy information processing method and device |
CN113380417A (en) * | 2021-06-17 | 2021-09-10 | 哈尔滨理工大学 | LR-N based cardiovascular disease prediction method |
CN113535444A (en) * | 2020-04-14 | 2021-10-22 | 中国移动通信集团浙江有限公司 | Transaction detection method, transaction detection device, computing equipment and computer storage medium |
CN113568739A (en) * | 2021-07-12 | 2021-10-29 | 北京淇瑀信息科技有限公司 | User resource limit distribution method and device and electronic equipment |
CN113781134A (en) * | 2020-07-28 | 2021-12-10 | 北京沃东天骏信息技术有限公司 | Item recommendation method and device and computer-readable storage medium |
CN113988374A (en) * | 2021-09-27 | 2022-01-28 | 上海东普信息科技有限公司 | Method, device, equipment and storage medium for identifying high-quality user through targeted recommendation |
CN114783007A (en) * | 2022-06-22 | 2022-07-22 | 成都新希望金融信息有限公司 | Equipment fingerprint identification method and device and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101853470A (en) * | 2010-05-28 | 2010-10-06 | 浙江大学 | A Collaborative Filtering Method Based on Social Tags |
WO2012094516A1 (en) * | 2011-01-06 | 2012-07-12 | Ebay Inc. | Interestingness recommendations in a computing advice facility |
WO2012103290A1 (en) * | 2011-01-26 | 2012-08-02 | Google Inc. | Dynamic predictive modeling platform |
US20140379519A1 (en) * | 2013-06-25 | 2014-12-25 | Texas Instruments Incorporated | E-commerce cross-sampling product recommender based on statistics |
CN105045819A (en) * | 2015-06-26 | 2015-11-11 | 深圳市腾讯计算机系统有限公司 | Model training method and device for training data |
CN105469263A (en) * | 2014-09-24 | 2016-04-06 | 阿里巴巴集团控股有限公司 | Commodity recommendation method and device |
CN106022865A (en) * | 2016-05-10 | 2016-10-12 | 江苏大学 | Goods recommendation method based on scores and user behaviors |
2017
- 2017-05-31 CN CN201710398250.9A patent/CN107220217A/en active Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101853470A (en) * | 2010-05-28 | 2010-10-06 | 浙江大学 | A Collaborative Filtering Method Based on Social Tags |
WO2012094516A1 (en) * | 2011-01-06 | 2012-07-12 | Ebay Inc. | Interestingness recommendations in a computing advice facility |
WO2012103290A1 (en) * | 2011-01-26 | 2012-08-02 | Google Inc. | Dynamic predictive modeling platform |
US20140379519A1 (en) * | 2013-06-25 | 2014-12-25 | Texas Instruments Incorporated | E-commerce cross-sampling product recommender based on statistics |
CN105469263A (en) * | 2014-09-24 | 2016-04-06 | 阿里巴巴集团控股有限公司 | Commodity recommendation method and device |
CN105045819A (en) * | 2015-06-26 | 2015-11-11 | 深圳市腾讯计算机系统有限公司 | Model training method and device for training data |
CN106022865A (en) * | 2016-05-10 | 2016-10-12 | 江苏大学 | Goods recommendation method based on scores and user behaviors |
Non-Patent Citations (6)
Title |
---|
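Anonymous: "Model Selection in Machine Learning (Cross-Validation)", 《HTTPS://BLOG.CSDN.NET/U014365862/ARTICLE/DETAILS/47903819》 * |
Anonymous: "Logistic Regression: Theory", 《HTTPS://BLOG.CSDN.NET/PAKKO/ARTICLE/DETAILS/37878837?DEPTH_1-UTM_SOURCE=DISTRIBUTE.PC_RELEVANT.NONE-TASK-BLOG-BLOGCOMMENDFROMBAIDU-3&UT…》 * |
Zhou Yue: "Data Hacker | Issue 6: Handling Imbalanced Data", 《HTTPS://MP.WEIXIN.QQ.COM/S?__BIZ=MZAWMZIXMJIYMG==&MID=2651005812&IDX=1&SN=B9819F04CB2EE9AF21F4011D34013824&SCENE=0》 * |
Tu Xinhui: 《Concept-Based Information Retrieval Methods》, 30 April 2015, Central China Normal University Press * |
Dong Xuehui: "Research on the Logistic Regression Algorithm and Its GPU Parallel Implementation", 《China Master's Theses Full-text Database, Information Science and Technology》 * |
Chen Feng et al.: 《Multivariate Statistical Analysis Methods for Medicine》, 31 December 2000, China Statistics Press * |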
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019163A (en) * | 2017-12-05 | 2019-07-16 | 北京京东尚科信息技术有限公司 | Method, system, device and storage medium for predicting and recommending object features |
CN108108912A (en) * | 2018-01-10 | 2018-06-01 | 百度在线网络技术(北京)有限公司 | Method, device, server and storage medium for identifying low-quality interactive users |
CN108932530A (en) * | 2018-06-29 | 2018-12-04 | 新华三大数据技术有限公司 | Method and device for constructing a label system |
CN111405583A (en) * | 2019-01-02 | 2020-07-10 | 中国移动通信有限公司研究院 | Tidal effect avoiding method and device and computer readable storage medium |
CN110709863A (en) * | 2019-01-11 | 2020-01-17 | 阿里巴巴集团控股有限公司 | Logistic regression modeling scheme using secret sharing |
CN110709863B (en) * | 2019-01-11 | 2024-02-06 | 创新先进技术有限公司 | Logistic regression modeling method, storage medium, and system using secret sharing |
CN111753386B (en) * | 2019-03-11 | 2024-03-26 | 北京嘀嘀无限科技发展有限公司 | Data processing method and device |
CN111753386A (en) * | 2019-03-11 | 2020-10-09 | 北京嘀嘀无限科技发展有限公司 | Data processing method and device |
CN110428270A (en) * | 2019-08-07 | 2019-11-08 | 佰聆数据股份有限公司 | Method for identifying customers with potential channel preference based on a logistic regression algorithm |
CN112508044A (en) * | 2019-09-16 | 2021-03-16 | 华为技术有限公司 | Artificial intelligence AI model evaluation method, system and equipment |
CN110706822A (en) * | 2019-09-20 | 2020-01-17 | 上海派拉软件股份有限公司 | Health management method based on logistic regression model and decision tree model |
CN110706822B (en) * | 2019-09-20 | 2024-02-02 | 上海派拉软件股份有限公司 | Health management method based on logistic regression model and decision tree model |
CN110888857A (en) * | 2019-10-14 | 2020-03-17 | 平安科技(深圳)有限公司 | Data label generation method, device, terminal and medium based on neural network |
WO2021073152A1 (en) * | 2019-10-14 | 2021-04-22 | 平安科技(深圳)有限公司 | Data label generation method and apparatus based on neural network, and terminal and medium |
CN110888857B (en) * | 2019-10-14 | 2023-11-07 | 平安科技(深圳)有限公司 | Data tag generation method, device, terminal and medium based on neural network |
CN111008321A (en) * | 2019-11-18 | 2020-04-14 | 广东技术师范大学 | Recommendation method and device based on logistic regression, computing equipment and readable storage medium |
CN111008321B (en) * | 2019-11-18 | 2023-08-29 | 广东技术师范大学 | Recommendation method, device, computing device, and readable storage medium based on logistic regression |
CN111191789A (en) * | 2020-01-20 | 2020-05-22 | 上海依图网络科技有限公司 | Model training method, system, chip, electronic device and medium |
CN111191789B (en) * | 2020-01-20 | 2023-11-28 | 上海依图网络科技有限公司 | Model optimization deployment system, chip, electronic equipment and medium |
CN111325280A (en) * | 2020-02-27 | 2020-06-23 | 苏宁云计算有限公司 | Label generation method and system |
CN113535444B (en) * | 2020-04-14 | 2023-11-03 | 中国移动通信集团浙江有限公司 | Abnormal motion detection method, device, computing equipment and computer storage medium |
CN113535444A (en) * | 2020-04-14 | 2021-10-22 | 中国移动通信集团浙江有限公司 | Transaction detection method, transaction detection device, computing equipment and computer storage medium |
CN111914995A (en) * | 2020-06-18 | 2020-11-10 | 北京百度网讯科技有限公司 | Regularized linear regression generation method and device, electronic equipment and storage medium |
CN111966473B (en) * | 2020-07-24 | 2024-02-06 | 支付宝(杭州)信息技术有限公司 | Operation method and device of linear regression task and electronic equipment |
CN111966473A (en) * | 2020-07-24 | 2020-11-20 | 支付宝(杭州)信息技术有限公司 | Operation method and device of linear regression task and electronic equipment |
CN113781134A (en) * | 2020-07-28 | 2021-12-10 | 北京沃东天骏信息技术有限公司 | Item recommendation method and device and computer-readable storage medium |
CN112785256A (en) * | 2021-01-14 | 2021-05-11 | 田进伟 | Real-time assessment method and system for clinical endpoint events in clinical trials |
CN112835910B (en) * | 2021-03-05 | 2023-10-17 | 天九共享网络科技集团有限公司 | Method and device for processing enterprise information and policy information |
CN112835910A (en) * | 2021-03-05 | 2021-05-25 | 天九共享网络科技集团有限公司 | Enterprise information and policy information processing method and device |
CN113380417A (en) * | 2021-06-17 | 2021-09-10 | 哈尔滨理工大学 | LR-N based cardiovascular disease prediction method |
CN113568739A (en) * | 2021-07-12 | 2021-10-29 | 北京淇瑀信息科技有限公司 | User resource limit distribution method and device and electronic equipment |
CN113988374A (en) * | 2021-09-27 | 2022-01-28 | 上海东普信息科技有限公司 | Method, device, equipment and storage medium for identifying high-quality user through targeted recommendation |
CN114783007A (en) * | 2022-06-22 | 2022-07-22 | 成都新希望金融信息有限公司 | Equipment fingerprint identification method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107220217A (en) | Logistic regression-based characteristic coefficient training method and device | |
CN110995459B (en) | Abnormal object identification method, device, medium and electronic equipment | |
CN109919684A (en) | Method, electronic device and computer-readable storage medium for generating an information prediction model | |
CN111080338B (en) | User data processing method and device, electronic equipment and storage medium | |
CN106095942B (en) | Strong variable extracting method and device | |
CN107679946A (en) | Fund Products Show method, apparatus, terminal device and storage medium | |
US12079748B2 (en) | Co-operative resource pooling system | |
CN113344700B (en) | Multi-objective optimization-based wind control model construction method and device and electronic equipment | |
CN114240555A (en) | Method and device for training click-through rate prediction model and predicting click-through rate | |
CN113674087A (en) | Enterprise credit rating method, apparatus, electronic device and medium | |
CN115423040A (en) | User portrait recognition method and AI system for interactive marketing platform | |
US11810022B2 (en) | Contact center call volume prediction | |
CN112860672A (en) | Method and device for determining label weight | |
CN109800138B (en) | CPU testing method, electronic device and storage medium | |
CN114707733A (en) | Risk indicator prediction method and device, electronic equipment and storage medium | |
CN113496236B (en) | User tag information determining method, device, equipment and storage medium | |
CN113743906A (en) | Method and device for determining service processing strategy | |
US20240211973A1 (en) | Technology stack modeler engine for a platform signal modeler | |
CN110796381B (en) | Modeling method and device for wind control model, terminal equipment and medium | |
CN117874594A (en) | Bank customer classification method, system and electronic equipment | |
CN116681467A (en) | Sales predicting method based on improved WOA algorithm and related equipment thereof | |
KR102284440B1 (en) | Method to broker deep learning model transactions performed by deep learning model transaction brokerage servers | |
CN110956528B (en) | Recommendation method and system for e-commerce platform | |
CN110414709A (en) | Debt risk intelligent Forecasting, device and computer readable storage medium | |
CN118656685B (en) | A derivative feature extraction method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170929 |