CN107220217A - Feature coefficient training method and device based on logistic regression - Google Patents


Info

Publication number
CN107220217A
Authority
CN
China
Prior art keywords
data
service feature
regression models
characteristic coefficient
logic regression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710398250.9A
Other languages
Chinese (zh)
Inventor
王颖帅
李晓霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710398250.9A
Publication of CN107220217A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Operations Research (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present invention provide a feature coefficient training method and device based on logistic regression. By combining a logistic regression model with an iterative optimization algorithm, the method is better suited to the big-data setting of e-commerce recommendation and can compute a feature coefficient for each business feature quickly and accurately. The feature coefficient training method based on logistic regression includes: obtaining label data and business feature data, and normalizing the business feature data; determining a logistic regression model, and inputting the label data and the normalized business feature data into the logistic regression model; and determining the feature coefficients of the logistic regression model by an iterative optimization algorithm.

Description

Feature coefficient training method and device based on logistic regression
Technical Field
The present invention relates to the field of computers, and in particular to a feature coefficient training method and device based on logistic regression.
Background
Online shopping has become an important part of daily life. While shopping online, users leave a variety of structured and unstructured behavioral data on e-commerce websites, such as order, browse, follow and click behavior. This behavioral data can serve as the business features of a trained model, on which machine learning algorithms are built to predict each user's personalized product preferences. Accurate "a thousand faces for a thousand people" recommendation has always been the goal of the personalized recommendation systems of the major e-commerce companies. To achieve the desired recommendation effect, a reasonable feature coefficient must be assigned to each business feature when a user's personalized preference score is computed. An accurate and effective algorithm that updates feature coefficients automatically is therefore of great significance for personalized product recommendation, for improving user experience, and for making platform services more intelligent.
In the prior art, the feature coefficient of each business feature of a trained model is determined mainly in two ways. The first is the ABtest (A/B test) coefficient adjustment method, in which a data analyst adjusts the feature coefficients by A/B testing according to business experience. The second is random sampling with coefficients computed by statistical software, in which a small data sample is drawn at random and the feature coefficients of the business features are then computed with statistical software such as SPSS (Statistical Product and Service Solutions) or R.
In the course of realizing the present invention, the inventors found at least the following problems in the prior art:
(1) The ABtest coefficient adjustment method depends heavily on the business experience of the data analyst. When a new line of business is taken over, many groups of data may have to be tested before reasonably good feature coefficients are found, which wastes resources.
(2) The statistical software method samples at random from a large data set. Because the amount of data that SPSS or R can load for offline logistic regression is limited, the method estimates the whole from a sample and therefore carries random error. In addition, with this method the data analyst usually fixes the feature coefficients after a single logistic regression, so the coefficients are not updated dynamically and the personalized recommendation system cannot remain timely and accurate.
Summary of the Invention
In view of this, embodiments of the present invention provide a feature coefficient training method and device based on logistic regression. By combining a logistic regression model with an iterative optimization algorithm, the method is better suited to the big-data setting of e-commerce recommendation and can compute a feature coefficient for each business feature quickly and accurately.
To achieve the above object, according to one aspect of the embodiments of the present invention, a feature coefficient training method based on logistic regression is provided.
The feature coefficient training method based on logistic regression of the embodiments of the present invention includes: obtaining label data and business feature data, and normalizing the business feature data; determining a logistic regression model, and inputting the label data and the normalized business feature data into the logistic regression model; and determining the feature coefficients of the logistic regression model by an iterative optimization algorithm.
Optionally, the iterative optimization algorithm is the L-BFGS algorithm or the stochastic gradient method.
Optionally, normalizing the business feature data includes: rejecting outlier data from the business feature data, and normalizing the business feature data that remains after the outliers are rejected.
Optionally, the label data and the normalized business feature data are input into the logistic regression model in a preset format, the format including: a label, and at least one group consisting of a business feature ID and the feature value corresponding to that business feature ID, the feature value being the normalized business feature data.
Optionally, the method further includes: applying a regularization constraint to the logistic regression model.
Optionally, the method further includes: setting evaluation indexes for the logistic regression model, and selecting one group of feature coefficients by cross-validation as the final feature coefficients of the logistic regression model.
Optionally, selecting one group of feature coefficients by cross-validation as the optimal feature coefficients of the logistic regression model includes: grouping the label data and the normalized business feature data to obtain multiple groups of validation subset data; inputting each group of validation subset data into the logistic regression model to obtain multiple groups of feature coefficients and predicted score probabilities; computing the evaluation index of each group of validation subset data from the predicted score probabilities and the label data; averaging the evaluation indexes of all groups to obtain the evaluated performance of the logistic regression model; and selecting the final feature coefficients according to the evaluated performance. The evaluation indexes include accuracy, recall, precision, F value and AUC.
Optionally, after the step of obtaining label data and business feature data and normalizing the business feature data, the method further includes: removing noise data from the label data and the normalized business feature data by undersampling, oversampling or threshold moving.
Optionally, the method further includes: subscribing the logistic regression model, after its feature coefficients have been determined by the iterative optimization algorithm, as an offline task so that the feature coefficients are updated automatically.
To achieve the above object, according to another aspect of the embodiments of the present invention, a feature coefficient training device based on logistic regression is provided.
The feature coefficient training device based on logistic regression of the embodiments of the present invention includes: a data preprocessing module, configured to obtain label data and business feature data and to normalize the business feature data; a logistic regression model module, configured to determine a logistic regression model and to input the label data and the normalized business feature data into the logistic regression model; and a feature coefficient determining module, configured to determine the feature coefficients of the logistic regression model by an iterative optimization algorithm.
Optionally, the iterative optimization algorithm is the L-BFGS algorithm or the stochastic gradient method.
Optionally, the data preprocessing module is further configured to: reject outlier data from the business feature data, and normalize the business feature data that remains after the outliers are rejected.
Optionally, the logistic regression model module is further configured to: input the label data and the normalized business feature data into the logistic regression model in a preset format, the format including: a label, and at least one group consisting of a business feature ID and the feature value corresponding to that business feature ID, the feature value being the normalized business feature data.
Optionally, the device further includes: a regularization constraint module, configured to apply a regularization constraint to the logistic regression model.
Optionally, the device further includes: a feature coefficient selecting module, configured to set evaluation indexes for the logistic regression model and to select one group of feature coefficients by cross-validation as the final feature coefficients of the logistic regression model.
Optionally, the feature coefficient selecting module is further configured to: group the label data and the normalized business feature data to obtain multiple groups of validation subset data; input each group of validation subset data into the logistic regression model to obtain multiple groups of feature coefficients and predicted score probabilities; compute the evaluation index of each group of validation subset data from the predicted score probabilities and the label data; average the evaluation indexes of all groups to obtain the evaluated performance of the logistic regression model; and select the final feature coefficients according to the evaluated performance. The evaluation indexes include accuracy, recall, precision, F value and AUC.
Optionally, the device further includes: a noise data removing module, configured to remove noise data from the label data and the normalized business feature data by undersampling, oversampling or threshold moving.
Optionally, the device further includes: an automatic coefficient updating module, configured to subscribe the logistic regression model, after its feature coefficients have been determined by the iterative optimization algorithm, as an offline task so that the feature coefficients are updated automatically.
To achieve the above object, according to yet another aspect of the embodiments of the present invention, an electronic device is provided.
An electronic device of the embodiments of the present invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the feature coefficient training method based on logistic regression of the embodiments of the present invention.
To achieve the above object, according to yet another aspect of the embodiments of the present invention, a computer-readable medium is provided.
A computer-readable medium of the embodiments of the present invention stores a computer program which, when executed by a processor, implements the feature coefficient training method based on logistic regression of the embodiments of the present invention.
According to the technical solution of the present invention, an embodiment of the above invention has the following advantages or beneficial effects. The invention combines a logistic regression model with an iterative optimization algorithm on large data sets, so it can use more valid data and compute a feature coefficient for each business feature quickly and accurately. By calling the L-BFGS algorithm on the Spark platform, the program runs on large data sets, in line with the current trend toward big-data machine learning; the iterative optimization is more robust and reliable, and the feature coefficients converge faster. By subscribing the logistic regression model as an offline task, the feature coefficients are updated automatically. This relieves data analysts of part of their work and saves resources, and it also discovers users' personalized preferences in a timely and accurate manner by updating the corresponding feature coefficients according to users' latest shopping behavior. It improves user experience and is of great significance for e-commerce platforms building more intelligent personalized recommendation systems.
Further effects of the above non-conventional optional implementations are described below in conjunction with specific embodiments.
Brief Description of the Drawings
The accompanying drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main steps of the feature coefficient training method according to an embodiment of the present invention;
Fig. 2 is a schematic flow diagram of the execution of the feature coefficient training method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the data input into the logistic regression model by the feature coefficient training method of an embodiment of the present invention;
Fig. 4 is a schematic diagram of the feature coefficient training results of the feature coefficient training method of an embodiment of the present invention;
Fig. 5 is a schematic diagram of the composition of the feature coefficient training device according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a computer system suitable for an electronic device implementing the embodiments of the present invention.
Detailed Description
Exemplary embodiments of the present invention are explained below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.
The embodiments of the present invention improve on earlier feature coefficient training methods and provide a feature coefficient training solution based on logistic regression and an iterative optimization algorithm. The embodiments predict a user's preference score for third-level categories, as follows. Label data and business feature data for order records, follow records, browse records and click records are obtained from Hive tables (Hive is a data warehouse tool based on Hadoop that maps structured data files to database tables). All of the label data and business feature data is divided into three parts, for example in the ratio 8:1:1, where 8 parts are used as the training set, 1 part as the validation set and 1 part as the test set. First, the logistic regression model is trained on the training set; during training, an iterative optimization algorithm is called from the MLlib library (Machine Learning Library) of the Spark platform (a cluster computing environment) to obtain locally optimal parameters. Next, the validation set is input into the trained logistic regression model, and the optimal feature coefficients are selected according to the evaluation indexes. Finally, the test set is input into the logistic regression model trained on the training set to obtain the user's probabilistic preference score for third-level categories. Only business feature data needs to be input for the test set, since the label data of the test set is what is predicted. The validation set contains both predicted label data and true label data.
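The 8:1:1 division described above might be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name `split_811`, the fixed random seed and the use of an in-memory list instead of Hive tables are all assumptions made for the example.

```python
import random

def split_811(samples, seed=0):
    """Shuffle the samples and split them into train/validation/test at 8:1:1.

    Hypothetical helper: the source only states the ratio, not how to shuffle.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10
    n_valid = n // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]   # the remainder goes to the test set
    return train, valid, test

train, valid, test = split_811(range(100))
```

Shuffling before the split keeps the three sets comparable; a time-based split would be a different design choice that the source does not discuss.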
Fig. 1 is a schematic diagram of the main steps of the feature coefficient training method of an embodiment of the present invention.
As shown in Fig. 1, the feature coefficient training method of the embodiment of the present invention mainly includes the following steps:
Step S11: Obtain label data and business feature data, and normalize the business feature data. The label data and business feature data are obtained from Hive tables. To bring the feature data onto the same scale, the obtained business feature data must be normalized. Before normalization, outlier data in the Hive tables can be rejected first, for example users with more than 100 million orders. The label is the training target of the logistic regression model, and the way labels are constructed differs between businesses. For example, when predicting a user's preference for third-level categories, the label can be set to 1 or 0, where 1 indicates that the user prefers the third-level category and 0 indicates that the user does not; other ways of constructing labels are also possible.
After the data preprocessing of step S11 is complete, the design of the logistic regression model and the determination of the feature coefficients begin from step S12.
Step S12: Determine the logistic regression model, and input the label data and the normalized business feature data into the logistic regression model. Logistic regression is a generalized linear regression: on top of linear regression it adds a Sigmoid function that performs a nonlinear mapping, which maps continuous values into the range between 0 and 1. The dependent variable of logistic regression can be binary or multi-class; in practice, binary logistic regression is the most common. Determining the logistic regression model means choosing logistic regression as the machine learning classification model. The label data and the normalized business feature data must be input into the logistic regression model in a data format supported by the iterative optimization algorithm.
Step S13: Determine the feature coefficients of the logistic regression model by an iterative optimization algorithm. The iterative optimization algorithm can be the L-BFGS algorithm or the SGD algorithm (stochastic gradient method). The BFGS algorithm is a common method for solving unconstrained nonlinear optimization problems and has a well-developed local convergence theory; it is named after the initials of its inventors Broyden, Fletcher, Goldfarb and Shanno. The L stands for limited memory: L-BFGS is an approximation of the BFGS algorithm that runs in limited memory, which is an advantage on large data sets. L-BFGS converges faster than the stochastic gradient method and saves resources, so it is preferred during feature coefficient training; the stochastic gradient method can also be used when the amount of data is small.
Fig. 2 is a schematic flow diagram of the execution of the feature coefficient training method of an embodiment of the present invention.
As shown in Fig. 2, in the embodiment of the present invention the feature coefficient training method is executed as follows:
Label data and business feature data for order records, follow records, browse records and click records are obtained from Hive tables. Outlier data in the order, follow, browse and click records is rejected, and the remaining business feature data is normalized. This step requires preparing the user's label data, the business feature data of the user's orders over the past year, of the user's follows over the past three months, of the user's browsing over the past month, and of the user's clicks over the past week. To bring the feature data onto the same scale, it must be normalized.
The logistic regression model is determined, and the label data and the normalized business feature data are input into it. Logistic regression uses the log-likelihood loss function as its loss function and the Sigmoid function as its outer compression function. The Sigmoid function compresses the output range of linear regression from (negative infinity, positive infinity) to between 0 and 1. Compressing a wide range of values into this interval removes the influence of variables that stand out strongly, that is, it suppresses abnormal values in the data such as extremely large and extremely small points.
The feature coefficients of the logistic regression model are determined by an iterative optimization algorithm. The L-BFGS algorithm is suited to distributed computation on big data. It stores the vector sequences s_j and y_j from the computation, computes an approximation of the inverse Hessian matrix from s_j and y_j, and keeps only the m most recent pairs s_j, y_j. In the present invention the vector sequence s_j corresponds to the business feature data, y_j corresponds to the label data, and j denotes the j-th sample. L-BFGS therefore saves resources, and because it converges faster than the stochastic gradient method, the L-BFGS quasi-Newton method is preferred in the feature coefficient training method. This step is implemented by calling the L-BFGS algorithm in the MLlib library of the Spark platform and obtaining each parameter of the logistic regression model through repeated learning iterations. The parameters obtained here include the feature coefficients, but these are the locally optimal feature coefficients of a single logistic regression model. The globally optimal feature coefficients are obtained with the evaluation indexes and cross-validation described below.
The normalization is performed as follows:
In the formula, score is the normalized value of the business feature data, in the range 0 to 1; i is a positive integer indexing the dimension of the obtained data; and x_i, for the different values of i, denotes respectively the business feature data of orders, of follows, of browsing, and of click records.
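As an illustration of outlier rejection followed by normalization into the range 0 to 1, the sketch below uses min-max scaling. This is an assumption: min-max scaling is only one common choice that produces scores in [0, 1] and is not necessarily the patent's own formula. The names `reject_outliers` and `min_max_normalize` and the sample values are hypothetical.

```python
def reject_outliers(values, upper_bound):
    """Drop values above a chosen upper bound (e.g. implausible order counts)."""
    return [v for v in values if v <= upper_bound]

def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant column maps to 0.0.

    Assumption: min-max scaling stands in for the source's normalization.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical order counts for one feature dimension; the last is an outlier.
order_counts = [2, 10, 6, 250_000_000]
cleaned = reject_outliers(order_counts, upper_bound=100_000_000)
scores = min_max_normalize(cleaned)   # [0.0, 1.0, 0.5]
```

Rejecting outliers first matters for this kind of scaling, because a single extreme value would otherwise compress every other score toward 0.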
In addition, the obtained business feature data and label data may be imbalanced, that is, there are few samples with label 1 and many with label 0. If such data is fed directly into the logistic regression model for training, the feature coefficient training device tends to predict more new data as label 0, whereas more label-1 predictions are desired in order to give users rich personalized recommendations. This sample imbalance problem can be handled by undersampling, oversampling or threshold moving, all of which are known in the prior art. The present invention takes undersampling as an example and describes it in detail. Some of the label-0 data in the training set is removed so that the numbers of label-0 and label-1 samples are close. The present invention uses stratified random sampling to control the ratio of label-0 to label-1 data: an ensemble learning mechanism divides the label-0 data into several different sets, one for each learner. Each learner is thus trained on undersampled data, but no important information is lost overall. For example, the data can be divided into two parts, the majority samples with label 0 and the minority samples with label 1. The majority label-0 samples are sampled with replacement n times to generate n subsets, and the minority label-1 samples are merged with each of these n subsets to train a new model. This yields n logistic regression models, and the final model is the average of the predictions of these n models. The idea is similar to the way decision trees are combined into a random forest.
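The undersampling scheme above can be sketched roughly as follows. The helper names `balanced_subsets` and `average_prediction`, the choice to draw exactly as many label-0 samples as there are label-1 samples, and the toy data are assumptions for illustration; the source does not fix the subset size.

```python
import random

def balanced_subsets(majority, minority, n, seed=0):
    """Build n training sets, each pairing all minority samples with a random
    draw (with replacement) of equally many majority samples."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n):
        drawn = [rng.choice(majority) for _ in range(len(minority))]
        subsets.append(drawn + list(minority))
    return subsets

def average_prediction(models, x):
    """Average the probability that each trained model predicts for sample x."""
    return sum(model(x) for model in models) / len(models)

# Hypothetical (features, label) samples: many label-0 and few label-1.
negatives = [([0.1 * i], 0) for i in range(20)]
positives = [([0.9], 1), ([0.8], 1)]
subsets = balanced_subsets(negatives, positives, n=3)
```

Each element of `subsets` would be used to train one logistic regression model, and `average_prediction` then combines the n models, which is where the resemblance to a random forest comes from.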
The label data and the normalized business feature data must be input into the logistic regression model in a predetermined format, which includes: a label, and at least one group consisting of a business feature ID and the feature value corresponding to that business feature ID, the feature value being the normalized business feature data. The embodiments of the present invention provide one way of expressing this format:
label businessFeatureID:featureValue businessFeatureID:featureValue ...
In this format the number of businessFeatureID:featureValue pairs is not limited; it equals the number of data dimensions obtained, which is 4 in the embodiments of the present invention. Each feature value equals the normalized business feature data. This is the format supported by the algorithm tool used in the present invention, and it can be extended to other formats.
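A small sketch of writing and reading this input format, assuming space-separated tokens and a colon between ID and value as in the expression above. The function names `format_sample` and `parse_sample` and the sample values are hypothetical.

```python
def format_sample(label, features):
    """Serialize one sample as 'label id:value id:value ...', ids ascending."""
    pairs = " ".join(f"{fid}:{value}" for fid, value in sorted(features.items()))
    return f"{label} {pairs}"

def parse_sample(line):
    """Parse a 'label id:value ...' line back into (label, {id: value})."""
    tokens = line.split()
    label = int(tokens[0])
    features = {}
    for token in tokens[1:]:
        fid, value = token.split(":")
        features[int(fid)] = float(value)
    return label, features

# IDs 1-4 stand for the order, follow, browse and click dimensions.
line = format_sample(1, {1: 0.25, 2: 0.5, 3: 0.75, 4: 1.0})
# line == "1 1:0.25 2:0.5 3:0.75 4:1.0"
```

Keeping the IDs in ascending order is a choice made here for readability; some tools that consume this style of sparse format also expect it.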
Fig. 3 is a schematic diagram of the data input into the logistic regression model by the feature coefficient training method of an embodiment of the present invention.
As shown in Fig. 3, the embodiments of the present invention obtain data in 4 dimensions from Hive tables. The first column is the label. Because the embodiment predicts whether a user has a preference or not, which is a binary classification problem, the label takes only the two values 0 and 1. The business feature IDs in each row are 1, 2, 3 and 4, representing the order, follow, browse and click dimensions respectively. The feature value is the normalized business feature score.
In addition, the expression of the Sigmoid function is:
$$ g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}} $$
In the formula, x is the feature value vector, here the feature values of the four dimensions order, browse, follow and click; θ is the parameter vector of the business features, here the feature coefficients of the order, browse, follow and click features respectively; θ^T x denotes the product of the two vectors, where T denotes transposition; and g(·) denotes the nonlinear mapping function.
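The mapping from feature values and coefficients to a probability can be illustrated with the short sketch below. The coefficient and feature values are made up for the example, and the function names are hypothetical; the branch on the sign of z is a common way to avoid overflow in the exponential.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_probability(theta, x):
    """Compute g(theta^T x) for one sample."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

theta = [0.8, 0.3, 0.1, 0.2]   # hypothetical coefficients: order, browse, follow, click
x = [0.5, 0.2, 0.9, 0.1]       # hypothetical normalized feature values
p = predict_probability(theta, x)
```

The result `p` always lies strictly between 0 and 1, so it can be read as a preference probability and compared against a threshold to obtain a 0/1 label.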
In addition, the expression of the log-likelihood loss function is:
$$ L(\theta) = -\left[\, y \log g(\theta^T x) + (1 - y) \log\left(1 - g(\theta^T x)\right) \right] $$
In the formula, y is the label: y = 1 means the sample is labeled 1, and y = 0 means the sample is labeled 0.
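For a single sample with label y and predicted probability p, the log-likelihood loss can be computed as in the sketch below. The clipping constant `eps` and the function names are assumptions added so that the logarithm never receives 0.

```python
import math

def log_likelihood_loss(y, p, eps=1e-12):
    """Loss for one sample: -[y*log(p) + (1-y)*log(1-p)], with p clipped."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def mean_loss(labels, probabilities):
    """Average the per-sample loss over a data set."""
    pairs = list(zip(labels, probabilities))
    return sum(log_likelihood_loss(y, p) for y, p in pairs) / len(pairs)

confident_right = log_likelihood_loss(1, 0.9)   # small loss
confident_wrong = log_likelihood_loss(1, 0.1)   # large loss
```

The loss grows sharply when the model is confident and wrong, which is what drives the coefficients toward values that separate label-1 from label-0 samples.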
The process of determining the feature coefficients by the L-BFGS algorithm is as follows:
Step1: For the first iteration, preset the initial value of the feature coefficients (numCorrection=10), the threshold for stopping iteration (convergencetol=1e-4), the number of iterations (maxNumIterations=20), the initial search direction (the negative gradient direction) and the initial step factor (regParam=0.1).
Step2: Substitute the parameter values into the objective function to obtain the feature coefficients for the next iteration, and compute the search direction and the step factor during the iteration.
Step3: Determine whether the difference between the feature coefficients of two successive iterations satisfies the convergence condition (that is, whether it is below the threshold for stopping iteration). If not, continue iterating until the difference between two successive sets of feature coefficients satisfies the convergence condition. If the number of iterations is reached and the convergence condition is still not satisfied, stop iterating.
Here the objective is to minimize the loss function.
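The iterate-until-converged structure of Step1 to Step3 can be sketched with plain gradient descent. This is deliberately not L-BFGS: the search direction here is always the negative gradient and the step size is fixed, so the example shows only the control flow (initial values, update, convergence threshold, iteration cap). All names, the step size and the toy samples are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient(theta, samples):
    """Gradient of the mean log-likelihood loss with respect to theta."""
    grad = [0.0] * len(theta)
    for x, y in samples:
        error = sigmoid(sum(t * xi for t, xi in zip(theta, x))) - y
        for k, xi in enumerate(x):
            grad[k] += error * xi / len(samples)
    return grad

def train(samples, step=0.5, tol=1e-4, max_iterations=20):
    theta = [0.0] * len(samples[0][0])            # Step1: initial coefficients
    for iteration in range(1, max_iterations + 1):
        grad = gradient(theta, samples)
        # Step2: move along the negative gradient to get the next coefficients
        new_theta = [t - step * g for t, g in zip(theta, grad)]
        change = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if change < tol:                          # Step3: convergence check
            break
    return theta, iteration

# Hypothetical samples: a constant bias feature plus one normalized feature.
samples = [([1.0, 0.9], 1), ([1.0, 0.8], 1), ([1.0, 0.1], 0), ([1.0, 0.2], 0)]
theta, iterations = train(samples, max_iterations=200)
```

In practice a library optimizer would replace the loop body; the stopping rule, which compares successive coefficient vectors against a threshold and caps the iteration count, is the part that mirrors the steps above.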
In addition, the method of the embodiment of the invention further includes a step of applying a regularization constraint to the logistic regression model. To prevent the logistic regression model from overfitting and to increase its robustness and generalization ability, the embodiment adds an L2 regularization term to the objective function, which measures the error between the model's predictions for the samples and the true labels. The optimization objective is to minimize both the training error and the test error. Because the logistic regression model has to fit the business feature data and label data in the training set, it tends to fit the training data as closely as possible; however, it is not enough for the training error to be minimal, since the test error of the model should also be small. Constraining the logistic regression model to be as simple as possible keeps the test error small, which is why regularization is needed. Regularization constraint algorithms include L1 regularization and L2 regularization: L1 regularization corresponds to a Laplace distribution and L2 regularization to a Gaussian distribution, and the user can select either one according to actual needs.
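A minimal sketch of an L2-regularized objective is given below. The scaling of the penalty as `0.5 * reg_param * sum(theta_j ** 2)`, the default `reg_param=0.1` and the function names are assumptions; the text does not state the exact form used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_log_loss(theta, rows):
    """Average log-likelihood loss over rows of (label, feature_values)."""
    total = 0.0
    for y, x in rows:
        p = sigmoid(sum(t * v for t, v in zip(theta, x)))
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(rows)

def l2_objective(theta, rows, reg_param=0.1):
    """Training loss plus an L2 penalty that grows with the size of the coefficients."""
    penalty = 0.5 * reg_param * sum(t * t for t in theta)
    return mean_log_loss(theta, rows) + penalty

rows = [(1, [1.0, 0.0]), (0, [0.0, 1.0])]
plain = mean_log_loss([2.0, -2.0], rows)
regularized = l2_objective([2.0, -2.0], rows)
```

Because large coefficients raise the penalty, the minimizer trades a slightly worse fit on the training set for a simpler model.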
In addition, the method of the embodiment of the invention further includes setting evaluation indices for the logistic regression model and selecting one group of feature coefficients as the final feature coefficients of the model by cross validation. The evaluation indices used by the invention are accuracy, recall, precision, F value and AUC (area under the ROC curve, a standard measure of model quality). The larger the AUC, the better the logistic regression model has been trained. Commodity ranking generally uses the AUC index and classification problems generally use the F value; accuracy, recall or precision may also be selected according to the business evaluation target. In a binary logistic regression problem, the predicted and actual values give rise to four cases. In the training set, a sample labeled 1 and predicted as 1 is counted as TP (True Positive); a sample labeled 1 but predicted as 0 is counted as FN (False Negative); a sample labeled 0 and predicted as 0 is counted as TN (True Negative); and a sample labeled 0 but predicted as 1 is counted as FP (False Positive). The evaluation indices are then:
$$\text{accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \quad \text{recall} = \frac{TP}{TP + FN}, \quad \text{precision} = \frac{TP}{TP + FP}, \quad F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
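The evaluation indices can be computed from the four counts as sketched below, using the conventional definitions (FP: labeled 0 but predicted 1; FN: labeled 1 but predicted 0) and taking the F value to be the harmonic mean of precision and recall, which is an assumption. The pairwise AUC computation and all sample data are likewise illustrative.

```python
def confusion_counts(labels, predictions):
    """Count TP, FP, TN and FN for 0/1 labels and 0/1 predictions."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, predictions):
        if y == 1 and p == 1:
            tp += 1
        elif y == 0 and p == 1:
            fp += 1
        elif y == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def evaluation_indices(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_value = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f_value

def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.8, 0.3, 0.6, 0.2, 0.1, 0.7]
predictions = [1 if s >= 0.5 else 0 for s in scores]
counts = confusion_counts(labels, predictions)
accuracy, recall, precision, f_value = evaluation_indices(*counts)
area = auc(labels, scores)
```

The functions above assume that both classes occur and that at least one sample is predicted positive; a production version would need to guard the divisions.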
Selecting one group of feature coefficients as the optimal feature coefficients of the logistic regression model by cross validation includes the following. The label data and the normalized business feature data are joined into one table, so that the joined table contains both label data and business feature data. The data are divided into K groups, and different subsets serve as the training set and the validation set: each subset is used once as the validation set, which yields K logistic regression models optimized by the L-BFGS algorithm, and the average of the evaluation indices of these K models on their validation sets is taken as the final performance evaluation index of the logistic regression model. For example, if the number of groups K is 10 and the evaluation indices are the five listed above, 10 validation subsets are obtained and each serves as the validation set once, so 10 rounds of cross validation are performed and 10 groups of feature coefficients are obtained. Finally, the best of the 10 groups of feature coefficients is selected as the final feature coefficients of the logistic regression model according to the averaged evaluation performance. The average is taken over the 10 values of each evaluation index, so five evaluation indices are still obtained in the end, and these five averaged indices are more representative.
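The K-group procedure can be sketched as follows. The helper names `k_fold_splits` and `cross_validate`, the interleaved way of forming the groups, and the use of a single score per group are assumptions for illustration; `train_fn` and `evaluate_fn` stand for whatever training and evaluation routines are actually used.

```python
def k_fold_splits(n_samples, k):
    """Yield (training_indices, validation_indices); each group validates exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

def cross_validate(rows, k, train_fn, evaluate_fn):
    """Train K times, average the validation score, and keep the best coefficient group."""
    results = []
    for train_idx, val_idx in k_fold_splits(len(rows), k):
        coefficients = train_fn([rows[i] for i in train_idx])
        score = evaluate_fn(coefficients, [rows[i] for i in val_idx])
        results.append((score, coefficients))
    mean_score = sum(score for score, _ in results) / k
    _, best_coefficients = max(results, key=lambda r: r[0])
    return mean_score, best_coefficients

splits = list(k_fold_splits(10, 5))
```

With several evaluation indices, the same loop would collect one value per index and per group and average each index separately.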
Fig. 4 is a schematic diagram of the feature coefficient training result of the feature coefficient training method of the embodiment of the invention.
As shown in Fig. 4, in predicting users' preference scores for third-level categories, the training result for the feature coefficients of the four business features used in the embodiment (browse, follow, order and click) is as shown in the figure; the four numbers in the figure are the feature coefficients assigned to the four business features respectively.
In addition, the method of the embodiment of the invention further includes deploying the logistic regression model, whose feature coefficients have been determined by the iterative optimization algorithm, as an offline task so that the feature coefficients are updated automatically. The finished Spark program is placed on the big data platform and can be started as a scheduled task every day according to business needs. Because the task runs once a day, the feature coefficients are refreshed daily, which automates the training of the logistic regression model's feature coefficients and makes it flexible and easy to operate.
It can be seen from the feature coefficient training method of the embodiment of the invention that combining a logistic regression model with an iterative optimization algorithm on a large data set makes it possible to use more valid data and to compute a feature coefficient for each business feature quickly and accurately. Calling the L-BFGS algorithm on the Spark platform lets the program run on large data sets, in line with the current trend toward big data machine learning; the iterative optimization algorithm is more robust and reliable, and the feature coefficients converge faster. Deploying the logistic regression model as an offline task automates the updating of the feature coefficients, which both relieves data analysts of part of their work and saves resources, and also allows users' personalized preferences to be discovered promptly and accurately: the corresponding feature coefficients are updated automatically according to users' latest shopping behavior, which improves user experience and is significant for building a more intelligent personalized recommendation system on an e-commerce platform.
Fig. 5 is a schematic diagram of the composition of the feature coefficient training device of the embodiment of the invention.
As shown in Fig. 5, the training device 50 of the embodiment of the invention mainly includes:
A data preprocessing module 501, configured to obtain label data and business feature data and to normalize the business feature data. The label data and business feature data are obtained from the Hive table; to bring the feature data onto the same scale, the obtained business feature data must be normalized. Before normalization, outlier data in the Hive table can first be rejected, for example users with more than 100 million orders. The label is the training target of the logistic regression model, and the way labels are constructed differs from one business to another. For example, when predicting users' preferences for third-level categories, the labels can be 1 and 0, where 1 means the user prefers the third-level category and 0 means the user does not; other ways of constructing labels are also possible.
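Outlier rejection followed by normalization might look like the sketch below. Min-max scaling is used only as an example, since the text does not say which normalization is applied; the function name and sample values are illustrative, and the threshold mirrors the example of users with more than 100 million orders.

```python
def normalize_feature(values, max_valid):
    """Drop values above max_valid, then rescale the rest to the range 0 to 1 (min-max)."""
    kept = [v for v in values if v <= max_valid]
    if not kept:
        return []
    lowest, highest = min(kept), max(kept)
    if highest == lowest:
        return [0.0 for _ in kept]
    return [(v - lowest) / (highest - lowest) for v in kept]

# Illustrative order counts; the fourth value exceeds the threshold and is rejected.
order_counts = [3, 10, 0, 250_000_000, 7]
normalized = normalize_feature(order_counts, max_valid=100_000_000)
```

Rejecting the outlier first matters here: had it been kept, min-max scaling would have squeezed every ordinary value close to 0.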
A logistic regression model module 502, configured to determine the logistic regression model and to input the label data and the normalized business feature data into the logistic regression model. Logistic regression is a generalized linear regression: a Sigmoid function is added on top of linear regression to perform a nonlinear mapping, and this function maps continuous values into the range between 0 and 1. The dependent variable of logistic regression can be binary or multi-class; binary logistic regression is the one commonly used in practice. Determining the logistic regression model means choosing logistic regression as the machine learning classification model. The label data and the normalized business feature data must be input into the logistic regression model in the data format supported by the iterative optimization algorithm.
A feature coefficient determining module 503, configured to determine the feature coefficients of the logistic regression model by an iterative optimization algorithm. The iterative optimization algorithm can be the L-BFGS algorithm or the SGD algorithm (stochastic gradient descent). L-BFGS is a commonly used method for solving unconstrained nonlinear optimization problems and has a well-developed local convergence theory. BFGS is named after the initials of its inventors Broyden, Fletcher, Goldfarb and Shanno, and L stands for limited memory: L-BFGS is an approximation of the BFGS algorithm under limited memory, which gives it an advantage on large data sets. L-BFGS also converges faster than stochastic gradient descent and saves resources, so it is preferred during feature coefficient training; stochastic gradient descent can also be used when the amount of data is small.
In addition, the feature coefficient training device of the embodiment of the invention can further include a regularization constraint module, a feature coefficient selecting module, a noise data removal module and an automatic coefficient update module. The regularization constraint module is configured to apply a regularization constraint to the logistic regression model, using either L1 regularization or L2 regularization. The feature coefficient selecting module is configured to set the evaluation indices of the logistic regression model and to select one group of feature coefficients as the final feature coefficients of the model by cross validation. The noise data removal module is configured to remove noise data from the label data and the normalized business feature data by undersampling, oversampling or threshold moving, so that the training result for the feature coefficients is more accurate. The automatic coefficient update module is configured to deploy the logistic regression model, whose feature coefficients have been determined by the iterative optimization algorithm, as an offline task so that the feature coefficients are updated automatically.
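Of the three techniques named for the noise data removal module, undersampling is the simplest to sketch. The version below randomly discards rows of the larger class until both labels are equally represented; the function name, the fixed seed and the toy rows are assumptions, and the text does not describe how the module is actually implemented.

```python
import random

def undersample(rows, seed=0):
    """Randomly drop rows of the larger class so that labels 0 and 1 are balanced."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[0] == 1]
    negatives = [r for r in rows if r[0] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced

# Two positive rows and eight negative rows, each as (label, feature_values).
rows = [(1, [0.9]), (1, [0.8])] + [(0, [i / 10]) for i in range(8)]
balanced = undersample(rows)
```

Oversampling would instead repeat rows of the smaller class, and threshold moving would leave the data unchanged and shift the probability cut-off used to turn a score into a 0/1 prediction.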
As can be seen from the above, the invention combines a logistic regression model with an iterative optimization algorithm on large data sets, so that more valid data can be used and a feature coefficient can be computed quickly and accurately for each business feature. Calling the L-BFGS algorithm on the Spark platform lets the program run on large data sets, in line with the current trend toward big data machine learning; the iterative optimization algorithm is more robust and reliable, and the feature coefficients converge faster. Deploying the logistic regression model as an offline task automates the updating of the feature coefficients, which both relieves data analysts of part of their work and saves resources, and also allows users' personalized preferences to be discovered promptly and accurately: the corresponding feature coefficients are updated automatically according to users' latest shopping behavior, which improves user experience and is significant for building a more intelligent personalized recommendation system on an e-commerce platform.
According to embodiments of the invention, the invention further provides an electronic device and a computer-readable medium.
The electronic device of the invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the feature coefficient training method of the embodiment of the invention.
The computer-readable medium of the invention stores a computer program which, when executed by a processor, implements the feature coefficient training method of the embodiment of the invention.
Referring now to Fig. 6, it shows a schematic structural diagram of a computer system 600 suitable for implementing the electronic device of the embodiment of the invention. The electronic device shown in Fig. 6 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the invention.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores the various programs and data required for the operation of the computer system 600. The CPU 601, the ROM 602 and the RAM 603 are connected to one another by a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse and the like; an output section 607 including a cathode ray tube (CRT) or liquid crystal display (LCD), a loudspeaker and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to the embodiments disclosed by the invention, the process described above with reference to the main step diagram can be implemented as a computer software program. For example, an embodiment of the disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the main step diagram. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 609 and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the system of the invention are performed.
It should be noted that the computer-readable medium of the invention can be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of a computer-readable storage medium include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the invention, a computer-readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in connection with an instruction execution system, apparatus or device. In the invention, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. The program code contained on a computer-readable medium can be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the invention. In this regard, each block in a flowchart or block diagram can represent a module, a program segment or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations the functions marked in the blocks can occur in an order different from that marked in the drawings. For example, two blocks shown in succession can in fact be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and any combination of blocks in a block diagram or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the invention can be implemented in software or in hardware. The described modules can also be provided in a processor; for example, this can be described as: a processor including a data preprocessing module, a logistic regression model module and a feature coefficient determining module. The names of these modules do not in some cases limit the modules themselves; for example, the data preprocessing module can also be described as "a module that obtains label data and business feature data and normalizes the business feature data".
In another aspect, the invention further provides a computer-readable medium, which can be included in the device described in the above embodiments or can exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: obtain label data and business feature data, and normalize the business feature data; determine a logistic regression model, and input the label data and the normalized business feature data into the logistic regression model; and determine the feature coefficients of the logistic regression model by an iterative optimization algorithm.
According to the technical solution of the invention, a logistic regression model is combined with an iterative optimization algorithm on large data sets, so that more valid data can be used and a feature coefficient can be computed quickly and accurately for each business feature. Calling the L-BFGS algorithm on the Spark platform lets the program run on large data sets, in line with the current trend toward big data machine learning; the iterative optimization algorithm is more robust and reliable, and the feature coefficients converge faster. Deploying the logistic regression model as an offline task automates the updating of the feature coefficients, which both relieves data analysts of part of their work and saves resources, and also allows users' personalized preferences to be discovered promptly and accurately: the corresponding feature coefficients are updated automatically according to users' latest shopping behavior, which improves user experience and is significant for building a more intelligent personalized recommendation system on an e-commerce platform.
The above product can perform the method provided by the embodiments of the invention and has the functional modules and beneficial effects corresponding to the method. For technical details not described in this embodiment, refer to the method provided by the embodiments of the invention.
The above embodiments do not limit the scope of protection of the invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions can occur depending on design requirements and other factors. Any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (20)

1. A feature coefficient training method based on logistic regression, characterized by comprising:
obtaining label data and business feature data, and normalizing the business feature data;
determining a logistic regression model, and inputting the label data and the normalized business feature data into the logistic regression model;
determining the feature coefficients of the logistic regression model by an iterative optimization algorithm.
2. The method according to claim 1, characterized in that the iterative optimization algorithm is the L-BFGS algorithm or stochastic gradient descent.
3. The method according to claim 1, characterized in that normalizing the business feature data comprises: rejecting outlier data from the business feature data, and normalizing the business feature data remaining after the outlier data have been rejected.
4. The method according to claim 1, characterized in that the label data and the normalized business feature data are input into the logistic regression model in a preset format, the format comprising: a label, and at least one group consisting of a business feature ID and the feature value corresponding to that business feature ID, the feature value being the normalized business feature data.
5. The method according to any one of claims 1-4, characterized in that the method further comprises: applying a regularization constraint to the logistic regression model.
6. The method according to any one of claims 1-4, characterized in that the method further comprises: setting evaluation indices of the logistic regression model, and selecting one group of feature coefficients as the final feature coefficients of the logistic regression model by cross validation.
7. The method according to claim 6, characterized in that selecting one group of feature coefficients as the optimal feature coefficients of the logistic regression model by cross validation comprises: grouping the label data and the normalized business feature data to obtain multiple groups of validation subset data; inputting each group of validation subset data into the logistic regression model to obtain multiple groups of feature coefficients and predicted score probabilities; computing the evaluation indices of each group of validation subset data from the predicted score probabilities and the label data; averaging the evaluation indices over all groups to obtain the evaluation performance of the logistic regression model; and selecting the final feature coefficients according to the evaluation performance; the evaluation indices comprising accuracy, recall, precision, F value and AUC.
8. The method according to any one of claims 1-4, characterized in that, after the step of obtaining label data and business feature data and normalizing the business feature data, the method further comprises: removing noise data from the label data and the normalized business feature data by undersampling, oversampling or threshold moving.
9. The method according to any one of claims 1-4, characterized in that the method further comprises: deploying the logistic regression model, whose feature coefficients have been determined by the iterative optimization algorithm, as an offline task to update the feature coefficients automatically.
10. A feature coefficient training device based on logistic regression, characterized by comprising:
a data preprocessing module, configured to obtain label data and business feature data and to normalize the business feature data;
a logistic regression model module, configured to determine a logistic regression model and to input the label data and the normalized business feature data into the logistic regression model;
a feature coefficient determining module, configured to determine the feature coefficients of the logistic regression model by an iterative optimization algorithm.
11. The device according to claim 10, characterized in that the iterative optimization algorithm is the L-BFGS algorithm or stochastic gradient descent.
12. The device according to claim 10, characterized in that the data preprocessing module is further configured to: reject outlier data from the business feature data, and normalize the business feature data remaining after the outlier data have been rejected.
13. The device according to claim 10, characterized in that the logistic regression model module is further configured to: input the label data and the normalized business feature data into the logistic regression model in a preset format, the format comprising: a label, and at least one group consisting of a business feature ID and the feature value corresponding to that business feature ID, the feature value being the normalized business feature data.
14. The device according to any one of claims 10-13, characterized in that the device further comprises: a regularization constraint module, configured to apply a regularization constraint to the logistic regression model.
15. The device according to any one of claims 10-13, characterized in that the device further comprises: a feature coefficient selecting module, configured to set evaluation indices of the logistic regression model and to select one group of feature coefficients as the final feature coefficients of the logistic regression model by cross validation.
16. The device according to claim 15, characterized in that the feature coefficient selecting module is further configured to: group the label data and the normalized business feature data to obtain multiple groups of validation subset data; input each group of validation subset data into the logistic regression model to obtain multiple groups of feature coefficients and predicted score probabilities; compute the evaluation indices of each group of validation subset data from the predicted score probabilities and the label data; average the evaluation indices over all groups to obtain the evaluation performance of the logistic regression model; and select the final feature coefficients according to the evaluation performance; the evaluation indices comprising accuracy, recall, precision, F value and AUC.
17. The device according to any one of claims 10-13, characterized in that the device further comprises: a noise data removal module, configured to remove noise data from the label data and the normalized business feature data by undersampling, oversampling or threshold moving.
18. The device according to any one of claims 10-13, characterized in that the device further comprises: an automatic coefficient update module, configured to deploy the logistic regression model, whose feature coefficients have been determined by the iterative optimization algorithm, as an offline task to update the feature coefficients automatically.
19. An electronic device, characterized by comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-9.
20. A computer-readable medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-9.
CN201710398250.9A 2017-05-31 2017-05-31 Characteristic coefficient training method and device that logic-based is returned Pending CN107220217A (en)

Publications (1)

Publication Number Publication Date
CN107220217A true CN107220217A (en) 2017-09-29

Family

ID=59946924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710398250.9A Pending CN107220217A (en) 2017-05-31 2017-05-31 Characteristic coefficient training method and device that logic-based is returned

Country Status (1)

Country Link
CN (1) CN107220217A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853470A (en) * 2010-05-28 2010-10-06 浙江大学 A Collaborative Filtering Method Based on Social Tags
WO2012094516A1 (en) * 2011-01-06 2012-07-12 Ebay Inc. Interestingness recommendations in a computing advice facility
WO2012103290A1 (en) * 2011-01-26 2012-08-02 Google Inc. Dynamic predictive modeling platform
US20140379519A1 (en) * 2013-06-25 2014-12-25 Texas Instruments Incorporated E-commerce cross-sampling product recommender based on statistics
CN105469263A (en) * 2014-09-24 2016-04-06 阿里巴巴集团控股有限公司 Commodity recommendation method and device
CN105045819A (en) * 2015-06-26 2015-11-11 深圳市腾讯计算机系统有限公司 Model training method and device for training data
CN106022865A (en) * 2016-05-10 2016-10-12 江苏大学 Goods recommendation method based on scores and user behaviors

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
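Anonymous: "Model Selection in Machine Learning (Cross-Validation)", HTTPS://BLOG.CSDN.NET/U014365862/ARTICLE/DETAILS/47903819 *
Anonymous: "Logistic Regression: Theory", HTTPS://BLOG.CSDN.NET/PAKKO/ARTICLE/DETAILS/37878837?DEPTH_1-UTM_SOURCE=DISTRIBUTE.PC_RELEVANT.NONE-TASK-BLOG-BLOGCOMMENDFROMBAIDU-3&UT… *
Zhou Yue: "Data Hacker (数据嗨客), Issue 6: Handling Imbalanced Data", HTTPS://MP.WEIXIN.QQ.COM/S?__BIZ=MZAWMZIXMJIYMG==&MID=2651005812&IDX=1&SN=B9819F04CB2EE9AF21F4011D34013824&SCENE=0 *
Tu Xinhui: "Concept-Based Information Retrieval Methods", 30 April 2015, Central China Normal University Press *
Dong Xuehui: "Research on the Logistic Regression Algorithm and Its GPU Parallel Implementation", China Master's Theses Full-text Database, Information Science and Technology series *
Chen Feng et al.: "Medical Multivariate Statistical Analysis Methods", 31 December 2000, China Statistics Press *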

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019163A (en) * 2017-12-05 2019-07-16 北京京东尚科信息技术有限公司 Method, system, equipment and the storage medium of prediction, the recommendation of characteristics of objects
CN108108912A (en) * 2018-01-10 2018-06-01 百度在线网络技术(北京)有限公司 Method of discrimination, device, server and the storage medium of interactive low quality user
CN108932530A (en) * 2018-06-29 2018-12-04 新华三大数据技术有限公司 The construction method and device of label system
CN111405583A (en) * 2019-01-02 2020-07-10 中国移动通信有限公司研究院 Tidal effect avoiding method and device and computer readable storage medium
CN110709863A (en) * 2019-01-11 2020-01-17 阿里巴巴集团控股有限公司 Logistic regression modeling scheme using secret sharing
CN110709863B (en) * 2019-01-11 2024-02-06 创新先进技术有限公司 Logistic regression modeling method, storage medium, and system using secret sharing
CN111753386B (en) * 2019-03-11 2024-03-26 北京嘀嘀无限科技发展有限公司 Data processing method and device
CN111753386A (en) * 2019-03-11 2020-10-09 北京嘀嘀无限科技发展有限公司 Data processing method and device
CN110428270A (en) * 2019-08-07 2019-11-08 佰聆数据股份有限公司 The potential preference client recognition methods of the channel of logic-based regression algorithm
CN112508044A (en) * 2019-09-16 2021-03-16 华为技术有限公司 Artificial intelligence AI model evaluation method, system and equipment
CN110706822A (en) * 2019-09-20 2020-01-17 上海派拉软件股份有限公司 Health management method based on logistic regression model and decision tree model
CN110706822B (en) * 2019-09-20 2024-02-02 上海派拉软件股份有限公司 Health management method based on logistic regression model and decision tree model
CN110888857A (en) * 2019-10-14 2020-03-17 平安科技(深圳)有限公司 Data label generation method, device, terminal and medium based on neural network
WO2021073152A1 (en) * 2019-10-14 2021-04-22 平安科技(深圳)有限公司 Data label generation method and apparatus based on neural network, and terminal and medium
CN110888857B (en) * 2019-10-14 2023-11-07 平安科技(深圳)有限公司 Data tag generation method, device, terminal and medium based on neural network
CN111008321A (en) * 2019-11-18 2020-04-14 广东技术师范大学 Recommendation method and device based on logistic regression, computing equipment and readable storage medium
CN111008321B (en) * 2019-11-18 2023-08-29 广东技术师范大学 Recommendation method, device, computing device, and readable storage medium based on logistic regression
CN111191789A (en) * 2020-01-20 2020-05-22 上海依图网络科技有限公司 Model training method, system, chip, electronic device and medium
CN111191789B (en) * 2020-01-20 2023-11-28 上海依图网络科技有限公司 Model optimization deployment system, chip, electronic equipment and medium
CN111325280A (en) * 2020-02-27 2020-06-23 苏宁云计算有限公司 Label generation method and system
CN113535444B (en) * 2020-04-14 2023-11-03 中国移动通信集团浙江有限公司 Abnormal motion detection method, device, computing equipment and computer storage medium
CN113535444A (en) * 2020-04-14 2021-10-22 中国移动通信集团浙江有限公司 Transaction detection method, transaction detection device, computing equipment and computer storage medium
CN111914995A (en) * 2020-06-18 2020-11-10 北京百度网讯科技有限公司 Regularized linear regression generation method and device, electronic equipment and storage medium
CN111966473B (en) * 2020-07-24 2024-02-06 支付宝(杭州)信息技术有限公司 Operation method and device of linear regression task and electronic equipment
CN111966473A (en) * 2020-07-24 2020-11-20 支付宝(杭州)信息技术有限公司 Operation method and device of linear regression task and electronic equipment
CN113781134A (en) * 2020-07-28 2021-12-10 北京沃东天骏信息技术有限公司 Item recommendation method and device and computer-readable storage medium
CN112785256A (en) * 2021-01-14 2021-05-11 田进伟 Real-time assessment method and system for clinical endpoint events in clinical trials
CN112835910B (en) * 2021-03-05 2023-10-17 天九共享网络科技集团有限公司 Method and device for processing enterprise information and policy information
CN112835910A (en) * 2021-03-05 2021-05-25 天九共享网络科技集团有限公司 Enterprise information and policy information processing method and device
CN113380417A (en) * 2021-06-17 2021-09-10 哈尔滨理工大学 LR-N based cardiovascular disease prediction method
CN113568739A (en) * 2021-07-12 2021-10-29 北京淇瑀信息科技有限公司 User resource limit distribution method and device and electronic equipment
CN113988374A (en) * 2021-09-27 2022-01-28 上海东普信息科技有限公司 Method, device, equipment and storage medium for identifying high-quality user through targeted recommendation
CN114783007A (en) * 2022-06-22 2022-07-22 成都新希望金融信息有限公司 Equipment fingerprint identification method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN107220217A (en) Characteristic coefficient training method and device that logic-based is returned
CN110995459B (en) Abnormal object identification method, device, medium and electronic equipment
CN109919684A (en) For generating method, electronic equipment and the computer readable storage medium of information prediction model
CN111080338B (en) User data processing method and device, electronic equipment and storage medium
CN106095942B (en) Strong variable extracting method and device
CN107679946A (en) Fund Products Show method, apparatus, terminal device and storage medium
US12079748B2 (en) Co-operative resource pooling system
CN113344700B (en) Multi-objective optimization-based wind control model construction method and device and electronic equipment
CN114240555A (en) Method and device for training click-through rate prediction model and predicting click-through rate
CN113674087A (en) Enterprise credit rating method, apparatus, electronic device and medium
CN115423040A (en) User portrait recognition method and AI system for interactive marketing platform
US11810022B2 (en) Contact center call volume prediction
CN112860672A (en) Method and device for determining label weight
CN109800138B (en) CPU testing method, electronic device and storage medium
CN114707733A (en) Risk indicator prediction method and device, electronic equipment and storage medium
CN113496236B (en) User tag information determining method, device, equipment and storage medium
CN113743906A (en) Method and device for determining service processing strategy
US20240211973A1 (en) Technology stack modeler engine for a platform signal modeler
CN110796381B (en) Modeling method and device for wind control model, terminal equipment and medium
CN117874594A (en) Bank customer classification method, system and electronic equipment
CN116681467A (en) Sales predicting method based on improved WOA algorithm and related equipment thereof
KR102284440B1 (en) Method to broker deep learning model transactions perfomed by deep learning model transaction brokerage servers
CN110956528B (en) Recommendation method and system for e-commerce platform
CN110414709A (en) Debt risk intelligent Forecasting, device and computer readable storage medium
CN118656685B (en) A derivative feature extraction method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20170929)