CN110555717A - method for mining potential purchased goods and categories of users based on user behavior characteristics - Google Patents
- Publication number
- CN110555717A; CN201910687675.0A
- Authority
- CN
- China
- Prior art keywords
- data
- user
- train
- categories
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
Abstract
The invention belongs to the field of user behavior analysis and data mining and relates to a method for mining the commodities and categories a user is likely to purchase, based on user behavior characteristics. The method encodes the preprocessed data and applies feature engineering to obtain user behavior feature data; labels and analyzes the sample data as positive and negative samples and generates a plurality of sample subsets by dynamic undersampling, to serve as the positive and negative sample data for training; trains decision tree models on the positive and negative sample data to obtain a plurality of single prediction models, and fuses the single prediction models by stacking to generate a plurality of fusion prediction models; and predicts the user's potential purchased commodities and categories with the fusion prediction models, then processes and analyzes their prediction results to obtain weighted potential purchased commodities and categories. The invention can help merchants discover users with high purchase intent and improve the consumption conversion rate of marketed users.
Description
Technical Field
The invention belongs to the field of user behavior analysis and data mining, and relates to a method for mining potential purchased commodities and categories of users based on user behavior characteristics.
Background
E-commerce is developing rapidly. Active marketing lets a merchant stand out in a market flooded with homogeneous commodities, attract users, and raise the consumption conversion rate of marketed users. Active marketing has traditionally relied on advertising and media promotion, but these approaches target users indiscriminately, and conversions are obtained largely by chance. A more effective way to improve user conversion is therefore needed. The key is to identify target users accurately and to push to them information on the commodities they are most likely to purchase; identifying target users and target commodities is a problem of mining and prediction.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method for mining potential target commodity classes purchased by a user based on user behavior characteristics.
The invention is realized by adopting the following technical scheme:
The method for mining the potential purchased goods and categories of the user based on the user behavior characteristics comprises the following steps:
Data cleaning and data preprocessing are carried out, and preprocessed data are obtained;
Carrying out data coding on the preprocessed data and extracting basic features, statistical features, time interval features and calculated features; evaluating the importance of the extracted features by a filtering method to identify important features and redundant features; and, because certain user behaviors influence the result more strongly the closer they occur to the prediction time, introducing a time decay theory in the feature importance evaluation and weighting the data features accordingly, to obtain user behavior feature data;
Carrying out positive and negative sample analysis and classification on sample data, and generating a plurality of sample subsets by carrying out dynamic undersampling on the positive and negative samples to be used as positive and negative sample data of training;
Training the decision tree models on the positive and negative sample data to obtain a plurality of single prediction models, and fusing the single prediction models by stacking to generate a plurality of fusion prediction models;
Predicting by using a fusion prediction model, comparing a prediction result with an expected value, and feeding back to the decision tree model for parameter adjustment and model retraining until optimal model parameters are obtained;
And predicting the potential purchased commodities and categories of the user based on the plurality of fusion prediction models, and processing and analyzing the prediction results of the fusion prediction models to obtain the potential purchased commodities and categories of the user with weights.
Further, the filtering method uses a correlation coefficient method to score each feature against an index, the score representing the importance of the feature, and then ranks the features by score.
Preferably, chi-square filtering is used: the chi-square statistic between each non-negative feature and the label is calculated, the features are ranked from high to low by chi-square statistic, and the K highest-scoring features are selected, thereby removing the features most likely to be independent of the label and irrelevant to the classification.
Preferably, the sample data include behavior feature data of a given user toward a given commodity over a period of time, feature data of the commodity, and feature data of the user.
Further, the dynamic undersampling extracts, by a given method, a portion of the samples from the class with an excessive number of samples, so as to correct the imbalance in proportion between that class and the other samples.
Preferably, a certain number of subsamples are randomly extracted from the negative samples and combined with the positive samples to form a new sample set.
Preferably, the decision tree models employed are the random forest (RF) and gradient boosting decision tree (GBDT) algorithms.
Preferably, the process of generating the fusion prediction model from the plurality of single prediction models by stacking includes:
After the positive and negative samples are processed, n training sets train_x, …, train_y and a test set test are generated;
First, an untrained decision tree model is selected;
Second, n-1 parts of the training set are taken as small training sets s_train_x, …, s_train_y and the remaining part as a small test set s_test, while the test set test is left unchanged;
Third, the decision tree model is trained on s_train_x, …, s_train_y, and the trained model predicts s_test to obtain the corresponding s_pred and predicts test to obtain y_pred;
Fourth, another part of the training set is selected as the small test set s_test_x, and the other n-1 parts are used as the training set to train the decision tree model;
Fifth, the second to fourth steps are repeated n times to obtain n s_pred results and n y_pred results;
The n s_pred results are used as train_X and the original train_Y is kept as train_Y to train a fusion prediction model, giving a model G; the average of the n y_pred results is used as the new test_X, which is fed into model G to obtain the prediction result;
In the second layer, a further layer of stacking is performed by combining the first layer's outputs train_X, train_Y and test_X with other feature sets, and the above steps are repeated to generate the final fusion prediction model.
Preferably, the desired value is set using F1-score.
Preferably, the input of the fusion prediction model includes the user's behavior features toward a given commodity within a fixed period before the prediction date, together with the commodity and its category, and the predicted output is whether the user will purchase the commodity.
Compared with the prior art, the invention has the following beneficial effects:
(1) The importance of the extracted data features is evaluated by a filtering method (each feature is scored against an index using a correlation coefficient method, the score represents the importance of the feature, and the features are then ranked by score), identifying important features and redundant features. Because certain user behaviors influence the result more strongly the closer they occur to the prediction time, a time decay theory is introduced in the evaluation (for some behaviors, such as clicks, the weight is lower the farther the behavior is from the prediction day) and the data features are weighted accordingly, making the feature importance evaluation more accurate.
(2) A certain number of subsamples are randomly extracted from the negative samples and combined with the positive samples to form a new sample set for training; repeating this generation process yields a plurality of training subsets, each subset is trained independently, and the resulting models are then fused to synthesize the final prediction model.
(3) Training and prediction use several classifier models (decision tree models); the single decision tree models are fused by stacking to generate a plurality of fusion prediction models, and the results of the fusion prediction models are combined by raising or lowering weights, which makes the weights of the prediction results more informative and supports better commodity recommendation for users.
(4) The large volume of imbalanced user behavior data is dynamically divided into subsets, which prevents classifier failure caused by sample imbalance; multi-model training on the divided sample subsets improves data utilization and the training effect.
(5) The prediction model can improve the prediction precision of future purchasing behaviors of the user, help the merchant to discover the user with high potential purchasing intention, guide the merchant to push more accurate commodity information and preferential information to the user, and improve the user consumption conversion rate of marketing.
Drawings
FIG. 1 is a flowchart of a method for mining the categories of target commodities potentially purchased by a user based on user behavior characteristics according to an embodiment of the present invention;
FIG. 2 is a block diagram of user behavior characterization in accordance with an embodiment of the present invention;
FIG. 3 is a diagram illustrating a model training and parameter tuning process according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to specific embodiments, but the embodiments of the present invention are not limited thereto.
E-commerce is now deeply embedded in everyday life, and an increasing share of user behavior takes place online. Clicks, views, purchases, comments and other online behaviors may appear unrelated, yet they conceal very important information. The invention extracts user behavior features related to purchasing through feature engineering, interprets user behavior by machine learning, and uses historical behavior data to predict users' future purchase intent and target commodities. Based on the prediction results, merchants can market commodities and push offers to target users more precisely, improving the consumption conversion rate of promoted users and obtaining the greatest effect from the smallest investment.
The method extracts user behavior features according to business characteristics and feature engineering methods, constructs a user behavior model in combination with the features of the commodity data, and finally divides the feature data dynamically, according to how balanced the user behavior feature data are, into a plurality of data sets for model training and prediction. Training and prediction use several machine learning classifier models. After each model has produced its predictions, the prediction results are fused: results that appear in several prediction models are given a higher weight, and results that appear in only a single prediction model are given a lower weight.
The method for mining the potential purchase target commodity class of the user based on the user behavior characteristics, as shown in fig. 1, includes:
S1. Clean and preprocess the data to obtain preprocessed data.
Abnormal data in the collected user data and commodity data are cleaned (including handling of null values and outliers and cleaning of abnormal user behavior data), and the data are standardized according to the specified data requirements.
The user data comprise basic user attribute data, user behavior data and user comment data. In this embodiment, the basic user attribute data include gender, age and registration time; the user behavior data include operation time, operation type, object, product and product type; and the user comment data include the number of comments, the number of positive comments, the number of negative comments and the time of the last comment.
The commodity data include commodity code, name, category code, category and commodity attributes.
Outliers in the user data and commodity data are handled using the variance.
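The text does not say how the variance is applied. One common reading, sketched below as an assumption, is to clip values that lie more than k standard deviations from the mean. The function name `clip_outliers`, the threshold `k` and the sample data are illustrative and do not come from the patent.

```python
from statistics import mean, pstdev


def clip_outliers(values, k=3.0):
    """Clip values lying more than k standard deviations from the mean.

    Assumption: the patent only says outliers are handled "through the
    variance"; clipping at mean +/- k*sigma is one possible interpretation.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return list(values)  # all values identical, nothing to clip
    lo, hi = mu - k * sigma, mu + k * sigma
    return [min(max(v, lo), hi) for v in values]


# Made-up daily click counts for one user; the last value is suspicious.
daily_clicks = [3, 5, 4, 6, 5, 4, 500]
cleaned = clip_outliers(daily_clicks, k=2.0)
```

With very few observations a single extreme value inflates the standard deviation, so a smaller k (2.0 here) is needed for the clip to take effect; on realistic data volumes a larger k is typical.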
As shown in FIG. 2, data preprocessing includes data format standardization, statistics of null and illegal values, and consistency checks across user, product and behavior data.
S2. Encode the preprocessed data; extract basic features, statistical features, time interval features and calculated features; evaluate the importance of the extracted features by a filtering method to identify important features and redundant features; and, because certain user behaviors influence the result more strongly the closer they occur to the prediction time, introduce a time decay theory and weight the data features to obtain user behavior feature data.
As shown in FIG. 2, step S2 can be understood as feature engineering, which includes: encoding the preprocessed data so that they can be used for decision tree training; extracting basic user features and basic commodity features; aggregating basic user behavior data along different dimensions, mainly the time, commodity category, behavior type, user attribute and commodity attribute dimensions, to generate statistical features, time interval features and calculated features; associating and fusing features through the relationships among users, commodities and behaviors; and classifying the behaviors and weighting behaviors such as clicking and collecting according to the time decay theory, so that the farther a behavior is from the prediction time, the lower its feature weight.
The filtering method of the invention uses a correlation coefficient method to score each feature against an index, the score representing the importance of the feature, and then ranks the features by score. In this embodiment, chi-square filtering is used: the chi-square statistic between each non-negative feature and the label is calculated, the features are ranked from high to low by chi-square statistic, and the K highest-scoring features are selected, thereby removing the features most likely to be independent of the label and irrelevant to the classification.
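The chi-square filter described above can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: it treats each non-negative feature as a count, compares the per-class totals with the totals expected if the feature were independent of the label, and keeps the K highest scores. The function names and the toy data are assumptions.

```python
def chi2_scores(X, y):
    """Chi-square statistic between each non-negative feature column and the label."""
    n_samples = len(X)
    classes = sorted(set(y))
    class_prob = {c: sum(1 for label in y if label == c) / n_samples for c in classes}
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = class_prob[c] * total  # total expected under independence
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features and all scores."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Toy data (made up): columns are clicks, cart additions, and a constant.
X = [[1, 3, 1], [2, 4, 1], [1, 0, 1], [2, 0, 1]]
y = [1, 1, 0, 0]
top, scores = select_top_k(X, y, k=1)
```

In this toy data only the cart-addition column differs between buyers and non-buyers, so it is the one feature retained.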
The invention introduces a time decay theory and weights the data features as follows: for certain behaviors, such as clicks, the farther the behavior is from the prediction day, the lower its weight. This makes the feature importance evaluation more accurate.
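The source does not give the decay function. As a sketch under that caveat, the example below assumes an exponential decay with a half-life; `decay_weight`, `decayed_count` and `half_life_days` are hypothetical names chosen for illustration.

```python
def decay_weight(days_before_prediction, half_life_days=3.0):
    """Weight of a behavior that happened the given number of days before the prediction day.

    Assumption: exponential decay. The patent only requires that the weight
    falls as the behavior gets farther from the prediction time.
    """
    return 0.5 ** (days_before_prediction / half_life_days)


def decayed_count(event_days, half_life_days=3.0):
    """Time-decay-weighted count of events, given how many days ago each one occurred."""
    return sum(decay_weight(d, half_life_days) for d in event_days)


# Three clicks: today, 3 days ago and 6 days ago.
weighted_clicks = decayed_count([0, 3, 6])
```

A click on the prediction day counts fully, while one that is two half-lives old contributes a quarter as much, so recent activity dominates the feature.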
In this embodiment, the extraction process of the basic feature, the statistical feature, the time interval feature, and the calculation feature includes:
First, the basic features, such as type, are one-hot encoded. One-hot encoding, also called one-bit-effective encoding, uses an N-bit register to represent N states; each state has its own register bit, and exactly one bit is active at any time. One-hot encoding is simple to design and easy to implement, and it can encode discrete (non-continuous) features without a decoding operation.
Second, the various user behaviors are counted over different periods, for example the number of clicks, the number of collections, and repeated-behavior statistics within 7 days.
Third, interval features are extracted for certain behaviors, for example the interval between the user's most recent view of a commodity and the view before it.
Fourth, calculated features are derived, such as the purchase conversion rate after the user adds items to the shopping cart within one week; the purchase conversion rates relative to the user's browse count, cart-add count and follow count; and time-decay-weighted user behavior statistics.
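The four kinds of feature above can be illustrated on a small event log. The sketch assumes each event is a pair `(days_ago, action)` for one user and one commodity; the action names, function names and sample events are illustrative and not taken from the patent.

```python
from collections import Counter

ACTIONS = ["click", "collect", "cart", "purchase"]  # assumed behavior types


def one_hot(value, categories):
    """Basic feature: one position per category, exactly one of them set to 1."""
    return [1 if value == c else 0 for c in categories]


def window_counts(events, window_days=7):
    """Statistical feature: count of each action within the last window_days days."""
    counts = Counter(action for days_ago, action in events if days_ago < window_days)
    return {a: counts.get(a, 0) for a in ACTIONS}


def last_interval(events, action="click"):
    """Interval feature: days between the two most recent events of one action."""
    days = sorted(d for d, a in events if a == action)
    return days[1] - days[0] if len(days) >= 2 else None


def conversion_rate(counts, numerator="purchase", denominator="cart"):
    """Calculated feature: for example, purchases per cart addition."""
    return counts[numerator] / counts[denominator] if counts[denominator] else 0.0


# Made-up events for one user and one commodity.
events = [(0, "click"), (2, "click"), (2, "cart"),
          (1, "purchase"), (5, "cart"), (9, "click")]
counts = window_counts(events)
features = one_hot("cart", ACTIONS) + [counts["click"], last_interval(events),
                                       conversion_rate(counts)]
```

The click from nine days ago falls outside the 7-day window, so it affects neither the counts nor the conversion rate.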
S3. Label and analyze the sample data as positive and negative samples, and generate a plurality of sample subsets by dynamic undersampling of the positive and negative samples, to serve as the positive and negative sample data for training.
The sample data of the invention comprises behavior characteristic data of a certain user for a certain commodity in a period of time, the commodity, characteristic data of the user and the like.
The sample data are labeled using the user's actual purchase data, user feature data and the like to generate positive and negative samples: if the user purchased the commodity within the prediction period, the sample is positive; otherwise it is negative. The imbalance between positive and negative samples is analyzed automatically, and the sample data are undersampled according to a preset positive-to-negative ratio to generate a plurality of positive and negative sample subsets.
Sample data may be imbalanced between positive and negative samples, and using such data directly biases the training result. The invention therefore generates a plurality of sample subsets from the positive and negative samples by dynamic undersampling and trains on each subset independently. Dynamic undersampling here means extracting, by a given method, a portion of the samples from the class with an excessive number of samples so as to correct the imbalance between that class and the other samples. In this embodiment, a certain number of subsamples are randomly extracted from the negative samples and combined with the positive samples to form a new sample set for training, and a plurality of classifiers are trained by repeating this subset generation process.
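The subset generation just described can be sketched as follows. The function name, the `neg_per_pos` ratio parameter and the fixed seed are assumptions made for illustration; the patent specifies only random extraction of negatives at a preset ratio, repeated to form several subsets.

```python
import random


def undersample_subsets(positives, negatives, n_subsets, neg_per_pos=1.0, seed=0):
    """Build several training subsets: all positives plus a random draw of negatives.

    neg_per_pos is the preset negative-to-positive ratio (an assumed parameter).
    Each subset draws its negatives independently, so different subsets see
    different parts of the majority class.
    """
    rng = random.Random(seed)
    n_neg = min(len(negatives), max(1, round(len(positives) * neg_per_pos)))
    subsets = []
    for _ in range(n_subsets):
        sampled = rng.sample(negatives, n_neg)
        subset = [(x, 1) for x in positives] + [(x, 0) for x in sampled]
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets


# Made-up sample identifiers: 3 buyers against 30 non-buyers.
positives = ["p0", "p1", "p2"]
negatives = [f"n{i}" for i in range(30)]
subsets = undersample_subsets(positives, negatives, n_subsets=4, neg_per_pos=2.0)
```

Each of the four subsets holds the same 3 positives and its own 6 negatives, so a classifier trained on any one subset sees a 1:2 ratio instead of 1:10.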
S4. Train decision tree models on the positive and negative sample data to obtain a plurality of single prediction models, and fuse the single prediction models by stacking to generate a plurality of fusion prediction models.
In this embodiment, the prediction task can be abstracted as a binary classification problem. The decision tree models used are the RF and GBDT algorithms, which are trained on the sample data sets to generate a plurality of prediction models; the models are then fused by stacking to generate a plurality of fusion prediction models.
Once the decision tree models are stable, LR may be used to weight the prediction results and improve the accuracy of the fusion model.
The process of generating the fusion prediction model from the plurality of single prediction models by stacking is as follows:
After the positive and negative samples are processed, n training sets train_x, …, train_y and a test set test are generated.
First, an untrained decision tree model is selected, such as random forest (RF) or gradient boosting decision tree (GBDT).
Second, n-1 parts of the training set are taken as small training sets s_train_x, …, s_train_y and the remaining part as a small test set s_test, while the test set test is left unchanged.
Third, an RF or GBDT model is trained on s_train_x, …, s_train_y, and the trained model predicts s_test to obtain the corresponding s_pred and predicts test to obtain y_pred.
Fourth, another part of the training set is selected as the small test set s_test_x, and the other n-1 parts are used as the training set to train the RF or GBDT model.
Fifth, the second to fourth steps are repeated n times to obtain n s_pred results and n y_pred results.
The n s_pred results are used as train_X and the original train_Y is kept as train_Y to train a fusion prediction model, giving a model G; the average of the n y_pred results is used as the new test_X, which is fed into model G to obtain the prediction result.
In the second layer, a further layer of stacking is performed by combining the first layer's outputs train_X, train_Y and test_X with other feature sets, and the above steps are repeated to generate the final fusion prediction model.
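The first stacking layer can be sketched in plain Python. This is an illustration under stated assumptions, not the patent's code: a toy one-feature `Stump` class stands in for RF or GBDT so that the example runs without external libraries, the n parts are formed by index modulo `n_folds`, and the same toy class is reused as model G. The names `s_pred`, `y_pred`, `train_X` and `test_X` follow the text; everything else is hypothetical.

```python
class Stump:
    """Toy stand-in for RF/GBDT: thresholds one feature at the midpoint of the class means."""

    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        pos = [row[self.feature] for row, label in zip(X, y) if label == 1]
        neg = [row[self.feature] for row, label in zip(X, y) if label == 0]
        self.pos_mean = sum(pos) / len(pos)
        self.neg_mean = sum(neg) / len(neg)
        self.threshold = (self.pos_mean + self.neg_mean) / 2
        return self

    def predict(self, X):
        above = 1.0 if self.pos_mean >= self.neg_mean else 0.0
        return [above if row[self.feature] > self.threshold else 1.0 - above
                for row in X]


def out_of_fold(make_model, X, y, X_test, n_folds):
    """Return s_pred (held-out predictions for X) and y_pred (fold-averaged predictions for X_test)."""
    s_pred = [0.0] * len(X)
    test_preds = []
    for k in range(n_folds):
        hold = [i for i in range(len(X)) if i % n_folds == k]  # small test set
        keep = [i for i in range(len(X)) if i % n_folds != k]  # small training sets
        model = make_model().fit([X[i] for i in keep], [y[i] for i in keep])
        for i, p in zip(hold, model.predict([X[i] for i in hold])):
            s_pred[i] = p
        test_preds.append(model.predict(X_test))
    y_pred = [sum(col) / n_folds for col in zip(*test_preds)]
    return s_pred, y_pred


def stack(base_factories, make_meta, X, y, X_test, n_folds=3):
    """Train model G on the base models' held-out predictions and apply it to the test set."""
    columns = [out_of_fold(f, X, y, X_test, n_folds) for f in base_factories]
    train_X = [list(row) for row in zip(*(c[0] for c in columns))]
    test_X = [list(row) for row in zip(*(c[1] for c in columns))]
    model_g = make_meta().fit(train_X, y)
    return model_g.predict(test_X)


# Made-up data: two features per sample, alternating labels.
X = [[5, 1], [1, 0], [6, 1], [2, 0], [7, 1], [0, 0]]
y = [1, 0, 1, 0, 1, 0]
X_test = [[6, 1], [1, 0]]
result = stack([lambda: Stump(0), lambda: Stump(1)], lambda: Stump(0), X, y, X_test)
```

Because every training sample is predicted by a model that never saw it, model G learns from predictions that are not contaminated by the base models' own training data. A second layer would repeat `stack` with `train_X` and `test_X` joined to additional feature sets.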
S5. Predict with the prediction model, compare the result with an expected value, and feed the comparison back to the decision tree models for parameter adjustment and retraining until the optimal model parameters are obtained.
The prediction model is used to predict on historical data, and the predictions are compared with the historical results and judged against a preset coverage. If the preset value is not met, the model is recorded, the parameters of the decision tree models are adjusted, and training and prediction are repeated until the coverage exceeds the preset value or the number of training rounds reaches its preset limit. In this embodiment, the model training and parameter tuning process is shown in FIG. 3.
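The feedback loop can be sketched as a search that stops when the score reaches a target or the round limit is hit. This is an assumption-laden illustration: the score used here is F1, a fixed list of candidate parameters replaces whatever adjustment rule the authors use, and `train_and_predict` is a hypothetical callable that trains with the given parameters and returns predictions on historical data.

```python
def f1_score(y_true, y_pred):
    """F1 computed from true positives, false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def tune(candidates, train_and_predict, y_true, target, max_rounds):
    """Try parameter candidates until the score meets the target or rounds run out.

    Every attempt is recorded, mirroring the text's "recording the model".
    Assumes at least one candidate is supplied.
    """
    history = []
    for params in candidates[:max_rounds]:
        score = f1_score(y_true, train_and_predict(params))
        history.append((params, score))
        if score >= target:
            break
    best_params, best_score = max(history, key=lambda item: item[1])
    return best_params, best_score, history


# Toy stand-in for a model: the only parameter is a decision threshold
# applied to made-up historical purchase scores.
historical_scores = [0.1, 0.4, 0.35, 0.8, 0.9]
historical_labels = [0, 0, 1, 1, 1]


def predict_with(threshold):
    return [1 if s >= threshold else 0 for s in historical_scores]


best_params, best_score, history = tune(
    [0.05, 0.5, 0.38], predict_with, historical_labels, target=0.8, max_rounds=10)
```

The loop stops at the second candidate because its F1 already meets the target, so the third candidate is never evaluated.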
The expected value depends on the model. In this embodiment, the expected value is set using the F1-score:
(1) Precision: the proportion of true positives among all samples identified as positive, that is, the probability that a sample identified as positive is in fact positive. Precision is calculated as:
$$P = \frac{TP}{TP + FP}$$
where TP (True Positive) is the number of samples judged positive that are in fact positive, and FP (False Positive) is the number of samples judged positive that are in fact negative.
(2) Recall: the proportion of all actual positive samples in the test set that are identified as positive. Recall is calculated as:
$$R = \frac{TP}{TP + FN}$$
where FN (False Negative) is the number of samples judged negative that are in fact positive.
(3) F-score: precision and recall often pull in opposite directions, and the F-score takes both into account as their weighted harmonic mean. The F1-score is the special case β = 1 of the general F-score formula, in which precision and recall are equally important; when β > 1, recall carries more weight. The general F-score formula and the F1-score formula are:
$$F_\beta = \frac{(1 + \beta^2)\, P R}{\beta^2 P + R}, \qquad F_1 = \frac{2 P R}{P + R}$$
In these formulas, P is precision, R is recall, and β is the weight that balances precision against recall in the F-score. Three cases arise:
If β = 1, precision and recall are equally important;
If β < 1, precision is more important than recall;
If β > 1, recall is more important than precision.
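A short worked example makes the effect of β concrete. The function below computes precision, recall and the F-score from label lists using the standard definitions; the function name and the sample labels are made up for illustration.

```python
def precision_recall_fbeta(y_true, y_pred, beta=1.0):
    """Return (precision, recall, F-beta) for binary labels where 1 means 'purchased'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Made-up labels: 4 actual purchases, 3 predicted purchases, 2 of them correct.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
p, r, f1 = precision_recall_fbeta(y_true, y_pred, beta=1.0)
_, _, f2 = precision_recall_fbeta(y_true, y_pred, beta=2.0)
_, _, f_half = precision_recall_fbeta(y_true, y_pred, beta=0.5)
```

Here precision is 2/3 and recall is 1/2, so F1 is 4/7. Because recall is the weaker of the two, F2 (which favors recall) drops to 10/19, while F0.5 (which favors precision) rises to 0.625.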
S6. Predict the user's potential purchased commodities and categories with the plurality of fusion prediction models, and process and analyze the prediction results of the fusion prediction models to obtain weighted potential purchased commodities and categories.
In this embodiment, the input data of the fusion prediction model include the user's behavior features toward a given commodity within a fixed period before the prediction date, together with the commodity, its category and the like; the predicted output is whether the user will purchase the commodity.
Intersection and difference-set operations are performed on the prediction results of the fusion prediction models: results in the intersection receive a high recommendation weight, and results in the difference set receive a low weight. If a commodity or category is predicted as a purchase by several fusion prediction models, its weight is raised; if it appears in the output of only one fusion prediction model, its weight is lowered. The final result is a weighted prediction of the user's potential purchased commodities and categories.
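The weighting rule is stated only qualitatively, so the sketch below fills in one possible scheme as an assumption: the base weight is the share of fusion models that predict a purchase, and predictions made by a single model are multiplied by a penalty factor. The function name, the `boost` and `penalty` parameters and the sample outputs are all hypothetical.

```python
from collections import Counter


def weight_predictions(model_outputs, boost=1.0, penalty=0.5):
    """Combine the purchase predictions of several fusion models into weights.

    model_outputs: one set of (user, commodity) pairs per fusion model.
    Assumed scheme: weight = share of models agreeing, scaled by `boost`
    when several models agree and by `penalty` when only one does.
    """
    votes = Counter(pair for output in model_outputs for pair in set(output))
    n_models = len(model_outputs)
    weighted = {}
    for pair, count in votes.items():
        base = count / n_models
        weighted[pair] = base * (boost if count > 1 else penalty)
    return dict(sorted(weighted.items(), key=lambda kv: kv[1], reverse=True))


# Made-up outputs of three fusion prediction models.
outputs = [
    {("u1", "A"), ("u1", "B")},
    {("u1", "A"), ("u2", "C")},
    {("u1", "A"), ("u1", "B")},
]
weights = weight_predictions(outputs)
```

The pair predicted by all three models ranks first, the pair predicted by two models ranks second, and the pair that appears in only one model's output is pushed to the bottom with a reduced weight.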
The method of the present invention is described below by taking the example of mining the behavior characteristics of users of a certain website to find potential purchased goods and categories.
First, basic information on the website's users is collected, including age, gender, registration time and the like; user behavior data are collected, covering online behaviors such as clicking, browsing, purchasing, commenting, collecting, following and unfollowing; and commodity data are collected, including name, category, commodity attributes and the like. The data are converted into the format required by the method of the invention, and abnormal data are cleaned.
Second, the basic data are encoded so that they can be used for machine learning training; basic user features and basic commodity features are extracted; basic user behavior data are aggregated along different dimensions, mainly the time, commodity category, behavior type, user attribute and commodity attribute dimensions, to generate statistical features, time interval features and calculated features; features are associated and fused through the relationships among users, commodities and behaviors; and the behaviors are classified, with behaviors such as clicking and collecting weighted according to the time decay theory so that the farther a behavior is from the prediction time, the lower its feature weight.
Chi-square filtering is then used: the chi-square statistic between each non-negative feature and the label is calculated, the features are ranked from high to low by chi-square statistic, and the K highest-scoring features are selected, thereby removing the features most likely to be independent of the label and irrelevant to the classification.
Third, the samples are labeled using the user's actual purchase data and user feature data to generate positive and negative samples; the imbalance between positive and negative samples is analyzed automatically; and the sample data are undersampled according to a preset positive-to-negative ratio to generate a plurality of positive and negative sample subsets.
Fourth, the sample data sets are used to train models with the random forest (RF) and gradient boosting decision tree (GBDT) algorithms to generate a plurality of prediction models, and the models are fused by stacking to generate a plurality of fusion prediction models.
And fifthly, predicting historical data by using the fusion prediction model, comparing historical results, judging results according to preset coverage, recording the model if the results do not accord with the preset values, returning to the fourth step to adjust the model parameters, and retraining and predicting until the coverage is higher than the preset values or the training times reach the preset values.
Sixthly, the results of the plurality of fusion prediction models are intersected; a result that appears in several fusion prediction models has its weight increased, and a result that appears in only one fusion prediction model has its weight reduced, yielding weighted prediction results that better guide business development.
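One simple way to realize this weighting is to score each predicted (user, commodity) pair by the fraction of fusion prediction models that output it. The text does not give a weighting formula, so the vote-fraction weight, the function name and the data layout below are assumptions.

```python
from collections import Counter


def merge_predictions(model_outputs):
    """Combine the outputs of several fusion prediction models.

    model_outputs: list with one collection of (user_id, item_id) pairs per
    model, each pair being a predicted purchase.
    Returns (pair, weight) tuples sorted by descending weight, where the
    weight is the share of models that predicted the pair.
    """
    n_models = len(model_outputs)
    votes = Counter()
    for predicted_pairs in model_outputs:
        votes.update(set(predicted_pairs))  # count each model at most once per pair
    weighted = {pair: count / n_models for pair, count in votes.items()}
    return sorted(weighted.items(), key=lambda item: (-item[1], item[0]))
```

Pairs predicted by every model receive weight 1.0, while pairs seen in only one model receive the lowest weight, 1 divided by the number of models.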
The invention discloses a method for mining the potential commodities and categories a user will purchase, based on user behavior characteristics. Taking e-commerce user behavior data as the entry point, the method collects data according to the characteristics of the e-commerce business, standardizes the user data and extracts user behavior features by applying feature engineering theory, constructs a user behavior model, trains on and analyzes the user behavior data with machine learning algorithms, and outputs the prediction model that best "understands" user behavior; the prediction model is used to predict the user's purchase intention over a certain future period and the commodities and categories most likely to be purchased.
To understand and predict user behavior effectively, the behavior must be analyzed and refined so that useful data are retained and invalid noise is eliminated, yielding valuable data that can be used for user profiling. Once the important data characteristics of the user are obtained, the data are subjected to correlation analysis, and data modeling is carried out with the goal of predicting the user's intention to purchase commodities and categories, so that the user behavior feature data are converted into a user behavior model that can actually be used for prediction; this correlation and analysis process can be realized with machine learning methods. The extracted sample data are modeled with different machine learning algorithms to predict users' purchasing behavior, and the prediction results are fed back to the models to support parameter adjustment and repeated training. Continuous optimization yields a fusion prediction model that can predict users' future purchasing behavior, help merchants identify users with high potential purchase intention, guide merchants in pushing more accurate commodity and discount information to users, and improve the consumption conversion rate of marketing.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to them; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention shall be regarded as an equivalent and is included in the scope of protection of the present invention.
Claims (10)
1. A method for mining the potential purchased goods and categories of users based on user behavior characteristics, characterized by comprising the following steps:
Data cleaning and data preprocessing are carried out, and preprocessed data are obtained;
Encoding the preprocessed data, extracting basic features, statistical features, time interval features and calculated features, evaluating the importance of the extracted features by a filtering method to screen out important features and redundant features, and, in the feature importance evaluation, introducing a time regression theory that weights the data features on the basis that user behaviors closer to the prediction time have a greater influence on the result, thereby obtaining user behavior feature data;
Carrying out positive and negative sample analysis and classification on sample data, and generating a plurality of sample subsets by carrying out dynamic undersampling on the positive and negative samples to be used as positive and negative sample data of training;
Training the decision tree model through positive and negative sample data to train a plurality of single prediction models, and fusing the single prediction models through a stacking mode to generate a plurality of fused prediction models;
Predicting by using a fusion prediction model, comparing a prediction result with an expected value, and feeding back to the decision tree model for parameter adjustment and model retraining until optimal model parameters are obtained;
And predicting the potential purchased commodities and categories of the user based on the plurality of fusion prediction models, and processing and analyzing the prediction results of the fusion prediction models to obtain the potential purchased commodities and categories of the user with weights.
2. The method for mining the potential purchased goods and categories of the users based on the behavior characteristics of the users as claimed in claim 1, wherein the filtering method is to score each characteristic according to the index by using a correlation coefficient method, the score represents the importance of the characteristic, and then rank the characteristics according to the score.
3. The method of claim 2, wherein chi-square statistics between each non-negative feature and the label are calculated using chi-square filtering, the features are ranked by chi-square statistic from high to low, and the K highest-scoring features are selected, thereby removing features that are most likely independent of the label and irrelevant to the classification purpose.
4. The method for mining the potential purchased commodities and categories of users based on the behavior characteristics of the users as claimed in claim 1, wherein the sample data includes the behavior characteristic data of a certain user for a certain commodity in a period of time, the commodities, and the characteristic data of the user.
5. The method for mining the potential purchased goods and categories of users based on the behavior characteristics of the users as claimed in claim 1, wherein the dynamic undersampling process extracts, by a certain method, a part of the samples from the class whose sample count is too large, so as to correct the imbalance in proportion between that class and the other samples.
6. The method for mining the potential purchase goods and categories of the users based on the behavior characteristics of the users as claimed in claim 5, wherein a certain number of subsamples are extracted from the negative samples and combined with the positive samples into a new sample set by means of random extraction.
7. The method for mining the potential purchased goods and categories of users based on the user behavior characteristics as claimed in claim 1, wherein the decision tree models adopted are the RF and GBDT algorithms.
8. The method for mining the potential purchased goods and categories of users based on the user behavior characteristics according to claim 1, wherein the process of generating the fusion prediction model by the plurality of single prediction models in a stacking manner comprises the following steps:
After the positive and negative samples are processed, n training sets train_x, …, train_y and a test set test are generated;
First, an untrained decision tree model is selected;
(a) n-1 parts of the training set are extracted as small training sets s_train_x, …, s_train_y, and the remaining part is used as a small test set s_test, the test set test being unchanged;
(b) the decision tree model is trained on s_train_x, …, s_train_y, and the trained model predicts s_test to obtain the corresponding s_pred and predicts test to obtain y_pred;
(c) another part of the training set is selected as the small test set s_test_x, and the other n-1 parts are used as the training set to train the decision tree model;
Steps (a), (b) and (c) are repeated n times to obtain n s_pred results and n y_pred results;
The n s_pred results are used as train_X and the original train_Y is used as train_Y to train a fusion prediction model, giving a model G; the average of the n y_pred results is used as a new test_X, and test_X is fed into model G to obtain the prediction result;
In the second layer, one more layer of stacking is performed by combining the first layer's outputs (the training set train_X, train_Y and the test set test_X) with other feature sets, and the above steps are repeated to generate the final fusion prediction model.
9. The method for mining the potential purchase goods and categories of users based on the user behavior characteristics as claimed in claim 1, wherein the expected value is set using F1-score.
10. The method of claim 1, wherein the input of the fusion prediction model comprises the user's behavior characteristics for a certain commodity within a fixed period before the prediction date, together with the commodity and its category, and the output is a prediction of whether the user will purchase the commodity.
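The first stacking layer described in claim 8 can be illustrated with the sketch below. It is not the patented implementation: a toy one-feature threshold rule stands in for the RF and GBDT base models, the same rule is reused as the model G, only one layer is shown, and all function names and the interleaved fold assignment are assumptions.

```python
def stump_fit(feature_index):
    """Toy base learner (stand-in for RF/GBDT): thresholds one feature at the
    midpoint of the two class means. Assumes both classes occur in the data."""
    def fit(X, y):
        pos = [x[feature_index] for x, t in zip(X, y) if t == 1]
        neg = [x[feature_index] for x, t in zip(X, y) if t == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        threshold = (mean_pos + mean_neg) / 2
        sign = 1 if mean_pos >= mean_neg else -1
        return lambda x: 1.0 if sign * (x[feature_index] - threshold) > 0 else 0.0
    return fit


def out_of_fold(fit, X_train, y_train, X_test, n_folds):
    """Return s_pred (one held-out prediction per training row) and the
    average over folds of the predictions on the unchanged test set."""
    folds = [list(range(i, len(X_train), n_folds)) for i in range(n_folds)]
    s_pred = [None] * len(X_train)
    y_preds = []
    for held_out in folds:
        held = set(held_out)
        s_train_X = [x for i, x in enumerate(X_train) if i not in held]
        s_train_y = [t for i, t in enumerate(y_train) if i not in held]
        predict = fit(s_train_X, s_train_y)        # train on the other n-1 parts
        for i in held_out:
            s_pred[i] = predict(X_train[i])        # predict the small test set
        y_preds.append([predict(x) for x in X_test])
    test_X = [sum(column) / n_folds for column in zip(*y_preds)]
    return s_pred, test_X


def stack(base_fits, meta_fit, X_train, y_train, X_test, n_folds=3):
    """Train model G on the out-of-fold predictions and apply it to test_X."""
    train_columns, test_columns = [], []
    for fit in base_fits:
        s_pred, test_X = out_of_fold(fit, X_train, y_train, X_test, n_folds)
        train_columns.append(s_pred)
        test_columns.append(test_X)
    train_X = [list(row) for row in zip(*train_columns)]
    new_test_X = [list(row) for row in zip(*test_columns)]
    model_G = meta_fit(train_X, y_train)
    return [model_G(row) for row in new_test_X]
```

Because every training row is predicted only by a model that never saw it, the inputs to model G are not contaminated by the base models' training fit, which is the reason for the fold rotation in steps (a) to (c).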
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910687675.0A CN110555717A (en) | 2019-07-29 | 2019-07-29 | method for mining potential purchased goods and categories of users based on user behavior characteristics |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110555717A true CN110555717A (en) | 2019-12-10 |
Family
ID=68736561
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910687675.0A Pending CN110555717A (en) | 2019-07-29 | 2019-07-29 | method for mining potential purchased goods and categories of users based on user behavior characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110555717A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107944986A (en) * | 2017-12-28 | 2018-04-20 | 广东工业大学 | A kind of O2O Method of Commodity Recommendation, system and equipment |
CN107944913A (en) * | 2017-11-21 | 2018-04-20 | 重庆邮电大学 | High potential user's purchase intention Forecasting Methodology based on big data user behavior analysis |
CN109255651A (en) * | 2018-08-22 | 2019-01-22 | 重庆邮电大学 | A kind of search advertisements conversion intelligent Forecasting based on big data |
CN109582741A (en) * | 2018-11-15 | 2019-04-05 | 阿里巴巴集团控股有限公司 | Characteristic treating method and apparatus |
-
2019
- 2019-07-29 CN CN201910687675.0A patent/CN110555717A/en active Pending
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046952B (en) * | 2019-12-12 | 2023-11-14 | 深圳市铭数信息有限公司 | Method and device for establishing label mining model, storage medium and terminal |
CN111046952A (en) * | 2019-12-12 | 2020-04-21 | 深圳市随手金服信息科技有限公司 | Method and device for establishing label mining model, storage medium and terminal |
CN111222923A (en) * | 2020-01-13 | 2020-06-02 | 秒针信息技术有限公司 | Method and device for judging potential customer, electronic equipment and storage medium |
CN111222923B (en) * | 2020-01-13 | 2023-12-15 | 秒针信息技术有限公司 | Method and device for judging potential clients, electronic equipment and storage medium |
CN111260210A (en) * | 2020-01-14 | 2020-06-09 | 广东南方视觉文化传媒有限公司 | Visual asset management system and method based on big data analysis |
CN111260419A (en) * | 2020-02-20 | 2020-06-09 | 世纪龙信息网络有限责任公司 | Method and device for acquiring user attribute, computer equipment and storage medium |
CN111507507A (en) * | 2020-03-24 | 2020-08-07 | 重庆森鑫炬科技有限公司 | Big data-based monthly water consumption prediction method |
CN111523976A (en) * | 2020-04-23 | 2020-08-11 | 京东数字科技控股有限公司 | Commodity recommendation method and device, electronic equipment and storage medium |
CN111523976B (en) * | 2020-04-23 | 2023-12-08 | 京东科技控股股份有限公司 | Commodity recommendation method and device, electronic equipment and storage medium |
CN111737544A (en) * | 2020-05-13 | 2020-10-02 | 北京三快在线科技有限公司 | Search intention recognition method and device, electronic equipment and storage medium |
CN111860935A (en) * | 2020-05-21 | 2020-10-30 | 北京骑胜科技有限公司 | Fault prediction method, device, equipment and storage medium of vehicle |
CN111914164A (en) * | 2020-06-20 | 2020-11-10 | 武汉海云健康科技股份有限公司 | Medication prediction method and system based on medical big data |
CN111914164B (en) * | 2020-06-20 | 2024-04-26 | 武汉海云健康科技股份有限公司 | Medication prediction method and system based on medical big data |
CN111494964B (en) * | 2020-06-30 | 2020-11-20 | 腾讯科技(深圳)有限公司 | Virtual article recommendation method, model training method, device and storage medium |
CN111494964A (en) * | 2020-06-30 | 2020-08-07 | 腾讯科技(深圳)有限公司 | Virtual article recommendation method, model training method, device and storage medium |
CN112085541A (en) * | 2020-09-27 | 2020-12-15 | 中国建设银行股份有限公司 | User demand analysis method and device based on browsing consumption time series data |
CN112232388B (en) * | 2020-09-29 | 2024-02-13 | 南京财经大学 | Shopping intention key factor identification method based on ELM-RFE |
CN112232388A (en) * | 2020-09-29 | 2021-01-15 | 南京财经大学 | ELM-RFE-based shopping intention key factor identification method |
CN112800111A (en) * | 2021-01-26 | 2021-05-14 | 重庆邮电大学 | Position prediction method based on training data mining |
CN112800111B (en) * | 2021-01-26 | 2022-08-02 | 重庆邮电大学 | Position prediction method based on training data mining |
CN112784787A (en) * | 2021-01-29 | 2021-05-11 | 南京智数云信息科技有限公司 | Device, system and method for analyzing and predicting user behavior based on deep learning |
CN113379482A (en) * | 2021-05-28 | 2021-09-10 | 车智互联(北京)科技有限公司 | Item recommendation method, computing device and storage medium |
CN113379482B (en) * | 2021-05-28 | 2023-12-01 | 车智互联(北京)科技有限公司 | Article recommendation method, computing device and storage medium |
CN113763032A (en) * | 2021-08-03 | 2021-12-07 | 北京光速斑马数据科技有限公司 | Commodity purchase intention identification method and device |
CN113763032B (en) * | 2021-08-03 | 2023-08-04 | 北京光速斑马数据科技有限公司 | Commodity purchase intention recognition method and device |
CN113673866A (en) * | 2021-08-20 | 2021-11-19 | 上海寻梦信息技术有限公司 | Crop decision method, model training method and related equipment |
CN113706195A (en) * | 2021-08-26 | 2021-11-26 | 东北大学秦皇岛分校 | Online consumption behavior prediction method and system based on two-stage combination |
CN113706195B (en) * | 2021-08-26 | 2023-10-31 | 东北大学秦皇岛分校 | Online consumption behavior prediction method and system based on two-stage combination |
CN113987018A (en) * | 2021-10-27 | 2022-01-28 | 平安国际智慧城市科技股份有限公司 | Character feature mining method, device, equipment and storage medium |
CN114169374A (en) * | 2021-12-10 | 2022-03-11 | 湖南工商大学 | Cable-stayed bridge stay cable damage identification method and electronic equipment |
CN114169374B (en) * | 2021-12-10 | 2024-02-20 | 湖南工商大学 | Cable-stayed bridge stay cable damage identification method and electronic equipment |
CN115471966A (en) * | 2022-08-02 | 2022-12-13 | 上海微波技术研究所(中国电子科技集团公司第五十研究所) | Self-learning intrusion alarm method, system, medium and equipment based on vibration optical fiber detection |
CN115391669A (en) * | 2022-10-31 | 2022-11-25 | 江西渊薮信息科技有限公司 | Intelligent recommendation method and device and electronic equipment |
CN116109338B (en) * | 2022-12-12 | 2023-11-24 | 广东南粤分享汇控股有限公司 | Electric business analysis method and system based on artificial intelligence |
CN116109338A (en) * | 2022-12-12 | 2023-05-12 | 广东南粤分享汇控股有限公司 | Electric business analysis method and system based on artificial intelligence |
CN116805255B (en) * | 2023-06-05 | 2024-04-23 | 深圳市瀚力科技有限公司 | Advertisement automatic optimizing throwing system based on user image analysis |
CN116805255A (en) * | 2023-06-05 | 2023-09-26 | 深圳市瀚力科技有限公司 | Advertisement automatic optimizing throwing system based on user image analysis |
CN117195061A (en) * | 2023-11-07 | 2023-12-08 | 腾讯科技(深圳)有限公司 | Event response prediction model processing method and device and computer equipment |
CN117195061B (en) * | 2023-11-07 | 2024-03-29 | 腾讯科技(深圳)有限公司 | Event response prediction model processing method and device and computer equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110555717A (en) | method for mining potential purchased goods and categories of users based on user behavior characteristics | |
Xian et al. | Zero-shot learning-the good, the bad and the ugly | |
CN111222332B (en) | Commodity recommendation method combining attention network and user emotion | |
CN108090800B (en) | Game prop pushing method and device based on player consumption potential | |
Thorleuchter et al. | Predicting e-commerce company success by mining the text of its publicly-accessible website | |
CN110163647B (en) | Data processing method and device | |
CN109741112B (en) | User purchase intention prediction method based on mobile big data | |
CN109711955B (en) | Poor evaluation early warning method and system based on current order and blacklist base establishment method | |
Qu et al. | Matchmaking in reward-based crowdfunding platforms: A hybrid machine learning approach | |
CN106611375A (en) | Text analysis-based credit risk assessment method and apparatus | |
CN108921602B (en) | User purchasing behavior prediction method based on integrated neural network | |
CN103123633A (en) | Generation method of evaluation parameters and information searching method based on evaluation parameters | |
CN108596637B (en) | Automatic E-commerce service problem discovery system | |
CN111339439A (en) | Collaborative filtering recommendation method and device fusing comment text and time sequence effect | |
CN116431931A (en) | Real-time incremental data statistical analysis method | |
CN113283795A (en) | Data processing method and device based on two-classification model, medium and equipment | |
CN109670922B (en) | Online book value discovery method based on mixed features | |
WO2015030112A1 (en) | Document sorting system, document sorting method, and document sorting program | |
Rakhshaninejad et al. | An ensemble-based credit card fraud detection algorithm using an efficient voting strategy | |
Chaurasiya et al. | Improving performance of product recommendations using user reviews | |
CN104572623A (en) | Efficient data summary and analysis method of online LDA model | |
Abd Rahman et al. | Classification of customer feedbacks using sentiment analysis towards mobile banking applications | |
He et al. | Understanding Users' Coupon Usage Behaviors in E-Commerce Environments | |
CN112632275B (en) | Crowd clustering data processing method, device and equipment based on personal text information | |
CN115829683A (en) | Power integration commodity recommendation method and system based on inverse reward learning optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191210 |