CN110555717A - method for mining potential purchased goods and categories of users based on user behavior characteristics - Google Patents

Info

Publication number
CN110555717A
Authority
CN
China
Legal status
Pending
Application number
CN201910687675.0A
Other languages
Chinese (zh)
Inventor
程锐
张艳青
杨漫瑶
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Application filed by South China University of Technology (SCUT)
Priority to CN201910687675.0A
Publication of CN110555717A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of user behavior analysis and data mining. It relates to a method for mining the commodities and categories that users are likely to purchase, based on user behavior characteristics. The preprocessed data are encoded and passed through feature engineering to obtain user behavior feature data. The sample data are then analyzed and classified into positive and negative samples, and dynamic undersampling of these samples generates a plurality of sample subsets that serve as positive and negative training data. Decision tree models are trained on these data to obtain a plurality of single prediction models, and stacking fuses these into a plurality of fused prediction models. The fused prediction models predict each user's potential purchased commodities and categories. Their prediction results are then processed and analyzed to obtain the user's potential purchased commodities and categories with weights. The invention can help merchants discover users with high purchase intention and improve the consumption conversion rate of marketed users.

Description

Method for mining potential purchased goods and categories of users based on user behavior characteristics
Technical Field
The invention belongs to the field of user behavior analysis and data mining, and relates to a method for mining potential purchased commodities and categories of users based on user behavior characteristics.
Background
E-commerce is developing rapidly. In a market where commodities are increasingly homogeneous, active marketing can make a merchant stand out, attract users, and genuinely improve the consumption conversion rate of marketed users. Active marketing has traditionally relied on advertising and media promotion. These traditional approaches mainly aggregate traffic, however, and the conversion they produce is largely incidental. A more effective method is therefore needed to improve user conversion. The key is to identify target users accurately and push them information about the commodities they are most likely to buy. Identifying those target users and target commodities is a mining and prediction problem.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a method for mining the target commodity categories that a user is likely to purchase, based on user behavior characteristics.
The invention is realized by adopting the following technical scheme:
The method for mining the potential purchased goods and categories of the user based on the user behavior characteristics comprises the following steps:
Data cleaning and data preprocessing are performed to obtain preprocessed data;
The preprocessed data are encoded, and basic features, statistical features, time interval features and calculated features are extracted. The importance of the extracted features is evaluated with a filtering method to separate important features from redundant ones. Some user behaviors influence the result more the closer they are to the prediction time. A time regression theory is therefore introduced into the importance evaluation and the data features are weighted accordingly, yielding the user behavior feature data;
The sample data are analyzed and classified into positive and negative samples, and dynamic undersampling of the positive and negative samples generates a plurality of sample subsets used as the positive and negative training data;
Decision tree models are trained on the positive and negative sample data to obtain a plurality of single prediction models, which are fused by stacking to generate a plurality of fused prediction models;
The fused prediction models are used for prediction. The prediction results are compared with an expected value, and the outcome is fed back to the decision tree models for parameter tuning and retraining until optimal model parameters are obtained;
The fused prediction models predict the user's potential purchased commodities and categories. Their results are then processed and analyzed to obtain these commodities and categories with weights.
Further, the filtering method uses a correlation coefficient method to score each feature against an index, where the score represents the feature's importance, and then ranks the features by score.
Preferably, chi-squared filtering is used. The chi-squared statistic between each non-negative feature and the label is calculated, and the features are ranked by this statistic from high to low. The K highest-scoring features are kept, which removes the features most likely to be independent of the label and therefore irrelevant to the classification goal.
Preferably, the sample data include a given user's behavior feature data for a given commodity over a given period, the commodity itself, and the user's own feature data.
Further, dynamic undersampling draws, by a given method, only part of the samples from the over-represented class, so as to correct the imbalance between that class and the others.
Preferably, a given number of subsamples are drawn at random from the negative samples and combined with the positive samples to form a new sample set.
Preferably, the decision tree models employed are the random forest (RF) and gradient boosting decision tree (GBDT) algorithms.
Preferably, the process of generating a fused prediction model from the plurality of single prediction models by stacking includes:
After the positive and negative samples are processed, n sample training sets train_x, …, train_y and a test set test are generated;
First, an untrained decision tree model is selected;
Second, n-1 of the training sets are taken as the small training sets s_train_x, …, s_train_y and the remaining one as the small test set s_test, while the test set test is left unchanged;
Third, the decision tree model is trained on s_train_x, …, s_train_y. The trained model predicts s_test to obtain the corresponding s_pred and predicts test to obtain y_pred;
Fourth, another of the training sets is selected as the small test set s_test, and the other n-1 are used as the training set to train the decision tree model;
Fifth, the third and fourth steps are repeated n times to obtain n s_preds and n y_preds;
Sixth, the n s_preds are used as train_X and the original training labels as train_Y to train the fused prediction model, giving model G. The average of the n y_preds is used as the new test_X, which is fed into model G to obtain the prediction result;
In the second layer, the first layer's outputs train_X, train_Y and test_X are combined with other feature sets for another layer of stacking, and the steps above are repeated to generate the final fused prediction model.
Preferably, the expected value is set using the F1-score.
Preferably, the input of the fused prediction model comprises the user's behavior features toward a given commodity within a fixed period before the prediction date, together with the commodity and its category. The output is a prediction of whether the user will purchase the commodity.
Compared with the prior art, the invention has the following beneficial effects:
(1) The importance of the extracted data features is evaluated by a filtering method. Each feature is scored against an index using a correlation coefficient method, the score representing the feature's importance, and the features are then ranked by score. This separates important features from redundant ones. Some user behaviors affect the result more the closer they are to the prediction time, so a time regression theory is introduced during the evaluation and the data features are weighted accordingly. For some behaviors, such as clicks, the weight is lower the farther the behavior is from the prediction day. This makes the feature importance evaluation more accurate.
(2) A given number of subsamples are drawn at random from the negative samples and combined with the positive samples into a new sample set for training. Repeating this process generates a plurality of training subsets. Each subset is trained independently, and the resulting models are then fused to synthesize the final prediction model.
(3) Several classifier models (decision tree models) are used for training and prediction, and the single decision tree models are fused by stacking into a plurality of fused prediction models. When their results are combined, results the models agree on are weighted up and the others are weighted down. This makes the weights of the prediction results more informative and improves commodity recommendation to users.
(4) Large volumes of imbalanced user behavior data are dynamically partitioned, which prevents classifier failure caused by imbalanced samples. Multi-model training on the resulting sample subsets improves both data utilization and training effectiveness.
(5) The prediction model improves the accuracy of predicting users' future purchasing behavior. It helps merchants discover users with high purchase intention, guides them to push more accurate commodity and promotion information to those users, and improves the consumption conversion rate of marketing.
Drawings
FIG. 1 is a flowchart of a method for mining the target commodity categories that a user is likely to purchase, based on user behavior characteristics, according to an embodiment of the present invention;
FIG. 2 is a block diagram of user behavior characterization according to an embodiment of the present invention;
FIG. 3 is a diagram of the model training and parameter tuning process according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to specific embodiments, but the embodiments of the present invention are not limited thereto.
E-commerce is now part of everyone's life, and a growing share of user behavior takes place online. Users' clicks, browsing, purchases, comments and other online behaviors seem unrelated, but they hide very important information. The invention uses feature engineering theory to extract user behavior features from the characteristics of purchasing behavior and models user behavior with machine learning. It then uses users' historical behavior data to predict their future purchase intention and target commodities. Based on the prediction results, merchants can target users with more precise commodity marketing and promotions. This improves the consumption conversion rate of promoted users and achieves the greatest effect with the least investment.
The method first extracts user behavior features according to business characteristics and feature engineering methods. It then builds a user behavior model that also incorporates the characteristics of the commodity data. Finally, it dynamically partitions the feature data according to the balance of the user behavior feature data, generating a plurality of data sets for model training and prediction. Several machine learning classifier models are used for training and prediction, and their predictions are then fused. Results that appear in several prediction models are weighted up, and results that appear in only one model are weighted down.
The method for mining the target commodity categories a user is likely to purchase based on user behavior characteristics, as shown in FIG. 1, includes:
S1, Data cleaning and preprocessing are performed to obtain preprocessed data.
Abnormal data are cleaned from the collected user data and commodity data. This covers handling null values and abnormal values and removing abnormal user behavior records. The data are then standardized according to the specified data requirements.
The user data comprise basic user attribute data, user behavior data and user comment data. In this embodiment, the basic attribute data include gender, age and registration time. The behavior data include operation time, operation type, object, product and product category. The comment data include the number of comments, the numbers of positive and negative comments, and the time of the most recent comment.
The commodity data include the commodity code, name, category code, category and commodity attributes.
Abnormal values in the user data and commodity data are handled on the basis of the variance.
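The patent does not say how the variance is used. One common reading is a k-sigma rule: drop values lying more than k standard deviations from the mean. The sketch below assumes that rule with k = 3. `remove_outliers` is a hypothetical helper, not the patent's implementation.

```python
from statistics import mean, pstdev


def remove_outliers(values, k=3.0):
    """Drop values more than k population standard deviations from the mean.

    Hypothetical helper: the k-sigma rule is an assumed reading of
    "abnormal value processing through the variance".
    """
    mu = mean(values)
    sigma = pstdev(values)  # square root of the population variance
    if sigma == 0:
        return list(values)  # all values identical: nothing to drop
    return [v for v in values if abs(v - mu) <= k * sigma]


daily_clicks = [10] * 20 + [1000]  # one user-day with an implausible click count
print(remove_outliers(daily_clicks))  # twenty 10s; the 1000 is dropped
```

With the population standard deviation, a single outlier among n values can lie at most sqrt(n - 1) standard deviations from the mean. For n ≤ 10 the 3-sigma rule therefore never flags it, so k (or a more robust statistic) would need tuning on real data.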
As shown in FIG. 2, data preprocessing includes data format standardization, statistics of null and illegal values, and consistency checks across the user, product and behavior data.
S2, The preprocessed data are encoded, and basic features, statistical features, time interval features and calculated features are extracted. Feature importance is evaluated with a filtering method to separate important features from redundant ones. Some user behaviors affect the result more the closer they are to the prediction time, so a time regression theory is introduced to weight the data features, yielding the user behavior feature data.
As shown in FIG. 2, step S2 can be understood as feature engineering, which includes:
- encoding the preprocessed data so that they can be used for decision tree training;
- extracting basic user features and basic commodity features;
- aggregating the basic user behavior data along different dimensions (mainly time, commodity category, behavior type, user attributes and commodity attributes) to generate statistical, time interval and calculated features;
- associating and fusing features through the relationships among users, commodities and behaviors;
- classifying the behaviors and weighting behaviors such as clicks and collections (adding to favorites) with the time regression theory, so that behaviors farther from the prediction time receive lower feature weights.
The filtering method of the invention uses a correlation coefficient method to score each feature against an index, where the score represents the feature's importance, and then ranks the features by score. This embodiment uses chi-square filtering. The chi-square statistic between each non-negative feature and the label is calculated, and the features are ranked by it from high to low. The K highest-scoring features are selected, which removes the features most likely to be independent of the label and irrelevant to the classification goal.
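The chi-square filtering step can be sketched in pure Python as follows. The sketch assumes dense lists of non-negative feature values and 0/1 purchase labels. `chi2_scores` and `select_top_k` are hypothetical helper names. In practice, scikit-learn's `SelectKBest(chi2, k=K)` from `sklearn.feature_selection` performs the same selection, and the scores below follow the same contingency formulation as its `chi2` function.

```python
def chi2_scores(X, y):
    """Chi-square statistic between each non-negative feature column of X and labels y.

    observed = feature mass per class; expected = total feature mass x class frequency.
    """
    n = len(y)
    classes = sorted(set(y))
    class_freq = {c: sum(1 for label in y if label == c) / n for c in classes}
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = total * class_freq[c]
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features, plus all scores."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Feature 0 (e.g. cart additions) tracks the label; feature 1 is constant.
X = [[3, 1], [4, 1], [0, 1], [0, 1]]
y = [1, 1, 0, 0]
print(select_top_k(X, y, 1))  # ([0], [7.0, 0.0])
```

The constant feature scores 0 because its mass is split between the classes exactly as the class frequencies predict, so it carries no information about the label.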
The invention introduces a time decay theory to weight the data features. Certain behaviors, such as clicks, receive lower weights the farther they are from the prediction day, which makes the feature importance assessment more accurate.
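The patent only states that weights fall as behaviors get farther from the prediction day. It does not give a decay function. The sketch below assumes exponential decay with a hypothetical 3-day half-life, measured from a reference day (the last day of the observation window). `decay_weight` and `decayed_count` are illustrative names.

```python
from datetime import date


def decay_weight(event_day, reference_day, half_life_days=3.0):
    """Assumed exponential decay: 1.0 on the reference day, halving every half_life_days."""
    age = (reference_day - event_day).days
    if age < 0:
        raise ValueError("behavior occurs after the reference day")
    return 0.5 ** (age / half_life_days)


def decayed_count(event_days, reference_day, half_life_days=3.0):
    """Time-decay-weighted count of one behavior type (e.g. clicks)."""
    return sum(decay_weight(d, reference_day, half_life_days) for d in event_days)


reference_day = date(2019, 7, 10)
clicks = [date(2019, 7, 10), date(2019, 7, 7), date(2019, 7, 4)]
print(decayed_count(clicks, reference_day))  # 1.0 + 0.5 + 0.25 = 1.75
```

A half-life parameter keeps the weighting easy to interpret. A linear or step decay would also satisfy the patent's description.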
In this embodiment, the extraction process of the basic feature, the statistical feature, the time interval feature, and the calculation feature includes:
First, one-hot encoding is applied to basic features such as the behavior type. One-hot encoding uses an N-bit register to represent N states. Each state has its own dedicated bit, and exactly one bit is active at any time. The encoding is convenient to design and easy to implement, and it can represent unordered categorical values without a decoding step.
Second, various user behaviors are counted over different periods, e.g., the number of clicks, the number of collections and repeated behaviors within 7 days.
Third, interval features of certain behaviors are extracted, e.g., the interval between the user's most recent browse of a commodity and the browse before it.
Fourth, calculated features are computed. These include the user's purchase conversion rate after adding to the cart within one week, purchase conversion rates relative to the user's browse, cart-addition and follow counts, and time-decay-weighted behavior statistics.
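The four feature types above can be sketched for one (user, commodity) pair as follows. The behavior-type vocabulary, the `(user_id, item_id, behavior_type, day)` event layout and the helper names are hypothetical assumptions, not the patent's data schema. The cart conversion here is a simple purchases-per-cart-addition ratio over the window.

```python
from datetime import date, timedelta

# Hypothetical behavior-type vocabulary; the patent does not fix these codes.
BEHAVIOR_TYPES = ["browse", "click", "collect", "cart", "purchase"]


def one_hot(value, categories):
    """Basic feature: a vector with a single 1 at the position of `value`."""
    return [1 if value == c else 0 for c in categories]


def extract_features(events, user, item, reference_day, days=7):
    """Statistical, interval and calculated features for one (user, item) pair,
    using only behaviors in [reference_day - days, reference_day)."""
    start = reference_day - timedelta(days=days)
    user_events = [e for e in events if e[0] == user and start <= e[3] < reference_day]
    pair_events = [e for e in user_events if e[1] == item]

    # Statistical features: how often each behavior hit this item in the window
    counts = {t: sum(1 for e in pair_events if e[2] == t) for t in BEHAVIOR_TYPES}

    # Interval feature: days between the two most recent browses of this item
    browses = sorted(e[3] for e in pair_events if e[2] == "browse")
    interval = (browses[-1] - browses[-2]).days if len(browses) >= 2 else None

    # Calculated feature: the user's purchases per cart addition in the window
    carts = sum(1 for e in user_events if e[2] == "cart")
    buys = sum(1 for e in user_events if e[2] == "purchase")
    cart_conversion = buys / carts if carts else 0.0

    return {"counts": counts, "browse_interval_days": interval,
            "cart_conversion": cart_conversion}


events = [
    ("u1", "i1", "browse", date(2019, 7, 3)),
    ("u1", "i1", "browse", date(2019, 7, 8)),
    ("u1", "i1", "cart", date(2019, 7, 8)),
    ("u1", "i2", "cart", date(2019, 7, 9)),
    ("u1", "i2", "purchase", date(2019, 7, 9)),
    ("u1", "i1", "browse", date(2019, 6, 1)),  # outside the 7-day window
]
print(one_hot("cart", BEHAVIOR_TYPES))  # [0, 0, 0, 1, 0]
print(extract_features(events, "u1", "i1", date(2019, 7, 10)))
```

The decay weighting sketched for the time regression step could replace the plain counts with weighted counts.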
S3, The sample data are analyzed and classified into positive and negative samples, and dynamic undersampling generates a plurality of sample subsets used as the positive and negative training data.
In the invention, the sample data include a given user's behavior feature data for a given commodity over a period of time, the commodity, the user's own feature data, and so on.
The sample data are labeled using the user's actual purchase data, user feature data and so on to generate positive and negative samples. The imbalance between positive and negative samples is analyzed automatically. The sample data are then undersampled to a preset positive-to-negative ratio, generating a plurality of positive and negative sample subsets. If the user purchases the commodity within the prediction period, the sample is positive; otherwise it is negative.
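The labeling rule can be sketched as follows, assuming purchases are available as `(user_id, item_id, day)` tuples. `label_samples` and its 5-day prediction horizon are hypothetical; the patent does not state the length of the prediction period.

```python
from datetime import date, timedelta


def label_samples(pairs, purchases, prediction_start, horizon_days=5):
    """Label each (user, item) pair 1 if it was bought in the prediction period, else 0.

    The prediction period is [prediction_start, prediction_start + horizon_days);
    the 5-day default is an assumption.
    """
    end = prediction_start + timedelta(days=horizon_days)
    bought = {(u, i) for (u, i, d) in purchases if prediction_start <= d < end}
    return [1 if pair in bought else 0 for pair in pairs]


pairs = [("u1", "i1"), ("u1", "i2"), ("u2", "i1")]
purchases = [("u1", "i1", date(2019, 7, 11)), ("u2", "i1", date(2019, 7, 20))]
print(label_samples(pairs, purchases, date(2019, 7, 10)))  # [1, 0, 0]
```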
Sample data may be imbalanced between positive and negative samples, and using them directly would bias the training result. The invention therefore applies dynamic undersampling to generate a plurality of sample subsets, each of which is trained independently. Dynamic undersampling draws, by a given method, only part of the samples from the over-represented class, so as to correct the imbalance between that class and the others. In this embodiment, a given number of subsamples are drawn at random from the negative samples and combined with the positive samples into a new sample set for training. Repeating this process yields several subsets, on which several classifiers are trained.
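The random undersampling described above can be sketched as follows. `undersample_subsets` and its `neg_per_pos` ratio parameter are hypothetical names. Each subset keeps every positive sample and draws a fresh random set of negatives without replacement.

```python
import random


def undersample_subsets(positives, negatives, n_subsets, neg_per_pos=1.0, seed=0):
    """Build n_subsets training sets: all positives plus a random draw of negatives.

    neg_per_pos is the preset negative-to-positive ratio.
    """
    rng = random.Random(seed)
    k = min(len(negatives), round(len(positives) * neg_per_pos))
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.sample(negatives, k)  # without replacement inside one subset
        subsets.append(list(positives) + drawn)
    return subsets


positives = [(f"pos{i}", 1) for i in range(10)]
negatives = [(f"neg{i}", 0) for i in range(200)]
subsets = undersample_subsets(positives, negatives, n_subsets=3, neg_per_pos=2.0)
print([len(s) for s in subsets])  # [30, 30, 30]
```

Drawing each subset independently lets different subsets see different negatives, so across all subsets far more of the negative data is used than in a single undersampled set.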
S4, Decision tree models are trained on the positive and negative sample data to obtain a plurality of single prediction models, which are fused by stacking to generate a plurality of fused prediction models.
In this embodiment, the prediction goal is abstracted as a binary classification problem. The random forest (RF) and gradient boosting decision tree (GBDT) algorithms are trained on the sample data sets, generating a plurality of prediction models. These are fused by stacking into a plurality of fused prediction models.
Once the decision tree models are stable, logistic regression (LR) may be used to weight the prediction results and improve the accuracy of the fused model.
The process of generating a fused prediction model from the single prediction models by stacking is as follows:
After the positive and negative samples are processed, n sample training sets train_x, …, train_y and a test set test are generated.
First, an untrained decision tree model is selected, such as a random forest (RF) or a gradient boosting decision tree (GBDT).
Second, n-1 of the training sets are taken as the small training sets s_train_x, …, s_train_y and the remaining one as the small test set s_test, while the test set test is left unchanged.
Third, an RF or GBDT model is trained on s_train_x, …, s_train_y. The trained model predicts s_test to obtain the corresponding s_pred and predicts test to obtain y_pred.
Fourth, another of the training sets is selected as the small test set s_test, and the other n-1 are used as the training set to train the RF or GBDT model.
Fifth, the third and fourth steps are repeated n times to obtain n s_preds and n y_preds.
Sixth, the n s_preds are used as train_X and the original training labels as train_Y to train the fused prediction model, giving model G. The average of the n y_preds is used as the new test_X, which is fed into model G to obtain the prediction result.
In the second layer, the first layer's outputs train_X, train_Y and test_X are combined with other feature sets for another layer of stacking, and the steps above are repeated to generate the final fused prediction model.
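The first stacking layer above can be sketched in dependency-free Python. A one-split decision stump, `Stump`, stands in for the RF and GBDT base learners, and the same stump class plays the meta-model G. Both are assumptions made to keep the example self-contained. In practice, scikit-learn's `RandomForestClassifier` and `GradientBoostingClassifier` (which expose `fit`/`predict`) could be passed in through the same factory list. The strided fold split is also an assumption.

```python
from statistics import mean


class Stump:
    """Toy stand-in for an RF/GBDT base learner: a one-split threshold classifier.

    feature=None searches every column; an int fixes the column used.
    """

    def __init__(self, feature=None):
        self.fixed_feature = feature
        self.feature, self.threshold = 0, 0.0

    def fit(self, X, y):
        columns = range(len(X[0])) if self.fixed_feature is None else [self.fixed_feature]
        best = None
        for f in columns:
            vals = sorted({row[f] for row in X})
            cuts = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals
            for t in cuts:
                err = sum((row[f] >= t) != (label == 1) for row, label in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, f, t)
        _, self.feature, self.threshold = best
        return self

    def predict(self, X):
        return [1.0 if row[self.feature] >= self.threshold else 0.0 for row in X]


def stack_layer(model_factories, X, y, X_test, n_folds=5):
    """One stacking layer: out-of-fold predictions become train_X and the
    fold-averaged test predictions become test_X (one column per base model)."""
    folds = [list(range(i, len(X), n_folds)) for i in range(n_folds)]
    train_cols, test_cols = [], []
    for make_model in model_factories:
        s_pred = [0.0] * len(X)
        y_preds = []
        for hold_out in folds:  # each fold plays the small test set s_test once
            held = set(hold_out)
            train_idx = [i for i in range(len(X)) if i not in held]
            model = make_model().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i, p in zip(hold_out, model.predict([X[i] for i in hold_out])):
                s_pred[i] = p
            y_preds.append(model.predict(X_test))
        train_cols.append(s_pred)
        test_cols.append([mean(col) for col in zip(*y_preds)])
    return [list(r) for r in zip(*train_cols)], [list(r) for r in zip(*test_cols)]


values = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
X = [[v, i % 2] for i, v in enumerate(values)]  # column 0 informative, column 1 noise
y = [0] * 5 + [1] * 5
X_test = [[2, 0], [12, 1]]
train_X, test_X = stack_layer([lambda: Stump(0), lambda: Stump(1)], X, y, X_test)
G = Stump().fit(train_X, y)  # meta-model G trained on the out-of-fold predictions
print(G.predict(test_X))  # [0.0, 1.0]
```

For the second layer, the returned `train_X` and `test_X` columns would be concatenated with other feature sets and passed to `stack_layer` again. With real RF/GBDT models, predicted probabilities usually make more informative meta-features than hard 0/1 labels.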
S5, The prediction model is used to predict results. These are compared with the expected value, and the outcome is fed back to the decision tree models for parameter tuning and retraining until the optimal model parameters are obtained.
The prediction model predicts historical data, and the predictions are compared with the historical results and judged against a preset coverage. If they do not meet the preset value, the model is recorded and control returns to the decision tree model to adjust its parameters. Training and prediction are repeated until the coverage exceeds the preset value or the number of training rounds reaches its preset limit. In this embodiment, the model training and parameter tuning process is shown in FIG. 3.
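The feedback loop of step S5 can be sketched as follows, assuming the F1-score described next serves as the coverage measure. `tune_until_target`, the `train_and_score` callback and the `max_depth` grid are hypothetical. The scores in the example are made up for illustration, not results from the patent.

```python
def tune_until_target(train_and_score, param_grid, target_f1, max_rounds):
    """Try parameter settings in order until F1 reaches target_f1 or max_rounds runs out.

    train_and_score(params) is a hypothetical callback that retrains the model with
    `params` and returns its F1-score on held-out historical data.
    """
    best_params, best_score, history = None, float("-inf"), []
    for round_no, params in enumerate(param_grid, start=1):
        score = train_and_score(params)
        history.append((params, score))  # record every trained model
        if score > best_score:
            best_params, best_score = params, score
        if score >= target_f1 or round_no >= max_rounds:
            break
    return best_params, best_score, history


fake_f1 = {3: 0.60, 5: 0.72, 8: 0.81, 12: 0.79}  # made-up scores for illustration
grid = [{"max_depth": d} for d in (3, 5, 8, 12)]
best, score, history = tune_until_target(lambda p: fake_f1[p["max_depth"]], grid, 0.80, 10)
print(best, score, len(history))  # {'max_depth': 8} 0.81 3
```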
The expected value depends on the model; in this embodiment, it is set using the F1-score:
(1) Precision: the proportion of true positives among all samples identified as positive, i.e., the probability that a sample identified as positive is actually positive. Precision is calculated as:
$$P = \frac{TP}{TP + FP}$$
where TP (true positive) is a sample judged positive that is actually positive, and FP (false positive) is a sample judged positive that is actually negative.
(2) Recall: also called the recall ratio, the proportion of the actual positive samples in the test set that are identified as positive. Recall is calculated as:
$$R = \frac{TP}{TP + FN}$$
where FN (false negative) is a sample judged negative that is actually positive.
(3) F-score: precision and recall often conflict, so the F-score takes both into account as their weighted harmonic mean. The F1-score is the special case β = 1 of the general F-score formula, in which precision and recall are equally important; when β > 1, recall carries more weight. The general F-score formula and the F1-score formula are:
$$F_\beta = \frac{(1 + \beta^2) \cdot P \cdot R}{\beta^2 \cdot P + R}, \qquad F_1 = \frac{2 \cdot P \cdot R}{P + R}$$
where P is precision, R is recall, and β is the weight that balances precision and recall in the F-score. β falls into one of three cases:
If β = 1, precision and recall are equally important;
If β < 1, precision is more important than recall;
If β > 1, recall is more important than precision.
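The formulas above translate directly into code for binary purchase labels. `precision_recall_fbeta` is a hypothetical helper. scikit-learn's `sklearn.metrics.fbeta_score` computes the same F_β.

```python
def precision_recall_fbeta(y_true, y_pred, beta=1.0):
    """Precision P, recall R and F_beta for binary labels (1 = purchase)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta


y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0]
print(precision_recall_fbeta(y_true, y_pred))          # P = 2/3, R = 1/2, F1 = 4/7
print(precision_recall_fbeta(y_true, y_pred, beta=2))  # F2 = 10/19, below F1 since R < P
```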
S6, The fused prediction models predict the user's potential purchased commodities and categories. Their results are then processed and analyzed to obtain these commodities and categories with weights.
In this embodiment, the input data of the fused prediction model include the user's behavior features toward a given commodity within a fixed period before the prediction date, the commodity, its category, and so on. The output is a prediction of whether the user will purchase the commodity.
Intersection and difference operations are performed on the prediction results of the fused prediction models. Results in the intersection receive a high recommendation weight, and results in the difference receive a low one. If several fused prediction models predict that the user will buy a given commodity and category, its weight is increased. If it appears in the output of only one fused prediction model, its weight is reduced. This yields a weighted prediction of the user's potential purchased commodities and categories.
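The patent does not give a weighting formula for the intersection and difference results. One simple choice is to weight each predicted (user, commodity) pair by the fraction of fused models that output it. Under that assumption, intersection results get weight 1.0 and pairs predicted by only one of n models get 1/n. `weighted_candidates` is a hypothetical helper.

```python
from collections import Counter


def weighted_candidates(model_outputs):
    """Weight each predicted (user, item) pair by the share of fused models predicting it.

    Pairs in the intersection of all outputs get 1.0; pairs predicted by only
    one of n models get 1/n. Sorted by weight, highest first.
    """
    n = len(model_outputs)
    votes = Counter(pair for output in model_outputs for pair in set(output))
    return sorted(((pair, count / n) for pair, count in votes.items()),
                  key=lambda item: (-item[1], item[0]))


outputs = [
    {("u1", "i1"), ("u2", "i3")},  # fused model A
    {("u1", "i1")},                # fused model B
    {("u1", "i1"), ("u3", "i2")},  # fused model C
]
for pair, weight in weighted_candidates(outputs):
    print(pair, round(weight, 3))
# ('u1', 'i1') 1.0
# ('u2', 'i3') 0.333
# ('u3', 'i2') 0.333
```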
The method of the present invention is described below by taking the example of mining the behavior characteristics of users of a certain website to find potential purchased goods and categories.
Firstly, collecting basic information of a user of a website, wherein the basic information comprises age, gender, registration time and the like; collecting user behavior data, including user online behavior data of clicking, browsing, purchasing, commenting, collecting, paying attention to, canceling attention to and the like; collecting commodity data: including name, category, commodity attributes, etc.; and converting the related data into a format required by the method of the invention, and cleaning abnormal data.
Secondly, encoding the basic data, and conveniently using the basic data for machine learning training; extracting basic features of users and basic features of commodities; counting basic user behavior data through different dimensions, wherein the basic user behavior data mainly comprise time dimensions, commodity class dimensions, behavior type dimensions, user attribute dimensions and commodity attribute dimensions, and generating statistic class features, time interval class features and calculation class characteristics; performing characteristic association and fusion through the relationship among users, commodities and behaviors; classifying the behaviors, weighting the behaviors such as clicking, collecting and the like by using a time regression theory, wherein the behavior characteristic weight which is farther away from the prediction time is lower;
and calculating chi-square statistic between each non-negative feature and the label by using chi-square filtering, and ranking the features according to the chi-square statistic from high to low, and selecting the class of the top K features with the highest score, thereby removing the features which are most possibly independent of the label and are irrelevant to the classification purpose.
Third, label the samples by combining the users' actual purchase data with the user feature data to generate positive and negative samples. Automatically analyze the imbalance between positive and negative samples, and undersample the data according to a preset positive-to-negative ratio to generate several positive/negative sample subsets.
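A minimal sketch of the undersampling step, assuming label 1 marks a purchase and that negatives vastly outnumber positives. Every subset keeps all positives plus a different random draw of negatives. The ratio `neg_per_pos`, the subset count and the name `undersample_subsets` are hypothetical stand-ins for the "preset ratio".

```python
import random

def undersample_subsets(samples, labels, neg_per_pos=3, n_subsets=5, seed=0):
    """Build several training subsets by random undersampling of negatives.

    Each subset contains every positive sample (label 1) and at most
    neg_per_pos randomly drawn negatives (label 0) per positive.
    """
    rng = random.Random(seed)
    positives = [s for s, l in zip(samples, labels) if l == 1]
    negatives = [s for s, l in zip(samples, labels) if l == 0]
    n_neg = min(len(negatives), neg_per_pos * len(positives))
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.sample(negatives, n_neg)   # sampling without replacement
        subset = [(s, 1) for s in positives] + [(s, 0) for s in drawn]
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets

samples = list(range(100))
labels = [1 if i < 5 else 0 for i in range(100)]
subsets = undersample_subsets(samples, labels, neg_per_pos=3, n_subsets=4)
```

Using a different negative draw per subset lets each downstream model see a different slice of the majority class, which matches the source's goal of producing several sample subsets.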
Fourth, train on the sample sets with the random forest (RF) and gradient boosting decision tree (GBDT) algorithms to generate several prediction models, and fuse these models by stacking to generate several fusion prediction models.
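The stacking procedure (n folds; each fold's model predicts its held-out fold and the test set; the out-of-fold predictions become the next layer's training features and the averaged test predictions its test features) can be sketched as below. To stay dependency-free, a toy threshold rule `fit_stump` stands in for the random forest and gradient boosting models. It is a hypothetical placeholder; in practice, something like scikit-learn's `RandomForestClassifier` and `GradientBoostingClassifier` could be wrapped in the same `fit(X, y) -> predict` shape.

```python
def fit_stump(X, y):
    """Toy base learner (hypothetical stand-in for RF/GBDT): the single
    rule `x[j] >= t -> 1` with the highest training accuracy."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            correct = sum((row[j] >= t) == (label == 1) for row, label in zip(X, y))
            if best is None or correct > best[0]:
                best = (correct, j, t)
    _, j, t = best
    return lambda rows: [1.0 if row[j] >= t else 0.0 for row in rows]

def kfold_indices(n_samples, n_folds):
    """Interleaved fold assignment: fold k holds indices k, k+n_folds, ..."""
    return [list(range(k, n_samples, n_folds)) for k in range(n_folds)]

def stack_level(base_fitters, train_X, train_y, test_X, n_folds=5):
    """One stacking layer. Returns (new_train_X, new_test_X), one column per base model."""
    folds = kfold_indices(len(train_X), n_folds)
    new_train = [[] for _ in train_X]
    new_test = [[] for _ in test_X]
    for fit in base_fitters:
        oof = [0.0] * len(train_X)   # s_pred: out-of-fold predictions
        test_preds = []              # y_pred: one test prediction per fold
        for held_out in folds:
            held = set(held_out)
            s_train = [i for i in range(len(train_X)) if i not in held]
            model = fit([train_X[i] for i in s_train], [train_y[i] for i in s_train])
            for i, p in zip(held_out, model([train_X[i] for i in held_out])):
                oof[i] = p
            test_preds.append(model(test_X))
        for i, p in enumerate(oof):
            new_train[i].append(p)
        for i in range(len(test_X)):
            new_test[i].append(sum(fold[i] for fold in test_preds) / n_folds)
    return new_train, new_test

train_X = [[i] for i in range(20)]
train_y = [1 if i >= 10 else 0 for i in range(20)]
test_X = [[3], [15]]
new_train, new_test = stack_level([fit_stump], train_X, train_y, test_X, n_folds=5)
meta = fit_stump(new_train, train_y)   # second-level model ("model G")
final = meta(new_test)
```

A further layer, as the source describes, could append other feature columns to the rows of `new_train` and `new_test` and call `stack_level` again.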
Fifth, use the fusion prediction models to predict on historical data and compare the predictions with the actual historical results, judging them against a preset coverage. If a result does not meet the preset value, record the model, return to the fourth step to adjust the model parameters, and retrain and predict again, until the coverage exceeds the preset value or the number of training rounds reaches its preset limit.
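A sketch of this feedback loop. The step above speaks of a preset coverage, while claim 9 sets the expected value with the F1-score, so F1 is used here. `train_and_predict`, `param_grid`, `target_f1` and `max_rounds` are hypothetical placeholders for the retraining in the fourth step and its preset limits.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = purchase)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def tune_until_target(train_and_predict, param_grid, y_true, target_f1=0.8, max_rounds=10):
    """Retrain with successive parameter settings until F1 reaches the target
    or max_rounds is hit. Returns (best_params, best_f1, history)."""
    history = []
    best_params, best_f1 = None, -1.0
    for round_no, params in enumerate(param_grid):
        if round_no >= max_rounds:
            break
        score = f1_score(y_true, train_and_predict(params))
        history.append((params, score))   # record every model that was tried
        if score > best_f1:
            best_params, best_f1 = params, score
        if score >= target_f1:
            break
    return best_params, best_f1, history

# Toy "model": scores on historical data thresholded by the parameter.
historical_scores = [0.9, 0.6, 0.4, 0.1]
y_hist = [1, 1, 0, 0]
predict = lambda t: [1 if s >= t else 0 for s in historical_scores]
best_params, best_f1, history = tune_until_target(predict, [0.95, 0.7, 0.5, 0.3], y_hist)
```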
Sixth, intersect the results of the fusion prediction models: a result that appears in several fusion prediction models is given a higher weight, and a result that appears in only one is given a lower weight. The weighted predictions obtained in this way can better guide business development.
The invention discloses a method for mining the commodities and categories a user is likely to purchase, based on user behavior characteristics. Taking e-commerce user behavior data as the starting point, the method collects data according to the characteristics of e-commerce, standardizes the user data and extracts user behavior features using feature engineering, builds a user behavior model, and trains on the user behavior data with machine learning algorithms. It outputs the prediction model that best "understands" user behavior, which is used to predict the user's purchase intention over a future period and the commodities and categories the user is most likely to buy.
To understand and predict user behavior effectively, the behavior data must be analyzed and refined so that useful data is kept and invalid noise is removed, leaving valuable data that can be used for user profiling. The user's important data features are obtained and analyzed for correlations, and data modeling is guided by the goal of predicting the user's intention to purchase specific commodities and categories, so that the behavior feature data is turned into a user behavior model usable for prediction. This correlation and analysis can be implemented with machine learning. Different machine learning algorithms learn models from the extracted sample data and predict users' purchasing behavior; the prediction results are fed back to guide parameter tuning and repeated training, and continuous optimization yields a fusion prediction model. This model can predict users' future purchasing behavior, help merchants find users with high purchase intention, guide merchants to push more accurate commodity and promotion information to those users, and improve the consumption conversion rate of marketing.
The above embodiments are preferred embodiments of the present invention, but the invention is not limited to them. Any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the invention shall be regarded as an equivalent and is included in the scope of protection of the invention.

Claims (10)

1. A method for mining the commodities and categories a user is likely to purchase based on user behavior characteristics, characterized by comprising the following steps:
Performing data cleaning and data preprocessing to obtain preprocessed data;
Encoding the preprocessed data; extracting basic features, statistical features, time-interval features and computed features; evaluating the importance of the extracted features with a filtering method to retain the important features and remove redundant ones; and, during feature importance evaluation, introducing the time regression theory to weight the data features, because the closer certain user behaviors are to the prediction time, the greater their influence on the result, thereby obtaining user behavior feature data;
Analyzing and classifying the sample data into positive and negative samples, and dynamically undersampling them to generate several sample subsets that serve as positive and negative training data;
Training decision tree models on the positive and negative sample data to obtain several single prediction models, and fusing the single prediction models by stacking to generate several fusion prediction models;
Predicting with the fusion prediction models, comparing the predictions with an expected value, and feeding the result back to the decision tree models for parameter adjustment and retraining until optimal model parameters are obtained;
Predicting the commodities and categories the user is likely to purchase with the several fusion prediction models, and processing and analyzing their prediction results to obtain weighted predictions of those commodities and categories.
2. The method according to claim 1, wherein the filtering method uses a correlation coefficient method to score each feature according to an index, the score representing the importance of the feature, and then ranks the features by score.
3. The method according to claim 2, wherein chi-square filtering is used to compute the chi-square statistic between each non-negative feature and the label, the features are ranked by this statistic from high to low, and the top K highest-scoring features are selected, thereby removing the features most likely to be independent of the label and irrelevant to the classification task.
4. The method according to claim 1, wherein the sample data includes a user's behavior feature data for a given commodity over a period of time, the commodity, and the user's feature data.
5. The method according to claim 1, wherein the dynamic undersampling draws only part of the samples from the class that has too many samples, so as to correct the imbalance between that class and the other samples.
6. The method according to claim 5, wherein a number of subsamples are drawn at random from the negative samples and combined with the positive samples into a new sample set.
7. The method according to claim 1, wherein the decision tree models adopted are the RF (random forest) and GBDT (gradient boosting decision tree) algorithms.
8. The method according to claim 1, wherein generating a fusion prediction model from the single prediction models by stacking comprises the following steps:
After the positive and negative samples are processed, n training sets train_x, …, train_y and a test set test are generated;
First, select an untrained decision tree model;
Second, take n-1 of the training sets as small training sets s_train_x, …, s_train_y and the remaining one as a small test set s_test, leaving the test set test unchanged;
Third, train the decision tree model on s_train_x, …, s_train_y, then use the trained model to predict s_test, obtaining the corresponding s_pred, and to predict test, obtaining y_pred;
Fourth, select another of the training sets as the small test set s_test and train the decision tree model on the other n-1 sets;
Repeat the second, third and fourth steps n times to obtain n s_preds and n y_preds;
Use the n s_preds as train_X and the original training labels as train_Y to train a fusion prediction model, obtaining model G; use the mean of the n y_preds as the new test_X, and feed test_X into model G to obtain the prediction result;
In the second layer, combine the first layer's output training set (train_X, train_Y) and test set test_X with other feature sets, and repeat the above steps to perform another layer of stacking and generate the final fusion prediction model.
9. The method according to claim 1, wherein the expected value is set using the F1-score.
10. The method according to claim 1, wherein the input of the fusion prediction model comprises the user's behavior features for a given commodity within a fixed period before the prediction date, together with the commodity and its category, and the output is a prediction of whether the user will purchase that commodity.
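The input/output shape in claim 10 can be illustrated by turning a behavior log into labeled (user, commodity) samples. This is a sketch under stated assumptions: the feature window (`window_days`) and label horizon (`horizon_days`) lengths are hypothetical, since the claim only says "a fixed time"; category features are omitted for brevity; and `build_samples` is an illustrative name.

```python
from collections import defaultdict
from datetime import date, timedelta

def build_samples(events, prediction_date, window_days=7, horizon_days=5):
    """Turn a behavior log into (user, item) samples with purchase labels.

    events: iterable of (user_id, item_id, behavior_type, event_date).
    Features: per-behavior counts in the window_days before prediction_date.
    Label: 1 if the user buys the item within horizon_days from prediction_date.
    Only pairs with some behavior inside the feature window become samples.
    """
    window_start = prediction_date - timedelta(days=window_days)
    horizon_end = prediction_date + timedelta(days=horizon_days)
    features = defaultdict(lambda: defaultdict(int))
    bought = set()
    for user, item, behavior, day in events:
        if window_start <= day < prediction_date:
            features[(user, item)][behavior] += 1
        elif prediction_date <= day < horizon_end and behavior == "purchase":
            bought.add((user, item))
    return {
        pair: (dict(counts), 1 if pair in bought else 0)
        for pair, counts in features.items()
    }

log = [
    ("u1", "i1", "click", date(2019, 7, 5)),
    ("u1", "i1", "click", date(2019, 7, 6)),
    ("u1", "i1", "purchase", date(2019, 7, 9)),
    ("u2", "i2", "browse", date(2019, 7, 1)),
    ("u3", "i3", "click", date(2019, 6, 1)),   # outside the feature window
]
samples = build_samples(log, prediction_date=date(2019, 7, 8))
```

Here ("u1", "i1") becomes a positive sample because the purchase falls inside the label horizon, ("u2", "i2") becomes a negative sample, and ("u3", "i3") is dropped because its only behavior predates the window.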

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910687675.0A CN110555717A (en) 2019-07-29 2019-07-29 method for mining potential purchased goods and categories of users based on user behavior characteristics

Publications (1)

Publication Number Publication Date
CN110555717A 2019-12-10

Family

ID=68736561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910687675.0A Pending CN110555717A (en) 2019-07-29 2019-07-29 method for mining potential purchased goods and categories of users based on user behavior characteristics

Country Status (1)

Country Link
CN (1) CN110555717A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944986A (en) * 2017-12-28 2018-04-20 广东工业大学 A kind of O2O Method of Commodity Recommendation, system and equipment
CN107944913A (en) * 2017-11-21 2018-04-20 重庆邮电大学 High potential user's purchase intention Forecasting Methodology based on big data user behavior analysis
CN109255651A (en) * 2018-08-22 2019-01-22 重庆邮电大学 A kind of search advertisements conversion intelligent Forecasting based on big data
CN109582741A (en) * 2018-11-15 2019-04-05 阿里巴巴集团控股有限公司 Characteristic treating method and apparatus

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046952B (en) * 2019-12-12 2023-11-14 深圳市铭数信息有限公司 Method and device for establishing label mining model, storage medium and terminal
CN111046952A (en) * 2019-12-12 2020-04-21 深圳市随手金服信息科技有限公司 Method and device for establishing label mining model, storage medium and terminal
CN111222923A (en) * 2020-01-13 2020-06-02 秒针信息技术有限公司 Method and device for judging potential customer, electronic equipment and storage medium
CN111222923B (en) * 2020-01-13 2023-12-15 秒针信息技术有限公司 Method and device for judging potential clients, electronic equipment and storage medium
CN111260210A (en) * 2020-01-14 2020-06-09 广东南方视觉文化传媒有限公司 Visual asset management system and method based on big data analysis
CN111260419A (en) * 2020-02-20 2020-06-09 世纪龙信息网络有限责任公司 Method and device for acquiring user attribute, computer equipment and storage medium
CN111507507A (en) * 2020-03-24 2020-08-07 重庆森鑫炬科技有限公司 Big data-based monthly water consumption prediction method
CN111523976A (en) * 2020-04-23 2020-08-11 京东数字科技控股有限公司 Commodity recommendation method and device, electronic equipment and storage medium
CN111523976B (en) * 2020-04-23 2023-12-08 京东科技控股股份有限公司 Commodity recommendation method and device, electronic equipment and storage medium
CN111737544A (en) * 2020-05-13 2020-10-02 北京三快在线科技有限公司 Search intention recognition method and device, electronic equipment and storage medium
CN111860935A (en) * 2020-05-21 2020-10-30 北京骑胜科技有限公司 Fault prediction method, device, equipment and storage medium of vehicle
CN111914164A (en) * 2020-06-20 2020-11-10 武汉海云健康科技股份有限公司 Medication prediction method and system based on medical big data
CN111914164B (en) * 2020-06-20 2024-04-26 武汉海云健康科技股份有限公司 Medication prediction method and system based on medical big data
CN111494964B (en) * 2020-06-30 2020-11-20 腾讯科技(深圳)有限公司 Virtual article recommendation method, model training method, device and storage medium
CN111494964A (en) * 2020-06-30 2020-08-07 腾讯科技(深圳)有限公司 Virtual article recommendation method, model training method, device and storage medium
CN112085541A (en) * 2020-09-27 2020-12-15 中国建设银行股份有限公司 User demand analysis method and device based on browsing consumption time series data
CN112232388B (en) * 2020-09-29 2024-02-13 南京财经大学 Shopping intention key factor identification method based on ELM-RFE
CN112232388A (en) * 2020-09-29 2021-01-15 南京财经大学 ELM-RFE-based shopping intention key factor identification method
CN112800111A (en) * 2021-01-26 2021-05-14 重庆邮电大学 Position prediction method based on training data mining
CN112800111B (en) * 2021-01-26 2022-08-02 重庆邮电大学 Position prediction method based on training data mining
CN112784787A (en) * 2021-01-29 2021-05-11 南京智数云信息科技有限公司 Device, system and method for analyzing and predicting user behavior based on deep learning
CN113379482A (en) * 2021-05-28 2021-09-10 车智互联(北京)科技有限公司 Item recommendation method, computing device and storage medium
CN113379482B (en) * 2021-05-28 2023-12-01 车智互联(北京)科技有限公司 Article recommendation method, computing device and storage medium
CN113763032A (en) * 2021-08-03 2021-12-07 北京光速斑马数据科技有限公司 Commodity purchase intention identification method and device
CN113763032B (en) * 2021-08-03 2023-08-04 北京光速斑马数据科技有限公司 Commodity purchase intention recognition method and device
CN113673866A (en) * 2021-08-20 2021-11-19 上海寻梦信息技术有限公司 Crop decision method, model training method and related equipment
CN113706195A (en) * 2021-08-26 2021-11-26 东北大学秦皇岛分校 Online consumption behavior prediction method and system based on two-stage combination
CN113706195B (en) * 2021-08-26 2023-10-31 东北大学秦皇岛分校 Online consumption behavior prediction method and system based on two-stage combination
CN113987018A (en) * 2021-10-27 2022-01-28 平安国际智慧城市科技股份有限公司 Character feature mining method, device, equipment and storage medium
CN114169374A (en) * 2021-12-10 2022-03-11 湖南工商大学 Cable-stayed bridge stay cable damage identification method and electronic equipment
CN114169374B (en) * 2021-12-10 2024-02-20 湖南工商大学 Cable-stayed bridge stay cable damage identification method and electronic equipment
CN115471966A (en) * 2022-08-02 2022-12-13 上海微波技术研究所(中国电子科技集团公司第五十研究所) Self-learning intrusion alarm method, system, medium and equipment based on vibration optical fiber detection
CN115391669A (en) * 2022-10-31 2022-11-25 江西渊薮信息科技有限公司 Intelligent recommendation method and device and electronic equipment
CN116109338B (en) * 2022-12-12 2023-11-24 广东南粤分享汇控股有限公司 Electric business analysis method and system based on artificial intelligence
CN116109338A (en) * 2022-12-12 2023-05-12 广东南粤分享汇控股有限公司 Electric business analysis method and system based on artificial intelligence
CN116805255B (en) * 2023-06-05 2024-04-23 深圳市瀚力科技有限公司 Advertisement automatic optimizing throwing system based on user image analysis
CN116805255A (en) * 2023-06-05 2023-09-26 深圳市瀚力科技有限公司 Advertisement automatic optimizing throwing system based on user image analysis
CN117195061A (en) * 2023-11-07 2023-12-08 腾讯科技(深圳)有限公司 Event response prediction model processing method and device and computer equipment
CN117195061B (en) * 2023-11-07 2024-03-29 腾讯科技(深圳)有限公司 Event response prediction model processing method and device and computer equipment

Similar Documents

Publication Publication Date Title
CN110555717A (en) method for mining potential purchased goods and categories of users based on user behavior characteristics
Xian et al. Zero-shot learning-the good, the bad and the ugly
CN111222332B (en) Commodity recommendation method combining attention network and user emotion
CN108090800B (en) Game prop pushing method and device based on player consumption potential
Thorleuchter et al. Predicting e-commerce company success by mining the text of its publicly-accessible website
CN110163647B (en) Data processing method and device
CN109741112B (en) User purchase intention prediction method based on mobile big data
CN109711955B (en) Poor evaluation early warning method and system based on current order and blacklist base establishment method
Qu et al. Matchmaking in reward-based crowdfunding platforms: A hybrid machine learning approach
CN106611375A (en) Text analysis-based credit risk assessment method and apparatus
CN108921602B (en) User purchasing behavior prediction method based on integrated neural network
CN103123633A (en) Generation method of evaluation parameters and information searching method based on evaluation parameters
CN108596637B (en) Automatic E-commerce service problem discovery system
CN111339439A (en) Collaborative filtering recommendation method and device fusing comment text and time sequence effect
CN116431931A (en) Real-time incremental data statistical analysis method
CN113283795A (en) Data processing method and device based on two-classification model, medium and equipment
CN109670922B (en) Online book value discovery method based on mixed features
WO2015030112A1 (en) Document sorting system, document sorting method, and document sorting program
Rakhshaninejad et al. An ensemble-based credit card fraud detection algorithm using an efficient voting strategy
Chaurasiya et al. Improving performance of product recommendations using user reviews
CN104572623A (en) Efficient data summary and analysis method of online LDA model
Abd Rahman et al. Classification of customer feedbacks using sentiment analysis towards mobile banking applications
He et al. Understanding Users' Coupon Usage Behaviors in E-Commerce Environments
CN112632275B (en) Crowd clustering data processing method, device and equipment based on personal text information
CN115829683A (en) Power integration commodity recommendation method and system based on inverse reward learning optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191210)