Disclosure of Invention
Technical problem: the invention provides a commodity information recommendation method and system based on user historical behaviors, which accurately analyze a user's behavior on the basis of the user's historical behavior data, provide the user with a personalized commodity recommendation list, and recommend commodities more accurately.
The technical scheme is as follows: in order to solve the above problem, the embodiment of the present invention adopts the following technical solutions:
In a first aspect, the present embodiment provides a method for recommending commodity information based on user historical behaviors, including the following steps:
S11, collecting historical behavior data of a user on an e-commerce website, wherein the historical behavior data comprises user information and commodity information;
S12, establishing a user commodity probability prediction feature vector according to the historical behavior data;
S13, training a model according to the user commodity probability prediction feature vector to obtain a user recommended commodity prediction model;
S14, inputting the user data to be predicted into the user recommended commodity prediction model, and calculating the predicted purchase probability of the behavior commodities;
S15, calculating the predicted purchase probability of the associated commodities according to the predicted purchase probability of the behavior commodities, and combining the behavior commodities and the associated commodities to obtain a commodity recommendation list.
With reference to the first aspect, as a first possible implementation manner, in S11, the historical behavior data is derived from a PC end, a WAP end, an APP end, and offline data; the user information comprises an identification ID of the user, the gender, the age and the access preference of the user; the commodity information comprises the identification code of the commodity, the commodity flow characteristic, the commodity behavior characteristic and the commodity decision cost.
With reference to the first aspect, as a second possible implementation manner, the establishing a user commodity probability prediction feature vector in S12 specifically includes:
S201, data cleaning: cleaning out abnormal data and data that does not conform to the browsing habits of users;
S202, performing feature processing to obtain a user feature value: for the cleaned historical behavior features of the users of each terminal, constructing a user historical behavior feature statistical function from daily statistics; dividing the statistical days into M sections; calculating the feature value of each section according to a time decay function; and accumulating the feature values of all sections to obtain the user feature value;
S203, establishing a user commodity probability prediction feature vector: the user commodity probability prediction feature vector takes the form: fingerprint ID + commodity ID + user feature vector value of each terminal, where the fingerprint ID represents the identification ID of the user and the commodity ID represents the identification code of the commodity.
With reference to the first aspect, as a third possible implementation manner, the S15 specifically includes:
S301, determining the associated commodities: calculating the association degree between the behavior commodity and its associated commodities by adopting an association rule or a collaborative filtering algorithm according to the access history data and purchase history data in the historical behavior data of the user, and taking the first b commodities with the highest association degree as the associated commodity set of the behavior commodity;
S302, calculating the purchase probability of each associated commodity according to formula (1):
$\mathrm{Score}_i = \mathrm{Master\_Pos} \times \mathrm{SKU\_Score}_i / \max(\mathrm{SKU\_Score}_i)$    formula (1)
where Score_i represents the purchase probability of the associated commodity; Master_Pos represents the purchase probability of the behavior commodity; max(SKU_Score_i) represents the highest association degree in the associated commodity set; and SKU_Score_i represents the association degree between the associated commodity SKU_i and the behavior commodity;
S303, combining the behavior commodities and the associated commodities to generate a commodity recommendation list: if the behavior commodity is in the associated commodity set obtained in step S301, sorting the behavior commodities and the associated commodities according to their predicted purchase probabilities to obtain the commodity recommendation list; if the behavior commodity is not in the associated commodity set obtained in step S301 and its predicted purchase probability is smaller than a probability threshold, multiplying the predicted purchase probability of the behavior commodity by a penalty coefficient to obtain its final predicted purchase probability, and then sorting the associated commodities and the behavior commodities according to predicted purchase probability to obtain the commodity recommendation list.
With reference to the first aspect, as a fourth possible implementation manner, the method for recommending commodity information based on historical behaviors of a user further includes step S16: filtering the commodity recommendation list obtained in step S15 and outputting a final commodity recommendation list.
With reference to the fourth possible implementation manner of the first aspect, as a fifth possible implementation manner, the step S16 specifically includes: taking the commodities ordered by the user within the last H days, taking the commodity groups to which those commodities belong as the user's filtering commodity group, and filtering out the commodities belonging to the filtering commodity group from the commodity recommendation list obtained in S15; and reordering the commodities in the filtered commodity recommendation list according to predicted purchase probability to generate the final commodity recommendation list.
In a second aspect, the present embodiment provides a commodity information recommendation system based on user historical behaviors, including:
an acquisition module: the system is used for collecting historical behavior data of a user on an e-commerce website;
a feature vector establishing module: the system is used for acquiring historical behavior data according to an acquisition module and establishing a user commodity probability prediction characteristic vector;
a model building module: the system is used for training a model according to the user commodity probability prediction feature vector established by the feature vector establishing module to obtain a user recommended commodity prediction model;
a measuring and calculating module: the system is used for inputting data into the user recommended commodity prediction model and calculating the predicted purchase probability of the behavior commodities;
a first generation module: the system is used for calculating the predicted purchase probability of the associated commodities according to the predicted purchase probability of the behavior commodities measured and calculated by the measuring and calculating module, and combining the behavior commodities and the associated commodities to obtain a commodity recommendation list.
With reference to the second aspect, as a first possible implementation manner, the historical behavior data acquired by the acquisition module is derived from a PC end, a WAP end, an APP end, and offline data.
With reference to the second aspect, as a second possible implementation manner, the feature vector establishing module includes:
a cleaning submodule: used for cleaning out abnormal data and data that does not conform to the browsing habits of users;
a measuring and calculating submodule: used for constructing, for the cleaned historical behavior features of the users of each terminal, a user historical behavior feature statistical function from daily statistics, dividing the statistical days into M sections, calculating the feature value of each section according to a time decay function, and accumulating the feature values of all sections to obtain the user feature value;
an establishing submodule: used for establishing a user commodity probability prediction feature vector, which takes the form: fingerprint ID + user ID + commodity ID + user feature value of each terminal; the fingerprint ID represents the identification ID of the user, the user ID represents the fingerprint identification of the user, and the commodity ID represents the identification code of the commodity.
With reference to the second aspect, as a third possible implementation manner, the first generating module includes:
a determining submodule: used for calculating, according to the access history data and purchase history data in the user historical behavior data, the association degree between the behavior commodity and its associated commodities by adopting an association rule or a collaborative filtering algorithm, and taking the first b commodities with the highest association degree as the associated commodity set of the behavior commodity;
a calculation submodule: for calculating a purchase probability of the associated item;
a first generation submodule: the method is used for combining the behavior commodities and the associated commodities to generate a commodity recommendation list: if the behavior commodities are in the associated commodity set established by the determining submodule, sequencing according to the predicted purchase probability of the behavior commodities and the associated commodities to obtain a commodity recommendation list; if the behavior commodity is not in the associated commodity set established by the determining submodule and the predicted purchase probability of the behavior commodity is smaller than the probability threshold, multiplying the predicted purchase probability of the behavior commodity by a penalty coefficient to serve as the final predicted purchase probability of the behavior commodity; and sequencing the associated commodities and the behavior commodities according to the predicted purchase probability of the commodities to obtain a commodity recommendation list.
With reference to the second aspect, as a fourth possible implementation manner, the commodity information recommendation system based on the user historical behavior further includes a second generation module: and the system is used for filtering and outputting the commodity recommendation list obtained by the first generation module to generate a final commodity recommendation list.
With reference to the fourth possible implementation manner of the second aspect, as a fifth possible implementation manner, the second generating module includes:
a filtering commodity group establishing submodule: used for taking the commodities ordered by the user within the last H days and taking the commodity groups to which those commodities belong as the user's filtering commodity group;
a filtering submodule: used for filtering out, from the commodity recommendation list generated by the first generation module, the commodities belonging to the filtering commodity group;
a second generation submodule: used for reordering, according to predicted purchase probability, the commodities in the commodity recommendation list filtered by the filtering submodule to generate a final commodity recommendation list.
Advantageous effects: compared with the prior art, the commodity information recommendation method and system based on user historical behaviors provided by the embodiment of the invention can provide personalized commodity recommendations that are more accurate and better match the user's needs. The method analyzes the user's historical behavior data, builds a user recommended commodity prediction model, incorporates the commodities associated with the behavior commodities into the recommendation list, and generates the commodity recommendation list after comprehensively comparing the purchase probabilities of the behavior commodities and the associated commodities.
Detailed Description
The technical solution of the embodiment of the present invention is explained in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the commodity information recommendation method based on user historical behaviors of this embodiment includes the following steps:
S11, collecting historical behavior data of the user on the e-commerce website, wherein the historical behavior data comprises user information and commodity information;
S12, establishing a user commodity probability prediction feature vector according to the user attributes, the commodity features and the user's historical behavior features;
S13, training a model according to the user commodity probability prediction feature vector to obtain a user recommended commodity prediction model;
S14, inputting data into the user recommended commodity prediction model to obtain the predicted purchase probability of the behavior commodities;
S15, calculating the predicted purchase probability of the associated commodities according to the predicted purchase probability of the behavior commodities, and combining the behavior commodities and the associated commodities to obtain a commodity recommendation list.
According to the recommendation method, the predicted purchase probability of the behavior commodity is measured and calculated by utilizing the historical behavior data of the user on the e-commerce website, and the behavior commodity and the associated commodity are combined to generate a commodity recommendation list. Because the behaviors of different users are different, the predicted purchase probability of behavior commodities is measured and calculated based on the historical behaviors of different users, so that the finally generated commodity recommendation list has individuation, and different commodity recommendation lists are generated for different users.
To make the commodity recommendation list better suited to the user, in step S11 the historical behavior data is drawn from the PC end, the WAP end, the APP end and offline data. Multi-terminal data sources widen the range over which the user's historical behavior is collected, so that the collected data reflects the user's historical needs more accurately and provides a more accurate basis for generating the subsequent commodity recommendation list.
The types of historical behavior data collected can be determined according to actual needs and include user information and commodity information, for example user tag information, user access information, user click information, browsing duration, user search information, user favorites, shopping cart information, pre-sale information, order and sales information, and the like.
The user information includes the user identification ID, user attribute information, and the like. The user attribute information includes the gender, age and access preference of the user, where the access preference reflects the user's tastes, such as color and style. The user attributes can be obtained from the historical behavior data by modeling and identification with methods such as statistical analysis and machine learning.
The commodity information includes the commodity identification code, commodity feature information, and the like. The commodity features comprise commodity flow features, commodity behavior features and commodity decision cost. The commodity flow features include PV, UV, conversion rate, sales, number of orders, sales growth rate, order growth rate, and the like. The commodity behavior features include promotion, price reduction, new product, pre-order, hot-selling item, promotion intensity, price, and the like. The commodity decision cost includes the decision time for purchasing a commodity, the number of times it was browsed, the number of days on which it was browsed, and the like.
The historical behavior features of the user are constructed as follows: the user's historical behavior data is analyzed to identify the factors that influence the user's purchases, a feature value is extracted for each factor, and the values are assembled into a factor numerical vector, which gives the user's historical behavior features.
Preferably, as shown in Fig. 2, establishing the user commodity probability prediction feature vector in S12 specifically includes:
S201, data cleaning: cleaning out the abnormal data.
Abnormal data is data that differs significantly from, or is inconsistent with, other data. For example, the following filtering rules remove abnormal data: users whose shopping cart contains more commodity categories than a commodity category threshold Na are filtered out; browsing records of commodity detail pages with a browsing time shorter than a browsing time threshold of Nb seconds are filtered out; browsing records of commodity detail pages with a browsing time longer than a browsing time threshold of Nc seconds are filtered out; a session in which the number of fourth-level pages browsed by the user exceeds a threshold Nd is filtered out; and users whose page views (PV) on a given day are fewer than a PV threshold Ne are filtered out.
In addition to abnormal data, data that does not conform to the browsing habits of users can also be cleaned out. Such data differs greatly from the behavior of a normal shopping user, for example the browsing behavior of a crawler or of a user placing fake orders.
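The cleaning rules of S201 amount to threshold filters over browsing records. The following is a minimal sketch only: the record layout (`BrowseRecord` and its fields) and every threshold value are hypothetical placeholders, since the embodiment names the thresholds Na to Ne without giving values.

```python
from dataclasses import dataclass

# Placeholder values: the text names the thresholds Na..Ne but gives no numbers.
NA_CART_CATEGORIES = 50   # Na: max commodity categories in a cart
NB_MIN_DWELL_S = 2        # Nb: min seconds on a detail page
NC_MAX_DWELL_S = 3600     # Nc: max seconds on a detail page
ND_SESSION_PAGES = 500    # Nd: max fourth-level pages per session
NE_MIN_DAILY_PV = 2       # Ne: min page views per day


@dataclass
class BrowseRecord:
    """One detail-page view plus user/session aggregates (hypothetical layout)."""
    user_id: str
    sku_id: str
    dwell_seconds: float
    cart_categories: int
    session_level4_pages: int
    daily_pv: int


def is_abnormal(r: BrowseRecord) -> bool:
    """Return True if any of the S201 filtering rules matches the record."""
    return (
        r.cart_categories > NA_CART_CATEGORIES
        or r.dwell_seconds < NB_MIN_DWELL_S
        or r.dwell_seconds > NC_MAX_DWELL_S
        or r.session_level4_pages > ND_SESSION_PAGES
        or r.daily_pv < NE_MIN_DAILY_PV
    )


def clean(records):
    """Keep only the records that pass every rule."""
    return [r for r in records if not is_abnormal(r)]
```

Detecting crawlers or fake-order users would need additional signals that the text does not specify, so the sketch leaves that part out.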
S202, performing feature processing: according to the distribution of the historical behavior features of the users of each terminal, counted by day, a function as shown in formula (2) is constructed for each feature:
where f(x) represents the user historical behavior feature statistical function, x represents a feature variable, a represents the threshold of each feature, and X represents the statistic of the feature variable x.
The statistical period is set to N days and divided into M sections, and the feature value of each section is calculated according to the time decay function shown in formula (3);
where K represents the half-life of the decay function and t represents the number of days before the current calculation: t = 1 when the feature value of the previous day is calculated, and t = 2 when the feature value of two days earlier is calculated;
The feature value of each section is attenuated according to formula (3), and the attenuated values are accumulated to obtain the final user feature value:
where N represents the number of statistical days of the historical behavior data, and Nt represents the integer sequence 1:N/M;
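The segment-wise decay and accumulation of S202 can be illustrated as follows. This is a sketch under stated assumptions, not the embodiment's formula: formula (3) is not reproduced in the text, so the decay is assumed to be the usual half-life form 0.5^(t/K), and every day within a section is assumed to share the decay weight of that section's most recent day. The function names are hypothetical.

```python
from typing import Sequence


def half_life_decay(t: float, k: float) -> float:
    """Assumed decay weight: halves every k days (t = days before today)."""
    return 0.5 ** (t / k)


def user_feature_value(daily_stats: Sequence[float], m: int, k: float) -> float:
    """Accumulate decayed section totals.

    daily_stats[0] is the statistic for the previous day (t = 1),
    daily_stats[1] for two days earlier (t = 2), and so on for N days.
    """
    n = len(daily_stats)
    if m <= 0 or n % m != 0:
        raise ValueError("N must be a positive multiple of M")
    section_len = n // m
    total = 0.0
    for section in range(m):
        start = section * section_len
        days = daily_stats[start:start + section_len]
        # Assumption: one weight per section, taken at its most recent day,
        # so that behavior inside a section is treated as continuous.
        t = start + 1
        total += half_life_decay(t, k) * sum(days)
    return total
```

For example, `user_feature_value([1, 1, 1, 1], m=2, k=2)` weights the two most recent days by 0.5^(1/2) and the two older days by 0.5^(3/2).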
S203, establishing a user commodity probability prediction feature vector: the user commodity probability prediction feature vector takes the form: fingerprint ID + commodity ID + user feature vector value of each terminal.
The fingerprint ID represents the identification ID of the user, such as a cookie ID, an IMEI, or a membership code. The commodity ID represents the identification code of the commodity.
The user commodity probability prediction feature vectors are respectively established according to different terminal users, and specifically:
(1) PC user: fingerprint ID (PC) + commodity ID + behavior features;
(2) WAP user: fingerprint ID (WAP) + commodity ID + behavior features;
(3) APP user: fingerprint ID (APP) + commodity ID + behavior features;
(4) cross-screen user: fingerprint ID1 (PC) + fingerprint ID2 (WAP) + fingerprint ID3 (APP) + commodity ID + behavior features.
where fingerprint ID (PC) represents the PC user identification ID; fingerprint ID (WAP) represents the WAP user identification ID; and fingerprint ID (APP) represents the APP user identification ID.
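One way to picture the four layouts above is as a tuple whose leading entries are the fingerprint IDs of whichever terminals the user was seen on. The sketch below is illustrative only; the function name, the fixed PC/WAP/APP ordering and the tuple representation are assumptions, not part of the embodiment.

```python
from typing import Dict, Sequence, Tuple

TERMINAL_ORDER = ("PC", "WAP", "APP")  # assumed fixed ordering of terminals


def build_feature_vector(
    fingerprints: Dict[str, str],
    commodity_id: str,
    behavior_features: Sequence[float],
) -> Tuple:
    """Concatenate fingerprint ID(s) + commodity ID + behavior features.

    A single-terminal user supplies one fingerprint; a cross-screen user
    supplies one per terminal on which they were identified.
    """
    ids = tuple(fingerprints[t] for t in TERMINAL_ORDER if t in fingerprints)
    if not ids:
        raise ValueError("at least one terminal fingerprint is required")
    return ids + (commodity_id,) + tuple(behavior_features)
```

A cross-screen user seen on PC and WAP would thus yield `(pc_id, wap_id, commodity_id, feature_1, ...)`.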
In step S13, the model is trained based on the user commodity probability prediction feature vector, and a user recommended commodity prediction model is obtained.
The model to be trained is built with any one or more of logistic regression, lasso regression and random forest. During training, the data of each terminal (PC end, WAP end, APP end) is used to train the model separately. The user commodity probability prediction feature vectors of shopping-cart commodities that were converted into orders are used as the positive samples of the training set, and the user commodity probability prediction feature vectors of SKUs that appear in the behavior data but were not converted into orders are used as the negative samples. In this embodiment, a learned classification model (logistic regression, lasso regression, random forest, and the like) is used to calculate the purchase probability of each commodity.
Logistic regression model: for a classification problem, the learned LR classifier is a set of weights; the weights are combined linearly with the input features to obtain a weighted sum, and the sigmoid function maps that sum to a probability, which is taken as the purchase probability.
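The weighted sum and sigmoid mapping described above can be sketched in a few lines. This is a minimal illustration using plain batch gradient descent; the learning rate, epoch count and function names are assumptions, and a production system would more likely rely on an existing library.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Purchase probability = sigmoid(weighted sum of the features)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights by batch gradient descent on the log loss."""
    n = len(xs)
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(w, b, x) - y  # y = 1 ordered, y = 0 not ordered
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

Here the label is 1 for a feature vector whose commodity was converted into an order and 0 otherwise, matching the positive and negative samples described for the training set.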
Lasso regression model: the Lasso (least absolute shrinkage and selection operator, Tibshirani) method is a shrinkage estimator. By adding a penalty term it shrinks the regression coefficients and sets some of them exactly to zero, which yields a more compact model. It thus retains the advantage of subset selection and is a biased estimation method suited to data with multicollinearity. The basic idea of Lasso is to minimize the residual sum of squares under the constraint that the sum of the absolute values of the regression coefficients is less than a constant, so that some regression coefficients become exactly 0 and the resulting model is interpretable. This helps make the predicted probability more accurate.
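Written out, the constrained problem described above takes the following standard form. The symbols are introduced here for illustration and do not appear in the embodiment: $y_i$ is the response for sample $i$, $x_{ij}$ its $j$-th feature, $\beta_j$ the regression coefficients, and $s$ the constant bounding the coefficient magnitudes.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
\quad \text{subject to} \quad
  \sum_{j=1}^{p} \lvert \beta_j \rvert \le s
```

Shrinking $s$ tightens the constraint and drives more coefficients exactly to zero, which is the source of the variable-selection behavior mentioned above.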
Random forest model: a random forest is a classifier containing multiple decision trees, whose output class is the mode of the classes output by the individual trees. The purchase probability of the user is calculated from the output classes.
In step S14, the user data to be predicted is loaded into the user commodity prediction model, and the predicted purchase probability of the behavioral commodity is obtained.
Preferably, S15 specifically includes the following steps:
S301, determining the associated commodities: calculating the association degree between the behavior commodity and its associated commodities by adopting an association rule or a collaborative filtering algorithm according to the access history data and the purchase history data in the historical behavior data of the user, and taking the first b commodities with the highest association degree as the associated commodity set of the behavior commodity, where b is an integer greater than 1.
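As one possible reading of S301, the association degree can be taken as the association-rule confidence of "behavior commodity implies other commodity", estimated from co-occurrence in historical baskets. The sketch below assumes that reading; the basket representation and the function name are hypothetical, and a collaborative filtering score could be substituted.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def top_associated(
    baskets: Iterable[Set[str]],
    behavior_sku: str,
    b: int,
) -> List[Tuple[str, float]]:
    """Return the b commodities most associated with behavior_sku.

    Association degree = share of baskets containing behavior_sku that
    also contain the other commodity (association-rule confidence).
    """
    containing = [basket for basket in baskets if behavior_sku in basket]
    if not containing:
        return []
    co_counts = Counter(
        sku for basket in containing for sku in basket if sku != behavior_sku
    )
    scored = [(sku, c / len(containing)) for sku, c in co_counts.items()]
    # Highest association first; ties broken by SKU for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:b]
```

Each basket here stands for the set of commodities a user accessed or purchased together; how baskets are cut from the raw history is left open by the text.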
S302, calculating the purchase probability of the associated commodity according to the formula (1):
$\mathrm{Score}_i = \mathrm{Master\_Pos} \times \mathrm{SKU\_Score}_i / \max(\mathrm{SKU\_Score}_i)$    formula (1)
where Score_i represents the purchase probability of the associated commodity; Master_Pos represents the purchase probability of the behavior commodity; max(SKU_Score_i) represents the highest association degree in the associated commodity set; and SKU_Score_i represents the association degree between the associated commodity SKU_i and the behavior commodity;
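Formula (1) scales the behavior commodity's purchase probability by each associated commodity's association degree relative to the strongest association. A small sketch, with a hypothetical function name and a dictionary as the assumed input format:

```python
from typing import Dict


def associated_purchase_probs(
    master_pos: float,
    sku_scores: Dict[str, float],
) -> Dict[str, float]:
    """Apply formula (1): Score_i = Master_Pos * SKU_Score_i / max(SKU_Score_i)."""
    top = max(sku_scores.values(), default=0.0)
    if top <= 0:
        raise ValueError("at least one positive association degree is required")
    return {sku: master_pos * score / top for sku, score in sku_scores.items()}
```

With `master_pos = 0.8` and association degrees 0.5 and 0.25, the most associated commodity inherits the full 0.8 and the other receives half of it.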
S303, combining the behavior commodities and the associated commodities to obtain the commodity recommendation list:
if the behavior commodity is in the associated commodity set obtained in step S301, the behavior commodities and the associated commodities are sorted according to their predicted purchase probabilities to obtain the commodity recommendation list; if the behavior commodity is not in the associated commodity set obtained in step S301 and its predicted purchase probability is smaller than the probability threshold, the predicted purchase probability of the behavior commodity is multiplied by the penalty coefficient to give its final predicted purchase probability, and the associated commodities and the behavior commodities are then reordered according to predicted purchase probability to obtain the commodity recommendation list.
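The merging rule of S303 can be sketched as follows. Two points are assumptions rather than statements of the embodiment: a behavior commodity that also appears in the associated set keeps its model-predicted probability, and a behavior commodity outside the set whose probability is at or above the threshold is left unchanged. The function name and input format are hypothetical.

```python
from typing import Dict, List, Tuple


def merge_recommendations(
    behavior_sku: str,
    behavior_prob: float,
    associated_probs: Dict[str, float],
    prob_threshold: float,
    penalty: float,
) -> List[Tuple[str, float]]:
    """Combine one behavior commodity with its associated commodities."""
    scores = dict(associated_probs)
    final_prob = behavior_prob
    if behavior_sku not in scores and behavior_prob < prob_threshold:
        # Not corroborated by the associated set and weakly predicted:
        # apply the penalty coefficient.
        final_prob = behavior_prob * penalty
    # Assumption: the behavior commodity always carries its own
    # (possibly penalized) model probability in the merged list.
    scores[behavior_sku] = final_prob
    # Sort by predicted purchase probability, highest first.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For instance, a behavior commodity with probability 0.4, a threshold of 0.5 and a penalty coefficient of 0.5 drops to 0.2 and is ranked below an associated commodity scored 0.4.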
In step S303, the probability threshold and the penalty coefficient are selected according to the optimal standard of the comprehensive evaluation index (F-Measure), and the probability threshold and the penalty coefficient when the F-Measure is maximum are selected.
where: hit rate = total number of correctly identified individuals / total number of identified individuals;
recall rate = total number of correctly identified individuals / total number of individuals present in the test set.
When the hit rate and the recall rate conflict, a comprehensive evaluation index (F-Measure, also called F-Score) that considers both is used to select the optimal value. The F-Measure is the weighted harmonic mean of the hit rate and the recall rate:
$F = \dfrac{(1 + a^2) \cdot \text{hit rate} \cdot \text{recall rate}}{a^2 \cdot \text{hit rate} + \text{recall rate}}$;
when the parameter a = 1, this is the most common form, F1: $F_1 = \dfrac{2 \cdot \text{hit rate} \cdot \text{recall rate}}{\text{hit rate} + \text{recall rate}}$.
F1 thus combines the hit rate and the recall rate; the higher the F1, the more effective the method. Here F1 is mainly used to adjust the ranking.
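Selecting the probability threshold and penalty coefficient by maximum F-Measure can be sketched as a search over candidate pairs. The sketch uses the standard weighted harmonic mean, F = (1 + a²)·P·R / (a²·P + R); the candidate table, mapping each (threshold, penalty) pair to an already measured (hit rate, recall rate), is a hypothetical input format.

```python
from typing import Dict, Tuple


def f_measure(hit_rate: float, recall: float, a: float = 1.0) -> float:
    """Weighted harmonic mean of hit rate and recall rate (F1 when a = 1)."""
    denom = a * a * hit_rate + recall
    if denom == 0:
        return 0.0
    return (1 + a * a) * hit_rate * recall / denom


def pick_best(
    candidates: Dict[Tuple[float, float], Tuple[float, float]],
    a: float = 1.0,
) -> Tuple[float, float]:
    """Return the (threshold, penalty) pair whose F-Measure is largest."""
    return max(candidates, key=lambda key: f_measure(*candidates[key], a))
```

In practice each candidate's hit rate and recall rate would come from evaluating the merged recommendation list on held-out data.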
As shown in Fig. 4, the recommendation method provided in this embodiment adds step S16 to the foregoing embodiment: the commodity recommendation list obtained in step S15 is filtered according to behavior filtering logic, and the final commodity recommendation list is output.
The behavior filtering logic works as follows: the commodities ordered by the user within the last H days are taken, the commodity groups to which they belong are used as the user's filtering commodity group, and the commodities belonging to the filtering commodity group are filtered out of the commodity recommendation list obtained in S15; the remaining commodities are then reordered according to predicted purchase probability to obtain the final commodity recommendation list. A user who has ordered a commodity within the last H days is unlikely to buy it again in the near future, so the behavior filtering logic keeps such commodities out of the final commodity recommendation list. The list that remains after these commodities are removed reflects the user's needs more accurately.
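The behavior filtering of S16 can be sketched as a set lookup followed by a re-sort. The input formats below (a list of `(sku, order_date)` pairs and a SKU-to-group mapping) and the function name are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple


def filter_recent_groups(
    recommendations: Iterable[Tuple[str, float]],
    orders: Iterable[Tuple[str, date]],
    sku_group: Dict[str, str],
    today: date,
    h_days: int,
) -> List[Tuple[str, float]]:
    """Drop recommendations whose commodity group was ordered in the last H days."""
    cutoff = today - timedelta(days=h_days)
    # Filtering commodity group: groups of everything ordered since the cutoff.
    blocked = {
        sku_group[sku]
        for sku, ordered_on in orders
        if ordered_on >= cutoff and sku in sku_group
    }
    kept = [
        (sku, prob)
        for sku, prob in recommendations
        if sku_group.get(sku) not in blocked
    ]
    # Reorder the survivors by predicted purchase probability.
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

Commodities with no known group are kept in this sketch; whether they should be kept or dropped is not stated in the text.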
The recommendation method of the embodiment comprehensively considers the user behavior and the commodity characteristics and the cross characteristics of the user behavior and the commodity characteristics, improves the prediction accuracy and further improves the recommendation accuracy. Cross-feature refers to a linear or non-linear combination of feature attributes. The cross features are richer in depiction of user behaviors, and the dimensionality of the feature variables is increased, so that the accuracy of the model is further increased.
Accuracy test: the test data for both the comparative example and the present embodiment was obtained in a new time window, using the same acquisition method as the training data of the embodiment. The comparative example used a logistic regression model, and the present embodiment used the model established in step S13. The comparative example used only the user's browsing behavior as user features, whereas the present embodiment used user behavior features and commodity features. Both output recommendation lists through model testing. According to the prediction results, the AUC of the comparative example is 0.70 and the AUC of the present embodiment is 0.83.
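For reference, AUC can be computed as the share of (purchased, not purchased) pairs in which the purchased sample receives the higher predicted probability, with ties counted as one half. The sketch below shows that pairwise definition on made-up inputs; it does not reproduce the test data behind the figures reported above.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: P(score of a positive > score of a negative)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

The quadratic loop is adequate for illustration; large test sets would call for a rank-based computation.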
The recommendation results are ranked on a comprehensive dimension. Because the core evaluation indexes of recommendation, the recall rate and the hit rate, can be inconsistent, this embodiment measures them with their weighted harmonic mean and optimizes the recommendation ranking with the optimal value; that is, ranking is performed with the result corresponding to the maximum of the comprehensive evaluation index, which improves the accuracy of the recommendation ranking.
In addition, a multi-level decay method is used to build the user's historical behavior features. The user's historical behavior is divided into M sections, and decay is applied along two dimensions, section and time. This preserves the user's continuous browsing habits, treats the user behaviors within one section as continuous behavior, and takes into account the influence of time on the user's purchase demand. The multi-level decay method affects the commodity scores used for ranking: different decay speeds and section lengths lead to different final feature vector values and therefore to different scores. Since the user's commodities are ranked by score, different scores change the ranking result (that is, the recommendation ranking result).
The method predicts the purchase probability of the electronic commerce commodity by the user, and the prediction result is used as basic prediction data for accurate marketing, personalized recommendation and the like of the electronic commerce website.
In addition, as shown in fig. 5, there is also provided a commodity information recommendation system based on a user's historical behavior, the system including:
an acquisition module: used for collecting historical behavior data of a user on an e-commerce website;
a feature vector establishing module: used for establishing a user commodity probability prediction feature vector according to the historical behavior data collected by the acquisition module;
a model building module: used for training the model according to the user commodity probability prediction feature vector established by the feature vector establishing module, to obtain a user recommended commodity prediction model;
a measuring and calculating module: used for inputting the user data to be predicted into the user recommended commodity prediction model, and measuring and calculating the predicted purchase probability of behavior commodities;
a first generation module: used for calculating the predicted purchase probability of the associated commodities according to the predicted purchase probability of the behavior commodities obtained by the measuring and calculating module, and combining the behavior commodities and the associated commodities to obtain a commodity recommendation list.
In the system, the predicted purchase probability of the behavior commodity is measured and calculated from the historical behavior data of the user on the e-commerce website, and the behavior commodity and the associated commodities are combined to generate a commodity recommendation list. Because different users behave differently, and the predicted purchase probability of behavior commodities is calculated from each user's own historical behaviors, the finally generated commodity recommendation list is personalized: different commodity recommendation lists are generated for different users.
The historical behavior data collected by the acquisition module come from the PC end, the WAP end, the APP end and offline data. Drawing on multiple terminals widens the range of historical behavior data collected for a user, so that the collected data reflect the historical requirements of the user more accurately and provide a more accurate historical data basis for the subsequent generation of the commodity recommendation list. The historical behavior data include user information and commodity information. The user information includes the identification ID of the user, user attribute information, and the like. The commodity information includes the identification code of the commodity, commodity feature information, and the like. The user attributes include the gender, age and access preference of the user. The commodity features include the commodity flow feature, the commodity behavior feature and the commodity decision cost.
As a preferred solution, as shown in fig. 6, the feature vector establishing module includes:
a cleaning submodule: used for cleaning abnormal data and data that do not conform to user browsing habits;
a measuring and calculating submodule: used for constructing, for the cleaned historical behavior features of the user on each terminal, a user historical behavior feature statistical function based on daily statistics, dividing the statistical days into M segments, measuring and calculating the feature value of each segment according to a time attenuation function, and accumulating the feature values of all segments to obtain a user feature value;
an establishing submodule: used for establishing the user commodity probability prediction feature vector, which is expressed in the form: fingerprint ID + commodity ID + user feature value of each terminal; the fingerprint ID represents the identification ID of the user, and the commodity ID represents the identification code of the commodity.
In the feature vector establishing module, the cleaning submodule first cleans abnormal data and data that do not conform to user browsing habits, the measuring and calculating submodule then calculates the user feature value, and the establishing submodule finally establishes the user commodity probability prediction feature vector. The measuring and calculating submodule calculates the feature value of each segment according to the time attenuation function and then accumulates the feature values of all segments. The multi-level attenuation method affects the commodity scores used for sorting: different attenuation speeds and attenuation intervals yield different final feature vector values and therefore different scores, and since the commodities of a user are ranked by score, different scores change the ranking result (i.e., the recommendation sorting result).
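The two-dimensional attenuation described above can be sketched as follows. The text does not give the form of the time attenuation function, so the exponential decay used here, the parameters `day_rate` and `segment_rate`, and the function name `user_feature_value` are all assumptions made for illustration.

```python
import math


def user_feature_value(daily_counts, m_segments, day_rate=0.1, segment_rate=0.5):
    """Accumulate a user feature value with segment-level and time-level decay.

    daily_counts: per-day behavior counts for one terminal, where
                  daily_counts[0] is the most recent day.
    m_segments:   M, the number of segments the statistical days are split into.
    day_rate, segment_rate: hypothetical decay speeds (not from the source).
    """
    n_days = len(daily_counts)
    if not 1 <= m_segments <= n_days:
        raise ValueError("m_segments must be between 1 and the number of days")
    seg_len = math.ceil(n_days / m_segments)
    total = 0.0
    for seg in range(m_segments):
        days = daily_counts[seg * seg_len:(seg + 1) * seg_len]
        # Time dimension: decay each day by its age within the segment.
        seg_value = sum(c * math.exp(-day_rate * age) for age, c in enumerate(days))
        # Segment dimension: older segments receive a smaller weight.
        total += math.exp(-segment_rate * seg) * seg_value
    return total
```

Under these assumptions, the same amount of activity contributes more when it is recent: `user_feature_value([5, 0, 0, 0], 2)` is larger than `user_feature_value([0, 0, 0, 5], 2)`. Changing either rate or M changes the feature value and therefore the final commodity score.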
In the cleaning submodule, abnormal data refers to data that differ significantly from, or are inconsistent with, other data. For example, the data removed by the following filters belong to the abnormal data: filtering out users whose number of commodity categories added to the shopping cart is greater than a commodity category threshold Na; filtering out browsing records of commodity detail pages whose browsing time is less than the browsing time threshold Nbs; filtering out browsing records of commodity detail pages whose browsing time is greater than the browsing time threshold Ncs; filtering out a session if the number of level-four pages browsed by the user in that session is greater than a level-four page browsing threshold Nd; and filtering out users whose page views (PV) on a given day are less than the PV threshold Ne.
Data that do not conform to user browsing habits refers to data that differ greatly from the behavior of a normal shopping user, such as the browsing behavior of a crawler or of a user placing fake orders to inflate sales.
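The filtering rules listed above can be sketched as a single pass over browsing records. The record layout (`user`, `session`, `dwell_seconds`), the lookup dictionaries, and the function name `clean_records` are hypothetical; the thresholds correspond to Na, Nbs, Ncs, Nd and Ne, whose values the text does not specify. For simplicity the sketch assumes all records come from one day.

```python
def clean_records(records, cart_categories, level4_views, daily_pv,
                  *, na, nb, nc, nd, ne):
    """Return the browsing records that survive the abnormal-data filters.

    records:         list of dicts with keys "user", "session", "dwell_seconds".
    cart_categories: user -> number of commodity categories in the shopping cart.
    level4_views:    session -> number of level-four pages browsed.
    daily_pv:        user -> page views on the day being cleaned.
    """
    kept = []
    for rec in records:
        user, session = rec["user"], rec["session"]
        if cart_categories.get(user, 0) > na:    # too many categories in cart
            continue
        if daily_pv.get(user, 0) < ne:           # too few page views that day
            continue
        if level4_views.get(session, 0) > nd:    # too many level-four pages
            continue
        dwell = rec["dwell_seconds"]
        if dwell < nb or dwell > nc:             # browsing time out of range
            continue
        kept.append(rec)
    return kept
```

Detecting crawler or fake-order traffic generally needs more than fixed thresholds; the sketch covers only the threshold rules enumerated in the text.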
Preferably, as shown in fig. 7, the first generating module includes:
a determining submodule: used for calculating, from the access history data and the purchase history data in the user historical behavior data, the association degree of the associated commodities of the behavior commodity by an association rule or a collaborative filtering algorithm, and taking the top b commodities with the highest association degree as the associated commodity set of the behavior commodity;
a calculation submodule: for calculating a purchase probability of the associated item according to equation (1):
Score_i = Master_Pos × SKU_Score_i / max(SKU_Score_i)    formula (1)
wherein Score_i represents the purchase probability of the associated commodity SKU_i; Master_Pos represents the predicted purchase probability of the behavior commodity; SKU_Score_i represents the association degree between the associated commodity SKU_i and the behavior commodity; and max(SKU_Score_i) represents the highest association degree in the associated commodity set;
a first generation submodule: used for combining the behavior commodities and the associated commodities to generate a commodity recommendation list: if the behavior commodity is in the associated commodity set established by the determining submodule, the behavior commodity and the associated commodities are sorted according to their predicted purchase probabilities to obtain the commodity recommendation list; if the behavior commodity is not in the associated commodity set established by the determining submodule and the predicted purchase probability of the behavior commodity is smaller than a probability threshold, the predicted purchase probability of the behavior commodity is multiplied by a penalty coefficient to serve as the final predicted purchase probability of the behavior commodity, and the associated commodities and the behavior commodity are then sorted according to their predicted purchase probabilities to obtain the commodity recommendation list.
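The calculation submodule and the first generation submodule can be sketched together for a single behavior commodity. The function name and argument names are hypothetical. One point the text leaves open is which probability the behavior commodity keeps when it also appears in the associated commodity set; the sketch assumes it keeps its model-predicted probability Master_Pos.

```python
def build_recommendation_list(behavior_sku, master_pos, assoc_degrees,
                              prob_threshold, penalty):
    """Merge one behavior commodity with its associated commodity set.

    behavior_sku:   identification code of the behavior commodity.
    master_pos:     predicted purchase probability of the behavior commodity.
    assoc_degrees:  associated SKU -> association degree (the top-b set).
    prob_threshold, penalty: probability threshold and penalty coefficient.
    Returns (sku, probability) pairs sorted by probability, highest first.
    """
    scores = {}
    if assoc_degrees:
        max_degree = max(assoc_degrees.values())
        if max_degree <= 0:
            raise ValueError("association degrees must be positive")
        for sku, degree in assoc_degrees.items():
            # Formula (1): scale Master_Pos by the normalized association degree.
            scores[sku] = master_pos * degree / max_degree
    behavior_prob = master_pos
    if behavior_sku not in assoc_degrees and master_pos < prob_threshold:
        # Behavior commodity absent from the set and below threshold: penalize.
        behavior_prob = master_pos * penalty
    scores[behavior_sku] = behavior_prob  # assumption: keep the model's probability
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, with `master_pos=0.4`, association degrees `{"B": 2.0, "C": 1.0}`, a threshold of 0.5 and a penalty of 0.25, the associated commodities score 0.4 and 0.2, and the behavior commodity drops to 0.1, so it is ranked last.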
In the first generation submodule, a probability threshold and a penalty coefficient are selected according to the optimal standard of a comprehensive evaluation index (F-Measure), and the probability threshold and the penalty coefficient when the F-Measure is maximum are selected.
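Selecting the two parameters by the maximum F-Measure can be sketched as a grid search. The weighted harmonic mean is written here in the common F-beta form with `beta` as the weighting; the text does not state the weighting it uses, so `beta`, the candidate grids, and the `evaluate` callback (which is assumed to return the hit rate and recall rate obtained on validation data for a given parameter pair) are all assumptions.

```python
from itertools import product


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision (hit rate) and recall."""
    if precision <= 0 or recall <= 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def select_parameters(thresholds, penalties, evaluate, beta=1.0):
    """Return (best_f, threshold, penalty) maximizing the F-Measure.

    evaluate(threshold, penalty) is a hypothetical callback returning a
    (precision, recall) pair measured for that parameter combination.
    """
    best = None
    for threshold, penalty in product(thresholds, penalties):
        precision, recall = evaluate(threshold, penalty)
        score = f_measure(precision, recall, beta)
        if best is None or score > best[0]:
            best = (score, threshold, penalty)
    return best
```

An exhaustive grid is workable here because only two parameters are tuned; the candidate values themselves would have to come from validation experiments.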
The first generation submodule considers not only the behavior commodity but also the associated commodity, and takes the associated commodity and the behavior commodity together as the commodity to be recommended. When the recommended commodities are selected, the predicted purchase probability of the behavior commodities is processed differently according to whether the behavior commodities exist in the associated commodity set or not, and the associated commodities and the processed behavior commodities are sorted again according to the purchase probability, so that the positions of the behavior commodities in the recommended list are more in line with the requirements of the user.
As shown in fig. 8, the commodity information recommendation system based on the user historical behavior further includes a second generation module, which is used for filtering the commodity recommendation list obtained by the first generation module and outputting a final commodity recommendation list. Since commodities recently purchased by the user are generally not purchased again, filtering logic is applied to the commodity recommendation list generated by the first generation module before output, so that the final commodity recommendation list contains no commodity recently purchased by the user and better matches the real requirements of the user.
As shown in fig. 9, the second generating module includes:
a filtering commodity group establishing submodule: used for taking the commodities ordered by the user within the last H days and taking the commodity groups to which these ordered commodities belong as the user's filtering commodity groups, where H is an integer and H > 3;
a filtering submodule: used for filtering out, from the commodity recommendation list generated by the first generation module, the commodities that belong to the filtering commodity groups;
a second generation submodule: used for reordering, according to the predicted commodity purchase probability, the commodities in the commodity recommendation list filtered by the filtering submodule, to generate the final commodity recommendation list.
The filtering commodity groups are selected by the filtering commodity group establishing submodule; they are the commodity groups of the commodities recently purchased by the user. The filtering submodule filters out of the commodity recommendation list generated by the first generation module the commodities that belong to the filtering commodity groups. The second generation submodule reorders the commodities in the filtered commodity recommendation list according to the predicted commodity purchase probability to generate the final commodity recommendation list. Through these three submodules, commodities of the same type as those recently purchased by the user are removed from the commodity recommendation list generated by the first generation module, so that the commodities in the final commodity recommendation list meet the real requirements of the user.
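The three submodules of the second generation module can be sketched as follows. The mapping `sku_to_group` from a commodity to its commodity group, the argument names, and the function name are hypothetical; the caller is assumed to have already restricted `recent_order_skus` to orders placed within the last H days.

```python
def final_recommendation_list(recommendations, sku_to_group, recent_order_skus):
    """Filter out recently purchased commodity groups and re-sort.

    recommendations:   (sku, predicted purchase probability) pairs from the
                       first generation module.
    sku_to_group:      sku -> commodity group (hypothetical lookup table).
    recent_order_skus: skus the user ordered within the last H days.
    """
    # Filtering commodity groups: groups of the recently ordered commodities.
    filtered_groups = {
        sku_to_group[sku] for sku in recent_order_skus if sku in sku_to_group
    }
    # Drop every recommended commodity that belongs to one of those groups.
    kept = [
        (sku, prob) for sku, prob in recommendations
        if sku_to_group.get(sku) not in filtered_groups
    ]
    # Reorder the remaining commodities by predicted purchase probability.
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Filtering by group rather than by individual commodity also removes near-duplicates of a recent purchase, such as another model in the same group. In this sketch, commodities with no known group are kept.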
Those skilled in the art will appreciate that the methods or systems for implementing the embodiments described above can be implemented via computer program instructions. The computer program instructions are loaded onto a programmable data processing apparatus, such as a computer, to cause corresponding instructions to be executed on the programmable data processing apparatus to implement the functions of the method or system of the above-described embodiments.
Those skilled in the art can make non-inventive technical improvements to the present application based on the above-described embodiments without departing from the spirit of the present invention. Such modifications are to be considered within the scope of the claims of the present application.