CN108629665B - Personalized commodity recommendation method and system

Info

Publication number: CN108629665B
Application number: CN201810433175.XA
Authority: CN
Inventors: 张洪刚; 孙宇; 常剑; 徐彬; 高珊
Original assignee: Beijing University of Posts and Telecommunications; China Unicom Online Information Technology Co Ltd
Current assignee: Beijing University of Posts and Telecommunications; China Unicom Online Information Technology Co Ltd
Priority date: 2018-05-08
Filing date: 2018-05-08
Publication date: 2021-07-16
Anticipated expiration: 2038-05-08
Also published as: CN108629665A

Abstract

The invention discloses a personalized commodity recommendation method and system. The method comprises: acquiring historical behavior data of a plurality of users within a preset time period, and sorting the historical behavior data according to a predetermined rule to obtain a first training sample; calculating influence factors based on a cosine similarity method to serve as a second training sample; training a deep learning model on the first and second training samples to obtain a trained deep learning model; and outputting a list of commodities that the model predicts the user to be interested in. The invention makes effective use of the time-sequence information of the commodities in a user's historical behaviors, so that those commodities carry different weights in the recommendation calculation according to the order in which the interactions occurred. The commodity influence factors reflect both the global characteristics of the commodities and the user's degree of interest in them, which increases the number of features available to the deep learning model and improves personalized recommendation for cold-start users.

Description

Personalized commodity recommendation method and system

Technical Field

The invention relates to a personalized commodity recommendation method and system, and belongs to the technical field of recommendation.

Background

With the popularization of the internet and the rapid development of electronic commerce, more and more users browse and purchase goods through e-commerce platforms. Such platforms carry a very wide variety of commodities, and without a personalized recommendation engine it is difficult to recommend accurately the commodities a user is interested in.

One mainstream personalized recommendation method used by current e-commerce platforms is collaborative filtering. It searches for users whose historical browsing or purchasing behaviors are similar to those of the current user, and recommends to the current user the commodities those similar users are interested in. When calculating similarity with other commodities, collaborative filtering gives the same weight to every commodity the current user has browsed or purchased. For example, if the current user browsed commodity A ten days ago and commodity B one hour ago, A and B play the same role when the method calculates which commodities interest the current user. Experience suggests, however, that commodity B, browsed one hour ago, is often a better reference for the current recommendation than commodity A, browsed ten days ago. Collaborative filtering therefore cannot reflect the different weights that commodities at different positions in the historical behaviors should carry, and its recommendations are not accurate enough. In addition, collaborative filtering performs poorly when the user has too little historical browsing or purchasing data; that is, it has a cold start problem.

The second mainstream method is based on a recurrent neural network: the user's historical behaviors are input into the network in sequence to train it and to predict the commodities the user is interested in. Because this method can use the time-sequence characteristics of the commodities in the user's historical behaviors, it generally outperforms collaborative filtering when training samples are sufficient and the network parameters are properly tuned. When a user's historical behavior data is scarce, however, this method also has a cold start problem, because there is too little browsing or purchasing history from which the network can predict interest.

Both methods have a cold start problem: when few historical operation records are available for a user, commodities cannot be recommended well and the user experience suffers. Cold start is a common problem in personalized recommendation. Because too little historical behavior data is available for the user to be served, it is difficult to identify accurately the commodities that user may be interested in.

One main existing solution to the cold start problem is the user-information-based method, which recommends according to basic information such as the user's age, sex and place of residence. However, some users do not fill in detailed personal information, so the user information available to the recommendation system is limited. Moreover, recommendations based on gender, age and similar information only reflect the general interests of a user group and are not personalized to the individual user.

The second main existing solution is the label selection method: when a user first registers, various category labels are offered for selection, and while user behavior data is insufficient, commodities under the selected labels are recommended. This method has several problems. Some users do not select category labels at the start, choosing to "skip this step" or selecting at random, so the selected labels do not really represent their interests. The commodities a user is interested in may also change over time, so the originally selected labels can differ greatly from current interests. For example, a user may be interested in "sun cream" in summer and in "down coat" in winter; recommending according to the originally selected label would then produce a large deviation. Furthermore, because commodities are so varied, labels cannot cover all commodity features.

Disclosure of Invention

To address the above defects, the invention provides a personalized commodity recommendation method and system. After the historical behavior data of users is obtained, the data is analyzed and sorted. The influence factor of each commodity for the current user is calculated from the current user's historical behavior information and the global characteristics of the commodities in those behaviors. The commodities in the user's historical behaviors and their corresponding influence factors are then input into a preset deep learning model in the time order of the user's interactions, yielding the commodities predicted to interest the current user. The commodities are ranked by predicted interest, and the several commodities with the highest predicted interest are recommended to the user in ranked order. The deep learning model adopts a recurrent neural network model.

The invention can effectively utilize the time sequence information of the commodities in the historical behaviors of the user, so that the commodities in the historical behaviors have different weighted values according to the time sequence of the occurrence of the interactive behaviors in the calculation of the recommendation system.

In order to achieve the purpose, the invention is concretely realized by the following technical scheme:

the invention provides a personalized commodity recommendation method, which comprises the following steps:

acquiring historical behavior data of a plurality of users in a preset time period, and sorting the historical behavior data according to a preset rule to obtain a first training sample;

calculating an influence factor corresponding to each commodity in the historical behavior data of each user based on a cosine similarity method by using the sorted historical behavior data as a second training sample;

training the first training sample and the second training sample as training samples of the deep learning model to obtain a trained deep learning model;

and inputting a first training sample and a second training sample of the pre-recommended user into the trained deep learning model, outputting a commodity list predicted by the model and interested by the user, and recommending the commodity list to the pre-recommended user according to the commodity sequence in the commodity list.

Further, the step of obtaining the first training sample after the arrangement according to the predetermined rule includes:

screening out interactive behavior data of a specific category based on the acquired historical behavior data of a plurality of users in a preset time period;

converting the information in the interactive behavior data of the specific category into a unique number form for storage;

and sequencing the interactive behavior data of each user in the interactive behavior data of the specific category according to the time sequence.

Further, the specific step of calculating the influence factor corresponding to each commodity in the historical behavior data of each user based on the cosine similarity method includes:

calculating an N-dimensional vector for a specific commodity based on the cosine similarity method, and applying maximum-value normalization to the vector so that each value in it lies in [0,1]. The N-dimensional vector is the influence factor corresponding to the commodity. N is the total number of commodities in the preprocessed historical behavior data, or the number of candidate commodities screened in advance from the preprocessed historical behavior data by certain rules. Value_i, the i-th value of the influence factor represented by the N-dimensional vector, is calculated from the following quantities:

Count(specific commodity & i-th commodity) is the number of times the specific commodity and the i-th commodity appear together in the preprocessed historical behavior data, where 1 ≤ i ≤ N; Count(specific commodity) is the total number of interactions with the specific commodity in the historical behavior data sorted according to the predetermined rule; Count(i-th commodity) is the total number of interactions with the i-th commodity in the same data; and Max(values) is the maximum value of the influence factors corresponding to all commodities in the historical behavior data of each user.

Further, the deep learning model includes, but is not limited to, a recurrent neural network model.

Further, the recommending to the pre-recommending user according to the commodity sequence in the commodity list includes:

recommending a plurality of commodities ranked in the front to the pre-recommending user according to the sequence of the interest degrees of the commodities predicted in the commodity list from high to low.

The invention also provides a personalized commodity recommendation system, which comprises:

the acquisition module is used for acquiring historical behavior data of a plurality of users in a preset time period, and obtaining a first training sample after the historical behavior data is arranged according to a preset rule;

the calculation module is used for calculating an influence factor corresponding to each commodity in the historical behavior data of each user based on a cosine similarity method by using the sorted historical behavior data as a second training sample;

the training module is used for training the first training sample and the second training sample as training samples of the deep learning model to obtain a trained deep learning model;

and the recommending module is used for inputting the first training sample and the second training sample of the pre-recommending user into the trained deep learning model, outputting a commodity list predicted by the model and interested by the user, and recommending the commodity list to the pre-recommending user according to the commodity sequence in the commodity list.

The acquisition module includes:

the screening unit is used for screening out interactive behavior data of a specific category based on the acquired historical behavior data of a plurality of users in a preset time period;

the conversion unit is used for converting the information in the interactive behavior data of the specific category into a unique number form for storage;

and the sequencing unit is used for sequencing the interactive behavior data of each user in the interactive behavior data of the specific category according to the time sequence.

The calculation module comprises:

a calculating unit for calculating an N-dimensional vector for a specific commodity based on the cosine similarity method and applying maximum-value normalization so that each value in the vector lies in [0,1]. The N-dimensional vector is the influence factor corresponding to the commodity; N is the total number of commodities in the preprocessed historical behavior data, or the number of candidate commodities screened in advance from the preprocessed historical behavior data by certain rules; and Value_i, the i-th value of the influence factor represented by the N-dimensional vector, is calculated from the same quantities as in the method above.

The deep learning model includes, but is not limited to, a recurrent neural network model.

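Recommending to the pre-recommended user according to the commodity order in the commodity list comprises recommending the several top-ranked commodities to the pre-recommended user in descending order of the predicted interest in the commodities in the list.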

The invention has the beneficial effects that:

by the technical scheme provided by the invention, the user historical behavior data containing the time sequence information can be processed, and due to the fact that the correlation between the commodities interacted at different times and the commodities interested by the current user is different, the time sequence information contained in the user historical behavior is extracted and utilized in the deep learning model, and commodity personalized recommendation can be more accurately carried out on the user. The invention creatively provides commodity influence factors, global characteristics of each commodity are found from historical behaviors of a plurality of users, influence factors corresponding to the commodities are calculated, and commodity vector expression forms in the preprocessed historical behaviors of the users and vector expression forms of the corresponding influence factors are jointly input into the deep learning model.

Drawings

Fig. 1 is a flowchart illustrating an embodiment of a method for recommending personalized goods according to the present invention.

Detailed Description

The technical solutions of the present invention are described in detail below. It should be noted that the technical solutions of the present invention are not limited to the embodiments described in the examples; improvements and designs that those skilled in the art make on the basis of the present invention, with reference to its technical solutions, shall fall within the protection scope of the present invention.

Example one

The embodiment of the invention provides a personalized commodity recommendation method, which comprises the following steps of S110-S140:

in step S110, historical behavior data of a plurality of users within a preset time period is obtained, and a first training sample is obtained after the historical behavior data is collated according to a predetermined rule.

The specific value of the preset time period and the number of users extracted can be set according to actual conditions. For example, if the time period is set to one month and the number of extracted users is Num, the historical behavior data of Num randomly selected users over one month is extracted from all current data. Personalized recommendation is usually time-sensitive: if a user browsed 'down jackets' half a year ago and has recently browsed 'one-piece dresses', still considering the browsing data from half a year ago may be counterproductive. The preset time period therefore needs to be set according to the actual application.

Further, the step of obtaining the first training sample after the arrangement according to the predetermined rule includes: screening out interactive behavior data of a specific category based on the acquired historical behavior data of a plurality of users in a preset time period; converting the information in the interactive behavior data of the specific category into a unique number form for storage; and sequencing the interactive behavior data of each user in the interactive behavior data of the specific category according to the time sequence.

Sorting according to the predetermined rule is the preprocessing stage of the invention: the historical behavior data extracted for the preset time period is put in order. Because implementation scenarios differ, the types of behavior data may differ; behaviors may take forms such as browsing, following, listening and watching. The term "interaction" is therefore used to refer to any behavior data generated between a user and a commodity. The preprocessing specifically includes converting the information in the interaction behavior data of a specific category into unique numbers for storage, sorting each user's interaction behavior data in that category by time, and so on.

Screening out interaction behavior data of a specific category from the acquired historical behavior data of a plurality of users in the preset time period means selecting, from all of the users' interaction behavior data, the behavior data of the category that the specific application requires. For example, a user's behavior data on a shopping website can include browsing, adding to favorites, adding to the shopping cart, purchasing and so on. If personalized recommendation is to use only the user's browsing data, the commodities the user browsed are extracted and other interaction behaviors, such as adding to favorites, are ignored.

Converting the information in the interaction behavior data of the specific category into unique numbers for storage means replacing the specific information with numbers: each user is replaced by a number and each commodity is replaced by a number, and the same commodity has the same number regardless of which user interacted with it. The commodity numbers are then converted into vector representations, so that each commodity has a unique vector form that is convenient to use as input to the deep learning model. There are many ways to convert commodity numbers into vectors, provided that each commodity has a unique vector representation that cannot be confused with that of any other commodity. One such method is as follows. Let N be the total number of commodities in the preprocessed historical behavior data, or the number of candidate commodities screened in advance from that data by certain rules, and construct N-dimensional vectors. Commodity A is represented as [1,0,0,…,0,0], that is, the first position is 1 and the other N-1 positions are 0; commodity B can be represented as [0,1,0,…,0,0], that is, the second position is 1 and the other N-1 positions are 0. In this way the N commodities in the commodity library are represented by N distinct vectors.
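The numbering and vector conversion described above can be sketched in a few lines of Python. This is only an illustration: the function names `build_index` and `one_hot` and the sample records are hypothetical, and numbers are assigned in first-seen order, which the text does not prescribe.

```python
def build_index(values):
    """Assign each distinct value a unique integer number, in first-seen order."""
    index = {}
    for value in values:
        if value not in index:
            index[value] = len(index)
    return index


def one_hot(item_number, n_items):
    """Return an N-dimensional vector with a single 1 at position item_number."""
    vector = [0] * n_items
    vector[item_number] = 1
    return vector


# Hypothetical raw records: (user name, commodity name).
records = [("alice", "down jacket"), ("bob", "sun cream"), ("alice", "sun cream")]

user_ids = build_index(user for user, _ in records)
item_ids = build_index(item for _, item in records)

# The same commodity gets the same number, and so the same vector, for every user.
vectors = {name: one_hot(number, len(item_ids)) for name, number in item_ids.items()}
```

Here N is simply the number of distinct commodities in the sample records; with a pre-screened candidate library, N would be the size of that library instead.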

Sorting each user's interaction behavior data within the specific category by time means that, after the specific interaction behaviors have been screened out, the commodities of each user are ordered by the time at which the interaction occurred, with earlier interactions placed before later ones.
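As a minimal sketch of the screening and time-ordering steps, assuming each event is a tuple of (user number, commodity number, behavior type, timestamp), the per-user sequences could be built as follows. The tuple layout, the function name `build_sequences` and the sample events are assumptions made for illustration.

```python
from collections import defaultdict


def build_sequences(events, keep_behavior="browse"):
    """Keep one behavior category and order each user's commodities by time."""
    per_user = defaultdict(list)
    for user, item, behavior, timestamp in events:
        if behavior == keep_behavior:  # screening step
            per_user[user].append((timestamp, item))
    # Earlier interactions come first in each user's sequence.
    return {
        user: [item for _, item in sorted(pairs, key=lambda pair: pair[0])]
        for user, pairs in per_user.items()
    }


# Hypothetical events: (user, commodity, behavior, timestamp).
events = [
    (0, 2, "browse", 30),
    (0, 1, "browse", 10),
    (0, 3, "favorite", 20),  # dropped: not the selected behavior category
    (1, 1, "browse", 5),
]
sequences = build_sequences(events)
```

The resulting dictionary maps each user number to that user's commodity numbers in interaction order, which is the form the later steps consume.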

In step S120, using the sorted historical behavior data, an influence factor corresponding to each commodity in the historical behavior data of each user is calculated based on a cosine similarity method to serve as a second training sample.

An N-dimensional vector is calculated for a specific commodity based on the cosine similarity method, and maximum-value normalization is then applied to the vector so that each value in it lies in [0,1]. The N-dimensional vector is the influence factor corresponding to the commodity. N is the total number of commodities in the preprocessed historical behavior data, or the number of candidate commodities screened in advance from the preprocessed historical behavior data by certain rules. Value_i, the i-th value of the influence factor represented by the N-dimensional vector, is calculated from the following quantities:

Count(specific commodity & i-th commodity) is the number of times the specific commodity and the i-th commodity appear together in the preprocessed historical behavior data, where 1 ≤ i ≤ N; Count(specific commodity) is the total number of interactions with the specific commodity in the historical behavior data sorted according to the predetermined rule; Count(i-th commodity) is the total number of interactions with the i-th commodity in the same data; and Max(values) is the maximum value of the influence factors corresponding to all commodities in the historical behavior data of each user. The N-dimensional vector obtained for the specific commodity after maximum-value normalization has every value in [0,1] and is the influence factor corresponding to that commodity. The larger the i-th value in the vector, the stronger the correlation between the specific commodity and the i-th commodity.
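Assuming the usual co-occurrence form of cosine similarity, the four quantities named above could be combined as follows. The square-root denominator is an assumption; the text only names the quantities and states that the result is normalized by the maximum value.

```latex
\[
\mathrm{Value}_i \;=\; \frac{1}{\mathrm{Max}(\mathrm{values})}\cdot
\frac{\mathrm{Count}(\text{specific commodity}\ \&\ i\text{-th commodity})}
     {\sqrt{\mathrm{Count}(\text{specific commodity})\cdot\mathrm{Count}(i\text{-th commodity})}},
\qquad 1 \le i \le N
\]
```

Under this reading, dividing by Max(values) guarantees that the largest entry of the vector equals 1 and every other entry lies in [0,1].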

Using the preprocessed historical behavior data, the popularity of a commodity and the number of times two commodities interested the same user can be judged from how often the commodities appear in the historical behavior data. Generally, a commodity that more users have interacted with is more likely to attract interaction from the current user. If two commodities frequently appear together in the historical behavior data of the same user, they are strongly related, and a user who interacts with one of them is relatively likely to be interested in the other.
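The counting described here can be sketched in Python under several stated assumptions: co-occurrence is counted once per user whose history contains both commodities, the raw score uses a square-root (cosine-style) denominator, a commodity's entry for itself is set to 0, and the maximum is taken per vector. None of these details is fixed by the text, and the function name `influence_factors` is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def influence_factors(sequences, n_items):
    """sequences: {user: [commodity numbers]}; returns {commodity: N-dim list}."""
    count = Counter()     # total interactions per commodity
    co_count = Counter()  # users whose history contains both commodities
    for items in sequences.values():
        count.update(items)
        for a, b in combinations(sorted(set(items)), 2):
            co_count[(a, b)] += 1

    factors = {}
    for a in range(n_items):
        raw = []
        for i in range(n_items):
            if a == i or count[a] == 0 or count[i] == 0:
                raw.append(0.0)  # assumption: no self-influence, no unseen items
            else:
                pair = (min(a, i), max(a, i))
                raw.append(co_count[pair] / math.sqrt(count[a] * count[i]))
        peak = max(raw)
        # Maximum-value normalization so every entry lies in [0, 1].
        factors[a] = [value / peak for value in raw] if peak > 0 else raw
    return factors


# Hypothetical histories for three users over three commodities.
factors = influence_factors({0: [0, 1], 1: [0, 1, 2], 2: [2]}, n_items=3)
```

In this small example commodities 0 and 1 appear together in two histories, so commodity 1 receives the largest entry in the influence factor of commodity 0.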

In step S130, the first training sample and the second training sample are trained as training samples of the deep learning model, so as to obtain a trained deep learning model.

The historical behavior data sorted according to the predetermined rule in step S110 and the commodity influence factors obtained in step S120 are input into the deep learning model as training samples. The deep learning model adopted by the method is a recurrent neural network model, which can be an RNN model or an improved variant of it, such as LSTM.

Each parameter in the deep learning model is randomly initialized. Then, user by user according to the user numbers in the preprocessed historical behavior data, the vector representation of each interacted commodity and the vector representation of its influence factor are input into the recurrent neural network model in the time order of the interactions, to train the network.

For the recurrent neural network model, the i-th input is the vector representation of the i-th commodity the user interacted with in the preprocessed behavior data, together with the vector representation of its influence factor. The model produces a predicted output, which is compared with the (i+1)-th commodity the user interacted with; the deviation is calculated, and the parameters of the neural network model are corrected according to it. Once the vector representations of all commodities one user interacted with, and of their influence factors, have been input in sequence for training, the corresponding vectors of the next user are input in time order into the model trained so far, and training continues until the data of all users in the preprocessed behavior data has been input. At this point, a trained deep learning model is obtained. Because recurrent neural network models are in general use in the field of deep learning, their specific construction is not described further.
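The pairing of each input with the next commodity as its training target can be illustrated without committing to a particular network library. The sketch below only prepares the (input, target) pairs for one user. Joining the commodity vector and its influence factor by concatenation is an assumption, since the text says only that the two are input together, and the name `training_pairs` is hypothetical.

```python
def training_pairs(sequence, item_vectors, factors):
    """Yield (input, target) pairs for one user's time-ordered commodity sequence.

    The i-th input is the commodity vector followed by its influence factor
    (concatenation is an assumption); the target is the (i+1)-th commodity.
    """
    for current, following in zip(sequence, sequence[1:]):
        yield item_vectors[current] + factors[current], following


# Hypothetical one-hot vectors and influence factors for three commodities.
item_vectors = {0: [1, 0, 0], 1: [0, 1, 0], 2: [0, 0, 1]}
factors = {0: [0.0, 1.0, 0.5], 1: [1.0, 0.0, 0.5], 2: [1.0, 1.0, 0.0]}

pairs = list(training_pairs([0, 1, 2], item_vectors, factors))
```

A sequence of length L yields L-1 pairs, because the last commodity has no successor to serve as a target. Each user's pairs would be fed to the recurrent model in order before moving on to the next user.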

When the preprocessed historical behavior data of the user to receive personalized recommendation is input into the model in sequence, the model yields the next predicted commodity. Concretely, its output is a prediction score for every commodity in the selected commodity library, arranged from high to low. The selected commodity library may be all commodities on the platform to which the recommendation method is applied, or a candidate commodity library pre-screened according to some principle. For example, if a shopping platform wants to recommend new commodities and will choose several from 2000 newly listed commodities, the candidate commodity library contains those 2000 newly listed commodities.

In step S140, the first training sample and the second training sample of the pre-recommended user are input into the trained deep learning model, and the commodity list predicted by the model to be interested by the user is output and recommended to the pre-recommended user according to the commodity sequence in the commodity list.

When personalized recommendation is made for the pre-recommended user, that is, the current user, the user's historical behavior data in the selected time period is acquired. The data sorted according to the predetermined rule as in step S110, and the influence factors of the corresponding commodities obtained as in step S120, are input into the deep learning model trained in step S130 for calculation. The trained model outputs prediction scores for the commodities in the commodity library; ranking the commodities by score from high to low gives the list of commodities the current user is interested in, and the several top-ranked commodities are recommended to the current user in descending order of predicted interest. The higher the predicted interest, the nearer the front the commodity is ranked.

If only commodities in a candidate commodity library, for example 2000 newly listed commodities, are to be recommended, only the prediction scores of those 2000 commodities are calculated and ranked. According to the number of commodities to be recommended, the required number of top-ranked commodities is then selected from the list of commodities the current user is interested in and recommended to the current user, completing the personalized recommendation.
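The ranking and selection step can be sketched as follows, assuming the model's output is available as a mapping from commodity number to prediction score. The function name `recommend`, its parameters and the sample scores are hypothetical.

```python
def recommend(scores, k, candidates=None):
    """Return the k highest-scoring commodity numbers, best first.

    scores: {commodity number: prediction score} from the trained model.
    candidates: optional pre-screened candidate commodity library.
    """
    if candidates is None:
        pool = scores
    else:
        pool = {c: scores[c] for c in candidates if c in scores}
    return sorted(pool, key=pool.get, reverse=True)[:k]


# Hypothetical prediction scores for four commodities.
scores = {0: 0.1, 1: 0.7, 2: 0.4, 3: 0.9}

top_two = recommend(scores, k=2)
top_two_new = recommend(scores, k=2, candidates={0, 2})
```

Restricting the pool before sorting mirrors the case where only commodities from a candidate library are scored and ranked.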

The commodities are not limited to physical commodities such as clothes and daily necessities. On a multimedia platform, specific pieces of music or specific movies can be regarded as commodities, for example when music is recommended to a user according to the user's listening history or other users' interactions with the music platform, or when movies are recommended according to the user's viewing history or other interactions between the user and movies.

The method collects the interaction behaviors of a plurality of users in a specific time period and, from those time-ordered interaction behaviors on an Internet platform, uses an improved recurrent neural network model to predict the commodities a user is interested in and make personalized recommendations. The improvement to the recurrent neural network model may concern the number of neurons, the number of layers, the addition of threshold functions and the like. Provided a structurally improved recurrent neural network model does not directly change the way the method feeds commodity time-sequence information and influence factors into the model, it can be regarded as one implementation of the method.

The deep learning model can be a recurrent neural network model or a model obtained by improving it, such as an LSTM model. Any deep learning model that takes commodities ordered according to a certain rule, receives the features of the current commodity as input in sequence, and outputs the features of the next commodity can be regarded as a recurrent neural network model or an improved variant of it.

The interactive behavior can be one or a combination of behaviors of browsing commodities, watching or listening, adding to favorites, adding to shopping carts, purchasing and the like.
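As an illustration only, interaction records of this kind could be represented and ordered in time as sketched below. The `Interaction` record, its field names, the behavior labels, and the integer timestamps are assumptions made for the example; the patent does not prescribe a data layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Interaction(NamedTuple):
    """One hypothetical interaction record."""
    user_id: str
    item_id: str
    behavior: str   # e.g. "browse", "favorite", "cart", "purchase"
    timestamp: int  # assumed to be seconds since some epoch


def build_sequences(events: Iterable[Interaction],
                    start: int, end: int) -> Dict[str, List[str]]:
    """Keep events with start <= timestamp < end and return, per user,
    the commodity identifiers sorted by interaction time."""
    per_user: Dict[str, List[Interaction]] = defaultdict(list)
    for event in events:
        if start <= event.timestamp < end:
            per_user[event.user_id].append(event)
    return {
        user: [e.item_id for e in sorted(user_events, key=lambda e: e.timestamp)]
        for user, user_events in per_user.items()
    }


events = [
    Interaction("u1", "song_b", "listen", 20),
    Interaction("u1", "song_a", "browse", 10),
    Interaction("u2", "movie_x", "purchase", 15),
    Interaction("u1", "song_c", "favorite", 99),  # outside the window below
]
sequences = build_sequences(events, start=0, end=50)
print(sequences)
```

The half-open time window is a design choice of this sketch; any consistent definition of the preset time period would serve the same purpose.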

A variant that changes only the coefficients of the influence factor calculation method still falls within the protection scope of the method.

Embodiment two

The second embodiment of the present invention further provides a personalized commodity recommendation system, including:

The acquisition module includes:

The calculation module comprises:

a calculating unit for calculating an N-dimensional vector for a specific commodity based on the cosine similarity method and normalizing the vector by its maximum value so that each value in the vector lies in [0, 1], wherein the N-dimensional vector is the influence factor corresponding to the commodity, N is the total number of commodities in the preprocessed historical behavior data or the number of candidate commodities screened in advance from the preprocessed historical behavior data through some rules, and the i-th value in the influence factor expressed by the N-dimensional vector corresponding to the commodity is denoted Value_i. The specific calculation method comprises the following steps:

The specific implementation functions and processing modes refer to specific steps described in the first embodiment of the method.
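The exact formula for Value_i is not reproduced in this section, so the following is only a sketch under stated assumptions: each commodity is described by a non-negative vector of per-user interaction counts, the i-th entry is the cosine similarity between the target commodity and commodity i, and the resulting vector is divided by its maximum so that all values lie in [0, 1]. The names `cosine`, `influence_factor`, and the sample data are hypothetical.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def influence_factor(target: str,
                     item_vectors: Dict[str, List[float]],
                     order: List[str]) -> List[float]:
    """N-dimensional vector for `target`, one entry per commodity in `order`.

    Assumption: entry i is the cosine similarity to commodity order[i],
    divided by the largest entry (maximum-value normalization).
    """
    sims = [cosine(item_vectors[target], item_vectors[other]) for other in order]
    peak = max(sims)
    if peak <= 0.0:
        return [0.0] * len(sims)
    return [s / peak for s in sims]


# Hypothetical interaction counts: one column per user.
item_vectors = {
    "a": [1.0, 0.0, 1.0],
    "b": [1.0, 0.0, 1.0],
    "c": [0.0, 1.0, 0.0],
}
factor = influence_factor("a", item_vectors, order=["a", "b", "c"])
print(factor)
```

Because the assumed input vectors are non-negative, every cosine similarity is already non-negative, and dividing by the maximum keeps each entry within [0, 1] as the description requires.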

Since the processing and functions implemented by the system of the second embodiment substantially correspond to the embodiment, principle, and example of the method shown in Fig. 1, this embodiment is not described in detail; reference may be made to the related description in the foregoing embodiment, which is not repeated here.

The invention has the beneficial effects that:

Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by hardware, or by software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments of the present invention.

The above discloses only a few specific embodiments of the present invention; however, the present invention is not limited to these embodiments, and any variation conceivable by those skilled in the art is intended to fall within the scope of the present invention.

Claims

1. A method for recommending personalized goods, the method comprising:

acquiring historical behavior data of a plurality of users in a preset time period, organizing the historical behavior data according to a preset rule, and sorting the resulting behavior data in time order to obtain a first training sample;

inputting a first training sample and a second training sample of a pre-recommended user into a trained deep learning model, outputting a commodity list predicted by the model and interested by the user, and recommending the commodity list to the pre-recommended user according to the commodity sequence in the commodity list;

the specific steps of calculating the influence factor corresponding to each commodity in the historical behavior data of each user based on the cosine similarity method comprise:

2. The method of claim 1, wherein the step of obtaining the first training sample after the sorting according to the predetermined rule comprises:

3. The method of claim 1, in which the deep learning model comprises a recurrent neural network model.

4. The method of claim 1, wherein the recommending to the pre-recommending user in the order of the items in the item list comprises:

5. A personalized goods recommendation system, comprising:

the acquisition module is used for acquiring historical behavior data of a plurality of users in a preset time period, organizing the historical behavior data according to a preset rule, and sorting the resulting behavior data in time order to obtain a first training sample;

the recommendation module is used for inputting a first training sample and a second training sample of a pre-recommended user into the trained deep learning model, outputting a commodity list predicted by the model and interested by the user, and recommending the commodity list to the pre-recommended user according to the commodity sequence in the commodity list;

the calculation module comprises:

6. The system of claim 5, wherein the acquisition module comprises:

7. The system of claim 5, in which the deep learning model comprises a recurrent neural network model.

8. The system of claim 5, wherein the recommending to the pre-recommending user in the order of the items in the item list comprises: