CN113988969A - Collaborative filtering recommendation method based on RFM model

Info

Publication number: CN113988969A
Application number: CN202111196339.XA
Authority: CN
Inventors: 刘鹏飞; 于德尚; 杨尚伟; 郑鑫
Original assignee: Qingdao Mengdou Network Technology Co ltd
Current assignee: Qingdao Mengdou Network Technology Co ltd
Priority date: 2021-10-14
Filing date: 2021-10-14
Publication date: 2022-01-28

Abstract

The invention discloses a collaborative filtering recommendation method based on the RFM model, comprising the following steps: first, all commodities on an e-commerce platform are classified into N classes; second, the proximity, frequency and value degree attribute vectors of the user to be recommended are calculated; third, the similarity between the user to be recommended and each other enterprise user is calculated under the three indexes of recency, frequency and value, and a combined similarity is derived from them; the other enterprise users are then ranked by combined similarity, and those with high similarity are taken as similar users; finally, two product recommendation lists are formed from the purchase records of the similar users in combination with the purchase records of the user to be recommended, and the two lists are merged into a final product recommendation list that is recommended to the user. The method addresses the product recommendation problem, supports recommendation of single or combined products, and improves the experience of enterprise users on the platform.

Description

Collaborative filtering recommendation method based on RFM model

Technical Field

The invention relates to the technical field of electronic commerce, in particular to a collaborative filtering recommendation method based on an RFM model.

Background

We live in an information age in which data of all kinds is growing rapidly: online purchase records, bank account balance records, daily reading records on social media, and so on. This data holds enormous potential; analysed and used sensibly, it can make daily life more convenient, improve the efficiency of society, and promote the further development of human society and civilization.

As the information on the network grows ever larger and faster, data becomes increasingly scattered. This makes knowledge easier to acquire, but it also has negative effects: the knowledge obtained is fragmented and unsystematic, and finding the information that actually meets a user's needs takes a great deal of time. On an e-commerce platform, a user looking for a particular commodity often spends a long time picking out a suitable product from many similar ones. Massive data can help us discover patterns and make decisions, but to some extent it also makes effective information harder to obtain.

With the development of big data, intelligent recommendation algorithms have taken on new vitality. In recent years, as new information technologies have become more deeply integrated with industry, new data-driven industrial models and business forms have emerged, and developing the digital economy and raising the level of intelligence across society have been placed on the economic development agenda. At the same time, information on the network keeps growing rapidly, and how to make efficient use of the useful part of it to locate what one needs quickly has become a key question.

In e-commerce services, an intelligent recommendation algorithm can recommend commodities to users based on their purchase records, search records, favourited commodities and the like, which greatly shortens the time users spend screening for the commodities they need among a massive number of commodities and brings them great convenience.

Disclosure of Invention

The purpose of the invention is as follows: in view of the problems described in the background, the invention provides a collaborative filtering recommendation method based on the RFM model that can recommend platform products to enterprise users quickly and accurately.

To solve the above problems, the invention adopts the following technical solution: a collaborative filtering recommendation method based on the RFM model, characterized by comprising the following steps:

step one: classifying all commodities on the e-commerce platform into N classes according to the attributes of the electronic components;

step two: processing the purchase records of the user to be recommended, which mainly comprise the user id, the id of the purchased commodity, the commodity name and the order time, to obtain user-based attribute vectors; based on the RFM model, each user obtains three attribute vectors corresponding to the three RFM indexes of recency, frequency and value; each attribute vector is N-dimensional, and its components are the values recorded under the corresponding RFM index for the N classes of purchased products; the values of the proximity attribute vector, the frequency attribute vector and the value degree attribute vector of the user to be recommended are calculated;

step three: generating the final recommendation list: calculating the distance between the user to be recommended and each other enterprise user under the three indexes of recency, frequency and value, which gives the similarity between them under each index; then combining the similarities obtained under the three indexes into a combined similarity; then ranking the other enterprise users by their combined similarity to the user to be recommended and taking those with high similarity as similar users; and, from the product purchase records of the similar users combined with those of the user to be recommended, forming two product recommendation lists, which are merged into a final product recommendation list that is recommended to the user to be recommended.

Further, in step one, the value of N is adjusted as the platform's product categories are adjusted.

Further, in step two, the proximity attribute vector is calculated as follows: according to the RFM model, a user whose last purchase was longer ago is less likely to purchase again than a user whose last purchase was more recent, and the proximity attribute vector is calculated as follows:

where now denotes the current time; purchase_time_ij is the time at which the user purchased the j-th commodity in the i-th product class; days(now - purchase_time_ij) is the number of days between the two time points; and n_i is the number of purchases of commodities in the i-th product class. The value of r_i represents the importance of a record in the final attribute: the shorter the interval, the larger the value and the higher the importance, and vice versa. The final value of each component of the user's attribute vector is obtained by summing the proximity values of the commodities the user purchased in the corresponding class.
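The formula itself is not reproduced in this text, so the following Python sketch only illustrates the description above under a stated assumption: each purchase contributes 1 / (1 + days since purchase), which grows as the interval shrinks, and the contributions are summed per product class. The weighting function, the `Purchase` record layout and the name `recency_vector` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Purchase:
    """One purchase record (hypothetical layout, not taken from the source)."""
    product_id: str
    category: int        # index of the product class, 0 .. N-1
    purchased_on: date


def recency_vector(purchases, n_categories, now):
    """Return an N-dimensional proximity (recency) vector.

    Assumption: each purchase contributes 1 / (1 + days since purchase).
    The source only states that shorter intervals give larger values and
    that the values are summed per product class.
    """
    vector = [0.0] * n_categories
    for p in purchases:
        days = (now - p.purchased_on).days
        vector[p.category] += 1.0 / (1 + days)
    return vector


now = date(2021, 10, 14)
history = [
    Purchase("resistor-a", 0, date(2021, 10, 13)),   # 1 day ago
    Purchase("resistor-b", 0, date(2021, 10, 11)),   # 3 days ago
    Purchase("capacitor-a", 2, date(2021, 10, 5)),   # 9 days ago
]
r = recency_vector(history, n_categories=3, now=now)
print(r)
```

Adding 1 to the day count is only there to avoid division by zero for same-day purchases; any decreasing function of the interval would fit the description equally well.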

Further, in step two, the frequency attribute vector is calculated as follows: according to the RFM model, a user with a high purchase frequency is more likely to purchase again than a user with a low purchase frequency, and the value of the frequency attribute vector, i.e., the number of times the product is purchased, is calculated as follows:

where n_i is the number of purchases of commodities in the i-th product class, p_j is the j-th commodity in the i-th product class, and I(p = p_j) is 1 if the user has purchased p_j and 0 otherwise.
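As a hedged illustration of the indicator-based count just described, the sketch below assumes that the i-th component is the sum of I(p = p_j) over the distinct commodities p_j of class i, that is, the number of distinct commodities of that class the user has bought. This reading, the `(product_id, category)` record layout and the name `frequency_vector` are assumptions, since the formula is missing from this text.

```python
def frequency_vector(purchases, n_categories):
    """Return an N-dimensional frequency vector.

    purchases: iterable of (product_id, category) pairs, one per order line
    (hypothetical layout). Assumed reading: component i is the sum of the
    indicator I(p = p_j) over the distinct commodities p_j of class i.
    """
    seen = [set() for _ in range(n_categories)]
    for product_id, category in purchases:
        seen[category].add(product_id)      # indicator is 1 once purchased
    return [len(products) for products in seen]


orders = [
    ("resistor-a", 0),
    ("resistor-a", 0),    # repeat purchase, indicator stays 1
    ("resistor-b", 0),
    ("capacitor-a", 2),
]
f = frequency_vector(orders, n_categories=3)
print(f)
```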

Further, in step two, the value degree attribute vector is calculated as follows: according to the RFM model, a user with a higher purchase value degree is more likely to purchase again than a user with a lower purchase value degree, and the value degree attribute vector is calculated as follows:

where n_i is the number of purchases of commodities in the i-th product class, and time(p = p_j) is the number of times the user purchased p_j.
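The value degree formula is likewise missing here. A minimal sketch, assuming that the i-th component is the sum of time(p = p_j) over the commodities of class i (the total number of purchases in that class), could look as follows; the record layout and the name `value_vector` are hypothetical.

```python
from collections import Counter


def value_vector(purchases, n_categories):
    """Return an N-dimensional value degree vector.

    purchases: iterable of (product_id, category) pairs, one per order line
    (hypothetical layout). Assumed reading: component i is the sum of
    time(p = p_j), the purchase count of each commodity p_j in class i.
    """
    times = Counter(purchases)              # (product_id, category) -> count
    vector = [0] * n_categories
    for (_product_id, category), count in times.items():
        vector[category] += count
    return vector


orders = [
    ("resistor-a", 0),
    ("resistor-a", 0),
    ("resistor-b", 0),
    ("capacitor-a", 2),
]
m = value_vector(orders, n_categories=3)
print(m)
```

Under this reading the repeat purchase of `resistor-a` raises the value degree of class 0 but not its frequency.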

Further, in the third step, generating a final recommendation list specifically includes:

(1) calculating the distance between the user to be recommended and each other enterprise user under the three indexes of the RFM model, which gives the similarity between them; the distance is calculated as the Manhattan distance, with the following formula:

$$\mathrm{dist}(X, Y) = \sum_{i=1}^{N} \lvert x_i - y_i \rvert$$

where X and Y are the attribute vectors of two samples; N is the total number of classes, which equals the length of the value degree, proximity and frequency vectors; x_i is the i-th attribute value of sample X; y_i is the i-th attribute value of sample Y; and |x_i - y_i| is the absolute value of the difference between x_i and y_i;

the similarity is calculated from the attribute vectors X and Y of the two samples, which are interpreted according to the index concerned;

when calculating the similarity of the proximity attribute, X and Y are the proximity attribute vectors of the two users, and x_i and y_i are the i-th attribute values of the proximity attribute vectors X and Y, respectively;

when calculating the similarity of the frequency attribute, X and Y are the frequency attribute vectors of the two users, and x_i and y_i are the i-th attribute values of the frequency attribute vectors X and Y, respectively;

when calculating the similarity of the value degree attribute, X and Y are the value degree attribute vectors of the two users, and x_i and y_i are the i-th attribute values of the value degree attribute vectors X and Y, respectively;

(2) normalizing the similarity under each of the three indexes, calculated as follows:

where Sim(X, Y) is the normalized similarity value, and max(dist) is the maximum value of the distance (similarity) under the index concerned;

(3) combining the similarities obtained under the three indexes into the combined similarity between the user to be recommended and each enterprise user, calculated as follows:

S(X, Y) = Sim_R(X, Y) + Sim_F(X, Y) + Sim_M(X, Y)

where Sim_R(X, Y) is the normalized similarity under the proximity attribute, Sim_F(X, Y) is the normalized similarity under the frequency attribute, Sim_M(X, Y) is the normalized similarity under the value degree attribute, and S(X, Y) is the combined similarity of sample X and sample Y;

(4) ranking the other enterprise users by their combined similarity to the user to be recommended, with users of high similarity first and users of low similarity last, and taking the top-ranked users with the highest similarity as the similar users;

(5) forming two product recommendation lists from the products purchased by the similar users within a set time, wherein list 1 contains the products that the similar users have purchased and the user to be recommended has not, and list 2 contains the products that both the similar users and the user to be recommended have purchased; each list is sorted primarily by the number of similar users who purchased the product, products with the same number of purchasers are sorted by purchase amount, and products with the same number of purchasers and the same amount are sorted by latest purchase time;

(6) concatenating the sorted list 1 and list 2, i.e., appending list 2 directly after list 1, to form the product recommendation list, and recommending it to the user to be recommended.
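Items (1) to (4) above can be sketched in Python as follows. The Manhattan distance and the sum S = Sim_R + Sim_F + Sim_M follow the text; the normalization Sim = 1 - dist / max(dist) is an assumption, because the normalization formula is not reproduced here. The function names, the dictionary layout keyed by "R", "F" and "M", and the `top_k` parameter are hypothetical.

```python
def manhattan(x, y):
    """Manhattan distance: sum of absolute component-wise differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def normalized_similarities(target, others):
    """Map user_id -> similarity in [0, 1] for one RFM index.

    Assumption: Sim = 1 - dist / max(dist), so the closest user scores
    highest. The source's own normalization formula is not available.
    """
    dists = {uid: manhattan(target, vec) for uid, vec in others.items()}
    max_dist = max(dists.values(), default=0)
    if max_dist == 0:                       # every user is identical
        return {uid: 1.0 for uid in dists}
    return {uid: 1.0 - d / max_dist for uid, d in dists.items()}


def similar_users(target_rfm, others_rfm, top_k=10):
    """Rank other users by S = Sim_R + Sim_F + Sim_M and keep the top_k."""
    combined = {uid: 0.0 for uid in others_rfm}
    for index in ("R", "F", "M"):
        sims = normalized_similarities(
            target_rfm[index],
            {uid: rfm[index] for uid, rfm in others_rfm.items()},
        )
        for uid, sim in sims.items():
            combined[uid] += sim
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


target = {"R": [0.75, 0.0], "F": [2, 0], "M": [3, 0]}
others = {
    "a": {"R": [0.75, 0.0], "F": [2, 0], "M": [3, 0]},   # identical profile
    "b": {"R": [0.0, 1.0], "F": [0, 2], "M": [0, 4]},    # opposite profile
    "c": {"R": [0.5, 0.0], "F": [1, 0], "M": [3, 1]},    # close profile
}
top = similar_users(target, others, top_k=2)
print(top)
```

In this toy data, user "a" matches the target on all three indexes and ranks first, user "c" ranks second, and user "b" is cut off by `top_k=2`.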

Further, in item (5) of step three, the length of the set time is modified according to how the platform is used.

The technical solution provided by the embodiments of the invention has at least the following beneficial effects:

(1) The method addresses the product recommendation problem, supports recommendation of single or combined products, and improves the experience of enterprise users on the platform. It combines basic information about an enterprise user (such as the enterprise's scale, production capacity and location) with information about the enterprise's activity on the platform (such as its recent consumption records, recent consumption frequency, consumption amount on the platform and credit rating) to build a user profile for the enterprise and recommend platform products to it in a timely manner.

(2) According to the RFM model, the more recent a user's last purchase, the more likely the user is to respond to a recommendation; a user who purchases more frequently within a given period is also more satisfied with the service, so a reasonable recommendation is relatively likely to get a response from that user. According to the 80/20 rule (the Pareto principle), nearly 80% of a company's revenue comes from 20% of its users, and the top tenth of users spend three times as much as the next tier; recommending product information to these users is therefore much more likely to get a response than recommending to users in lower tiers. The method therefore uses the RFM model to analyse user data and build profiles of enterprise users, and then recommends products the users have not yet used according to the behavioural characteristics of each enterprise user, which achieves good results.

(3) The method performs well on the EE problem (the Exploration and Exploitation problem; in business, for example, the conflict between exploring new profit models and maintaining current profits). The method achieves good recall, and recall is related to the EE problem (the higher the recall, the milder the EE problem; the lower the recall, the more severe it is): on top of a user's existing interests, products that lie outside those interests but may still appeal to the user can be recommended in moderation. Recall in effect indicates how much of what the user is really interested in has been uncovered. In this method recall remains stable as the number of users receiving recommendations grows, which means the algorithm performs well on the EE problem and therefore has an advantage over the traditional user-based collaborative filtering recommendation algorithm.

Note: recall

True Positive (TP): the number of positive examples predicted as positive;

True Negative (TN): the number of negative examples predicted as negative;

False Positive (FP): the number of negative examples predicted as positive (false alarms);

False Negative (FN): the number of positive examples predicted as negative (misses).

Recall is a measure of coverage: it is the proportion of actual positive examples that are predicted as positive, calculated as follows:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
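As a small worked illustration of the recall measure defined by the TP and FN counts above, the sketch below treats the products a user actually went on to purchase as the positive examples and the recommended products as the predicted positives. This mapping to the recommendation setting, the function name `recall` and the sample product ids are assumptions for illustration only.

```python
def recall(recommended, relevant):
    """Recall = TP / (TP + FN).

    recommended: products predicted as positive (the recommendation list).
    relevant: products that are actually positive (hypothetically, the
    products the user really wanted or later purchased).
    """
    recommended = set(recommended)
    relevant = set(relevant)
    if not relevant:
        return 0.0                          # no positives, nothing to recall
    tp = len(relevant & recommended)        # positives that were recommended
    fn = len(relevant - recommended)        # positives that were missed
    return tp / (tp + fn)


score = recall(
    recommended=["p1", "p2", "p3", "p4"],
    relevant=["p2", "p4", "p7"],
)
print(score)
```

Here two of the three relevant products were recommended, so TP = 2, FN = 1 and recall is 2/3.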

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

Fig. 1 is a flowchart of a collaborative filtering recommendation method based on an RFM model according to an embodiment of the present invention.

Detailed Description of Embodiments

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The embodiment of the invention provides a collaborative filtering recommendation method based on the RFM (Recency, Frequency, Monetary) model, which comprises the following three steps:

Step one: all commodities on the e-commerce platform are classified into N classes according to the attributes of the electronic components (the value of N can be adjusted as the platform's product categories are adjusted).

Step two: the purchase records of the user to be recommended, which mainly comprise the user id, the id of the purchased commodity, the commodity name, the order time and the like, are processed to obtain user-based attribute vectors. Based on the RFM model, each user obtains three attribute vectors corresponding to Recency, Frequency and Monetary value in the RFM model; each attribute vector is N-dimensional, and its components are the values recorded under the corresponding RFM index for the N classes of purchased products. The values of the proximity attribute vector, the frequency attribute vector and the value degree attribute vector are calculated respectively.

Step three, generating the final recommendation list: the distances between the user to be recommended and each other enterprise user are calculated under the three indexes of recency, frequency and value (the three indexes of the RFM model), which gives the similarity between them under each index. The similarities obtained under the three indexes are then combined into a combined similarity. The other enterprise users are then ranked by combined similarity, and those with high similarity are taken as similar users. From the product purchase records of the similar users, combined with those of the user to be recommended, two product recommendation lists are formed and merged into the product recommendation list that is recommended to the user to be recommended.

The steps are described in detail as follows:

Step two: the purchase records of the user, which mainly comprise the user id, the id of the purchased commodity, the commodity name, the order time and the like, are processed to obtain user-based attribute vectors. Based on the RFM model, each user obtains three attribute vectors corresponding to Recency, Frequency and Monetary value in the RFM model; each attribute vector is N-dimensional, and its components are the values recorded under the corresponding RFM index for the N classes of purchased products.

(1) A proximity attribute vector.

According to the RFM model, a user whose last purchase was longer ago is less likely to purchase again than a user whose last purchase was more recent, and the proximity attribute vector is calculated as follows:

where now denotes the current time; purchase_time_ij is the time at which the user purchased the j-th commodity in the i-th product class; days(now - purchase_time_ij) is the number of days between the two time points; and n_i is the number of purchases of commodities in the i-th product class. The value of r_i represents the importance of a record in the final attribute: the shorter the interval, the larger the value and the higher the importance, and vice versa. The final value of each component of the user's attribute vector is obtained by summing the recency values of the commodities the user purchased in the corresponding class.

(2) Frequency attribute vector

According to the RFM model, since a user having a high purchase frequency is more likely to purchase again than a user having a low purchase frequency, the value of the frequency attribute vector, i.e., the number of times the product is purchased, is calculated as follows:

(3) Value degree attribute vector

According to the RFM model, a user having a higher purchase value degree is more likely to purchase again than a user having a lower purchase value degree, and the value degree attribute vector is calculated as follows:

Step three: forming product recommendation lists

(1) The distances between the user to be recommended and each other enterprise user are calculated under the three indexes (the three indexes of the RFM model), which gives the similarity between them. The method calculates the distance as the Manhattan distance, with the following formula:

$$\mathrm{dist}(X, Y) = \sum_{i=1}^{N} \lvert x_i - y_i \rvert$$

where X and Y are the attribute vectors of two samples; N is the total number of classes, which equals the length of the value degree, proximity and frequency vectors; x_i is the i-th attribute value of sample X; y_i is the i-th attribute value of sample Y; and |x_i - y_i| is the absolute value of the difference between x_i and y_i.

The similarity is calculated from the attribute vectors X and Y of the two samples, which are interpreted according to the index concerned.

When calculating the similarity of the proximity attribute, X and Y are the proximity attribute vectors of the two users, and x_i and y_i are the i-th attribute values of the proximity attribute vectors X and Y, respectively.

When calculating the similarity of the frequency attribute, X and Y are the frequency attribute vectors of the two users, and x_i and y_i are the i-th attribute values of the frequency attribute vectors X and Y, respectively.

When calculating the similarity of the value degree attribute, X and Y are the value degree attribute vectors of the two users, and x_i and y_i are the i-th attribute values of the value degree attribute vectors X and Y, respectively.

(2) The similarity under each of the three indexes is normalized, calculated as follows:

where Sim(X, Y) is the normalized similarity value, and max(dist) is the maximum value of the distance (similarity) under the index concerned (R, F or M).

(3) The similarities obtained under the three indexes are combined into the combined similarity between the user to be recommended and each enterprise user, calculated as follows:

S(X, Y) = Sim_R(X, Y) + Sim_F(X, Y) + Sim_M(X, Y)

where Sim_R(X, Y) is the normalized similarity under the proximity attribute, Sim_F(X, Y) is the normalized similarity under the frequency attribute, Sim_M(X, Y) is the normalized similarity under the value degree attribute, and S(X, Y) is the combined similarity of sample X and sample Y.

(4) The other enterprise users are ranked by their combined similarity to the user to be recommended, with users of high similarity first and users of low similarity last. The 10 users with the highest similarity are taken as the similar users.

(5) The products purchased by the similar users within a certain period (the length of the period can be modified according to how the platform is used, and is provisionally set to 6 months) form two product recommendation lists. List 1 contains the products that the similar users have purchased and the user to be recommended has not; list 2 contains the products that both the similar users and the user to be recommended have purchased. Each list is sorted primarily by the number of similar users who purchased the product; products with the same number of purchasers are sorted by purchase amount, and products with the same number of purchasers and the same amount are sorted by latest purchase time.

(6) The sorted list 1 and list 2 are concatenated (list 2 is appended directly after list 1, so that the products of list 1 come first and those of list 2 after them) to form the product recommendation list, which is recommended to the user to be recommended.
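Items (5) and (6) above can be sketched in Python as follows. Several details are assumptions rather than statements of the source: the record layout `(user_id, product_id, amount, purchased_on)`, the treatment of "purchase amount" as a numeric field summed over the similar users, descending order for all three sort keys, and a 183-day window standing in for the provisional 6 months. The name `recommend` is hypothetical.

```python
from datetime import date, timedelta


def recommend(similar_purchases, target_products, now, window_days=183):
    """Build list 1 and list 2 from similar users' purchases and join them.

    similar_purchases: iterable of (user_id, product_id, amount, purchased_on)
    records of the similar users (hypothetical layout).
    target_products: set of product ids already bought by the target user.
    """
    cutoff = now - timedelta(days=window_days)
    stats = {}   # product_id -> [set of purchasers, total amount, latest date]
    for user_id, product_id, amount, purchased_on in similar_purchases:
        if purchased_on < cutoff:
            continue                        # outside the set time window
        entry = stats.setdefault(product_id, [set(), 0.0, purchased_on])
        entry[0].add(user_id)
        entry[1] += amount
        entry[2] = max(entry[2], purchased_on)

    def sort_key(product_id):
        purchasers, amount, latest = stats[product_id]
        # Assumed direction: more purchasers, larger amount, newer date first.
        return (-len(purchasers), -amount, -latest.toordinal())

    list_1 = sorted((p for p in stats if p not in target_products), key=sort_key)
    list_2 = sorted((p for p in stats if p in target_products), key=sort_key)
    return list_1 + list_2                  # list 2 appended after list 1


now = date(2021, 10, 14)
records = [
    ("u1", "p1", 100.0, date(2021, 9, 1)),
    ("u2", "p1", 50.0, date(2021, 8, 1)),
    ("u1", "p2", 300.0, date(2021, 7, 1)),
    ("u3", "p3", 300.0, date(2021, 9, 20)),
    ("u2", "p4", 10.0, date(2021, 6, 1)),
    ("u3", "p5", 999.0, date(2020, 1, 1)),   # older than the window
]
result = recommend(records, target_products={"p4"}, now=now)
print(result)
```

In this toy data, p1 leads list 1 because two similar users bought it; p2 and p3 tie on purchasers and amount, so the more recently purchased p3 comes first; p4, which the target user already owns, forms list 2 and is placed last.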

It should be understood that the specific order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.

In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the invention.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. Of course, the processor and the storage medium may reside as discrete components in a user terminal.

For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in memory units and executed by processors. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.

What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification of the claims is intended to mean a "non-exclusive or".

Claims

1. The collaborative filtering recommendation method based on the RFM model is characterized by comprising the following steps of:

2. The RFM-model-based collaborative filtering recommendation method of claim 1, wherein in the first step, the value of N is adjusted according to the adjustment of platform class.

3. The RFM-model-based collaborative filtering recommendation method of claim 1, wherein in the second step, the calculation of the nearness attribute vector specifically comprises: according to the RFM model, the probability that a user farther from the last purchase time purchases again is smaller than that of a user closer to the last purchase time, and the proximity attribute vector is calculated as follows:

4. The RFM-model-based collaborative filtering recommendation method of claim 3, wherein in the second step, the calculation of the frequency attribute vector specifically comprises: according to the RFM model, since a user having a high purchase frequency is more likely to purchase again than a user having a low purchase frequency, the value of the frequency attribute vector, i.e., the number of times the product is purchased, is calculated as follows:

5. The RFM-model-based collaborative filtering recommendation method of claim 4, wherein in the second step, the calculation of the value degree attribute vector specifically comprises: according to the RFM model, a user having a higher purchase value degree is more likely to purchase again than a user having a lower purchase value degree, and the value degree attribute vector is calculated as follows:

6. The RFM-model-based collaborative filtering recommendation method of claim 1, wherein in the third step, generating a final recommendation list specifically comprises:

when calculating the similarity of the frequency attribute, X and Y are the frequency attribute vectors of the two users, and x_i and y_i are the i-th attribute values of the frequency attribute vectors X and Y, respectively;

when calculating the similarity of the value degree attribute, X and Y are the value degree attribute vectors of the two users, and x_i and y_i are the i-th attribute values of the value degree attribute vectors X and Y, respectively;

S(X, Y) = Sim_R(X, Y) + Sim_F(X, Y) + Sim_M(X, Y)

(5) forming two product recommendation lists from the products purchased by the similar users within a certain time, wherein list 1 contains the products that the similar users have purchased and the user to be recommended has not, and list 2 contains the products that both the similar users and the user to be recommended have purchased; each list is sorted primarily by the number of similar users who purchased the product, products with the same number of purchasers are sorted by purchase amount, and products with the same number of purchasers and the same amount are sorted by latest purchase time;

7. The RFM model-based collaborative filtering recommendation method of claim 6, wherein in the third step (5), the length of a certain time is modified according to platform usage.