CN112435078A

CN112435078A - Method for classifying loyalty of users

Info

Publication number: CN112435078A
Application number: CN202011468575.8A
Authority: CN
Inventors: 杨钱钱; 唐军; 谢禹
Original assignee: Sichuan Changhong Electric Co Ltd
Current assignee: Sichuan Changhong Electric Co Ltd
Priority date: 2020-12-14
Filing date: 2020-12-14
Publication date: 2021-03-02

Abstract

The invention discloses a method for classifying user loyalty. Using GIS (geographic information system) geocoding, the method adds users' static geographic location information to the user data used for loyalty evaluation, which strengthens the data basis of the evaluation. It first applies the Canopy algorithm to coarsely cluster the user data and then applies the K-means algorithm to cluster the users precisely, so that an enterprise's users are accurately classified by loyalty with all factors influencing the loyalty evaluation taken into account.

Description

Method for classifying loyalty of users

Technical Field

The invention relates to the technical field of enterprise data mining, in particular to a method for classifying user loyalty.

Background

As the seller's market gradually becomes a buyer's market, users have replaced enterprises as the leading party in business activity and have become the decisive factor in whether an enterprise succeeds or fails. However, different users matter differently to an enterprise. The well-known 80/20 rule indicates that only about 20% of users (essentially the enterprise's loyal users) actually generate profit for the enterprise, so an enterprise needs to cultivate the loyalty of its existing users while continuing to attract new ones.

Existing techniques for assessing the loyalty of enterprise users collect user behavior data and then classify user loyalty with a classification algorithm. In doing so they ignore the influence of the user's static geographic location information on loyalty. The offline store remains a main channel through which an enterprise displays goods or provides services, and is therefore an important link between the enterprise's products and consumers. The distance from a user to an offline store is an important factor in whether the user visits the store or purchases the enterprise's products, and only after visiting the store and experiencing or purchasing the products can the user decide whether to approve of the enterprise and stay loyal to it. The user's static location information should therefore also be considered when evaluating loyalty. In addition, existing techniques classify users directly with the K-means algorithm. Because the initial number of clusters K is difficult to estimate and the algorithm is sensitive to isolated points, the classification runs inefficiently and with low accuracy.

Disclosure of Invention

Many existing methods for classifying user loyalty consider only users' behavior information and do not use users' static geographic location information. The invention adds users' static geographic location information to the user data used for loyalty evaluation by means of GIS geocoding, which strengthens the data basis of the evaluation and helps ensure that the enterprise's loyal users are classified accurately. In addition, the Canopy algorithm is first used to coarsely cluster the user data and the K-means algorithm is then used to cluster the users precisely, which addresses the loss of efficiency and accuracy that K-means suffers because the value of K is difficult to estimate and the algorithm is sensitive to isolated points.

A user in the invention is a user who has made at least one purchase from the enterprise and has a specific receiving address.

The invention realizes the purpose through the following technical scheme:

A method of user loyalty classification, comprising the following steps:

Step 1: extracting user behavior data: extracting users' access, browsing, consumption and purchasing data from the databases of the enterprise's online stores/malls and offline stores;

Step 2: collecting users' static geographic location information and the geographic location information of the enterprise's offline stores: joining the receiving addresses and after-sales service addresses of enterprise users to extract each user's static address information, and extracting the geographic location information of the enterprise's offline stores from the enterprise's offline store database;

Step 3: converting the users' static geographic location information and the offline store location information collected in step 2 into longitude and latitude by GIS forward geocoding, and calculating the shortest distance from each user to the enterprise's offline stores using a Euclidean distance calculation method;

Step 4: studying the existing loyalty literature and the enterprise's actual business, constructing a user loyalty evaluation index system, and extracting a user behavior feature data set;

Step 5: determining the weight of each feature in the index system by the analytic hierarchy process;

Step 6: normalizing the user index data extracted in step 3 and calculating the loyalty index;

Step 7: preliminarily determining the number of user loyalty clusters using the Canopy algorithm;

Step 8: performing loyalty clustering on the users using the K-means algorithm.

In step 1, the loyalty of enterprise users is classified on the basis of all data collected by the enterprise's online stores and offline physical stores: the enterprise's data assets are used to evaluate users' online and offline access, browsing and consumption behavior, which ensures the accuracy of the loyalty classification.

In steps 2 and 3, the static geographic location information of enterprise users and the geographic location information of the enterprise's offline stores are acquired and converted into longitude and latitude by GIS forward geocoding, and the shortest distance from each user to the enterprise's offline stores is calculated using a Euclidean distance calculation method.

The shortest distance $D_{\min}$ from user $i$ to the enterprise's offline stores is

$$D_i = \min_{j} d_{ij}, \qquad d_{ij} = |A_i, B_j|, \qquad i = 1, \dots, n;\ j = 1, \dots, m$$

where $A_i$ is the longitude and latitude of user $i$'s static geographic location, $B_j$ is the longitude and latitude of the enterprise's $j$-th offline store address, and $|A_i, B_j|$ is the distance over the earth's surface between the points $A_i$ and $B_j$.
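The shortest-distance calculation can be sketched in Python as follows. The text names a Euclidean distance method but defines $|A_i, B_j|$ as a distance over the earth's surface; this sketch assumes the haversine great-circle formula for that distance, with a mean earth radius of 6371 km. The function names and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius; an assumption of this sketch


def surface_distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def shortest_distances(users, stores):
    """Return D_i = min_j |A_i, B_j| for every user location A_i."""
    return [min(surface_distance_km(a, b) for b in stores) for a in users]


# Hypothetical sample coordinates (lat, lon).
users = [(30.66, 104.06), (31.47, 104.68)]
stores = [(30.66, 104.06), (29.56, 106.55)]
distances = shortest_distances(users, stores)
```

Each user is compared against every store, so the cost is proportional to n × m; for large store networks a spatial index could replace the inner loop.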

Further, in the steps 4, 5 and 6, a user loyalty classification index system is constructed, an analytic hierarchy process is adopted to determine the index weight, and the user loyalty score is calculated.

Further, in steps 6 and 7, before the users are clustered, coarse clustering is performed with the Canopy algorithm to preliminarily determine the number of cluster centers and the cluster center points for K-means clustering. The coarse clustering comprises:

First, vectorize the enterprise user data set and put the vectors into a list, and determine the thresholds $T_1$ and $T_2$ experimentally, with $T_1 > T_2$;

Second, take any point P from the list and calculate its distance to every existing Canopy (if no Canopy exists yet, P becomes a Canopy); if the distance between P and a Canopy is within $T_1$, add P to that Canopy;

Third, if the distance between P and a Canopy is within $T_2$, delete P from the list: P is then considered close enough to that Canopy that it can no longer serve as the center of another Canopy;

Fourth, repeat the second and third steps until the list is empty;

Fifth, the algorithm ends, and the number of Canopies obtained is the number k of cluster centers used in the next clustering step.
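The coarse clustering steps can be sketched in Python as below. This is one common reading of the Canopy procedure rather than the patent's exact implementation: each point is taken from the list once, and a point that is not within $T_2$ of any existing Canopy becomes the center of a new one. The function name `canopy` and the sample points and thresholds are hypothetical.

```python
import math


def canopy(points, t1, t2):
    """Coarse clustering; returns a list of (center, members) canopies."""
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        p = remaining.pop(0)  # take a point P from the list
        strongly_bound = False
        for center, members in canopies:
            d = math.dist(p, center)
            if d <= t1:
                members.append(p)  # within T1: P joins this canopy
            if d <= t2:
                strongly_bound = True  # within T2: P cannot seed a new canopy
        if not strongly_bound:
            canopies.append((p, [p]))  # P becomes the center of a new canopy
    return canopies


# Hypothetical 2-D user vectors and thresholds.
points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (2.5, 0.0)]
canopies = canopy(points, t1=3.0, t2=1.0)
k = len(canopies)  # number of clusters handed to K-means
centers = [c for c, _ in canopies]
```

Note that a point lying between $T_2$ and $T_1$ of a Canopy both joins that Canopy and seeds a new one, so Canopies may overlap; only their count and centers are passed on.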

Further, in step 8, clustering users' loyalty with the K-means algorithm comprises:

First, let $D = \{x_1, x_2, \dots, x_t\}$ denote the user set, k the number of clusters determined by the Canopy algorithm in the previous step, N the maximum number of iterations, and $C = \{C_1, C_2, \dots, C_k\}$ the resulting clusters;

Second, randomly select k samples from the data set D as the initial cluster centers $\{\mu_1, \mu_2, \dots, \mu_k\}$;

Third, for each sample point $x_i$ ($i = 1, 2, \dots, t$), calculate its distance to each of the k cluster centers $\mu_j$ ($j = 1, 2, \dots, k$) and assign it to the cluster whose center is nearest. The distance is calculated as:

$$d(x_i, \mu_j) = \lVert x_i - \mu_j \rVert_2$$

Fourth, for all sample points in each cluster $C_j$ ($j = 1, 2, \dots, k$), recalculate the cluster center $\mu_j$:

$$\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x$$

Fifth, repeat the third and fourth steps to update the k cluster centers $\mu_j$ ($j = 1, 2, \dots, k$) iteratively until the cluster centers no longer change, the maximum number of iterations N is reached, or the change falls within a set tolerance. The state is then considered stable, the iteration ends, and the clustering result is output.
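A minimal Python sketch of the K-means steps above follows. It assumes Euclidean distance and treats the tolerance as the largest movement of any center between iterations; the function name `kmeans`, the `tol` and `seed` parameters and the sample data are hypothetical.

```python
import math
import random


def kmeans(data, k, max_iter=100, tol=1e-9, seed=0):
    """Plain K-means on tuples of floats; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)  # randomly pick k samples as initial centers
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assignment step: each sample joins the cluster with the nearest center.
        labels = [min(range(k), key=lambda j: math.dist(x, centers[j])) for x in data]
        # Update step: each center becomes the mean of its assigned samples.
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # centers stopped moving within the tolerance
            break
    return centers, labels


# Hypothetical user vectors forming two well-separated groups.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(data, k=2)
```

In the method described here, k would come from the Canopy step rather than being chosen by hand, and the Canopy centers could replace the random initial centers.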

The invention has the beneficial effects that:

In the method for classifying user loyalty, users' static geographic location information is added to the user data used for loyalty evaluation by means of GIS (geographic information system) geocoding, which strengthens the data basis of the evaluation. The user data are coarsely clustered with the Canopy algorithm and the users are then precisely clustered with the K-means algorithm, so that the enterprise's users are accurately classified by loyalty with all factors influencing the loyalty evaluation taken into account.

Drawings

To illustrate the technical solutions in the embodiments of the invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.

In one embodiment, as shown in FIG. 1, the method for classifying user loyalty combines the Canopy algorithm and the K-means algorithm to classify user loyalty on the basis of users' static geographic location information and user behavior data, and achieves accurate classification of the enterprise's users by loyalty with all factors influencing the loyalty evaluation taken into account. A user in the invention is a user who has made at least one purchase from the enterprise and has a specific receiving address. The method comprises the following steps:

(1) Extracting user behavior data: associating and integrating each user's online and offline behavior data, using the user's mobile phone number as the ID.

First, acquire the user's historical consumption data and access, browsing and collection data from the enterprise's online channels. The access, browsing and collection data come from the enterprise's online log database, and the consumption data come from the enterprise's business database. Then acquire the user's historical consumption data and visit data from the enterprise's offline channels.

Extract users' historical browsing and consumption records, that is, the access, browsing and purchase records within the last five years (the period is set according to the enterprise's user-analysis requirements) of users registered with the enterprise's online mall/store. This means extracting the user access and browsing data stored in the enterprise's weblog database and the user purchase data stored in the business database for the last five years. Here a user is one who has made at least one purchase from the enterprise and has a complete receiving address.

The offline data in the invention come from users' visit and consumption records collected by face-recognition cameras installed in the enterprise's offline stores, so the data truly and completely reflect users' behavior in the offline stores. The two parts of the data are associated through the user's mobile phone number, which completes the integration of online and offline user behavior data.
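As an illustration of the association step, the following Python sketch groups online and offline records under a shared phone-number key. The record layout (a `phone` field plus an `event` field) and the function name `merge_by_phone` are assumptions made for this example, not part of the described system.

```python
from collections import defaultdict


def merge_by_phone(online_records, offline_records):
    """Group online and offline behavior records under one phone-number key."""
    merged = defaultdict(lambda: {"online": [], "offline": []})
    for rec in online_records:
        merged[rec["phone"]]["online"].append(rec)
    for rec in offline_records:
        merged[rec["phone"]]["offline"].append(rec)
    return dict(merged)


# Hypothetical records; "p1" and "p2" stand in for phone numbers.
online = [{"phone": "p1", "event": "browse"}]
offline = [{"phone": "p1", "event": "store_visit"}, {"phone": "p2", "event": "purchase"}]
merged = merge_by_phone(online, offline)
```

A user who appears in only one channel still gets an entry, with an empty list for the other channel.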

(2) Collecting users' static geographic location information and the geographic location information of the enterprise's offline stores

Extract each user's receiving address, after-sales service address, delivery address and pick-up address from the enterprise business database, and apply de-duplication and similar processing to obtain the users' static geographic location data set.

Extract all offline store addresses from the enterprise store information database to obtain the enterprise's offline store address data set. Specifically, extract the user ID, consumption time and receiving address from the enterprise's business database, and the user ID, after-sales service time and after-sales service address from the after-sales database. Merge the user data with the user ID as the unique key, and take whichever of the after-sales service address and the receiving address corresponds to the most recent time as the user's static geographic location information. Extract the offline store ID and the offline store geographic location information from the enterprise's offline store database. This yields the enterprise user geographic location data set and the enterprise offline store data set.
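The rule of keeping the address with the most recent time can be sketched in Python as follows. The row layout `(user_id, event_date, address)` and the function name `latest_address` are hypothetical; real rows would come from the business and after-sales databases.

```python
from datetime import date


def latest_address(shipping_rows, after_sales_rows):
    """Per user ID, keep the address attached to the most recent event."""
    latest = {}
    for user_id, event_date, address in list(shipping_rows) + list(after_sales_rows):
        if user_id not in latest or event_date > latest[user_id][0]:
            latest[user_id] = (event_date, address)
    return {uid: addr for uid, (_, addr) in latest.items()}


# Hypothetical rows: (user_id, event_date, address).
shipping = [("u1", date(2020, 1, 5), "addr A"), ("u2", date(2020, 3, 1), "addr C")]
after_sales = [("u1", date(2020, 6, 1), "addr B")]
static_location = latest_address(shipping, after_sales)
```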

(3) Data cleansing

Clean the data for integrity, uniqueness, authority, legality and consistency.

Remove fields and records that are irrelevant to the enterprise user loyalty classification index system, such as developer test records and page redirects, to obtain a basic user data set that is directly relevant to the loyalty classification indexes and includes the user ID, VINFO, access time, viewed commodity entries, consumption date, consumption amount and so on.

(4) Converting user addresses into longitude and latitude by GIS geocoding

Using the user static geographic location data set collected in step (2), convert the users' location information from text into longitude and latitude by calling the map Web API provided by a mainstream domestic map service provider, applying the forward geocoding technique of GIS geocoding.

(5) Converting the enterprise's offline store addresses into longitude and latitude by GIS geocoding

Using the enterprise's offline store address data set collected in step (2), convert the offline store addresses into longitude and latitude by calling the map Web API provided by a mainstream domestic map service provider, applying the forward geocoding technique of GIS geocoding. In both cases the API is called from Python, and both the users' static geographic location information and the enterprise stores' geographic location information are converted into longitude and latitude by forward geocoding.
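The batch conversion can be sketched in Python without committing to a particular map provider. In the sketch below the actual API call is abstracted as a caller-supplied `geocode` function, since the text does not name the provider or its endpoint; `geocode_all`, `FAKE_LOOKUP` and the sample addresses are hypothetical.

```python
def geocode_all(addresses, geocode):
    """Map each distinct address to (lat, lon) using a caller-supplied geocoder."""
    cache = {}
    for addr in addresses:
        if addr not in cache:
            cache[addr] = geocode(addr)  # one provider call per distinct address
    return cache


# Stand-in for a real provider call, so the sketch runs offline.
FAKE_LOOKUP = {"Store A": (30.66, 104.06), "Store B": (31.47, 104.68)}
coords = geocode_all(["Store A", "Store B", "Store A"], FAKE_LOOKUP.get)
```

Caching by address text keeps the number of provider calls down when many users share an address.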

(6) Calculating the shortest distance $D_{\min}$ from each user to the enterprise's offline stores

The shortest distance from each user to the enterprise's offline stores is calculated using a Euclidean distance calculation method:

$$D_i = \min_{j} d_{ij}, \qquad d_{ij} = |A_i, B_j|, \qquad i = 1, \dots, n;\ j = 1, \dots, m$$

where $A_i$ is the longitude and latitude of user $i$'s static geographic location, $B_j$ is the longitude and latitude of the enterprise's $j$-th offline store address, and $|A_i, B_j|$ is the distance over the earth's surface between the points $A_i$ and $B_j$.

(7) Studying the existing loyalty literature and the enterprise's actual business, constructing a user loyalty evaluation index system, and extracting a user behavior feature data set

Drawing on the literature on user loyalty segmentation and on the business and process characteristics of enterprises in the Internet environment, construct an index system for evaluating enterprise user loyalty and extract a user behavior feature data set. The index system in the invention comprises 11 user behavior indexes, as shown in the following table:

In the access log database, users are distinguished by the VINFO field. To compute users' access features, the log tables are joined on the VINFO field and an SQL program is written to compute each user's access features. In the enterprise business database, users are distinguished by LOGIN_ID. To compute users' purchasing features, the business tables are joined on the LOGIN_ID field and an SQL program is written to compute each user's purchasing features.
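As a sketch of the per-user SQL aggregation, the Python example below runs a grouped query against an in-memory SQLite table. Only the VINFO field name comes from the text; the table name `access_log`, the other columns, the chosen features and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical log table; only the VINFO column name is taken from the text.
conn.execute("CREATE TABLE access_log (VINFO TEXT, access_time TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?, ?)",
    [("v1", "2020-01-01", "tv"), ("v1", "2020-02-01", "fridge"), ("v2", "2020-01-15", "tv")],
)
# One row of access features per user, grouped by VINFO.
rows = conn.execute(
    "SELECT VINFO, COUNT(*) AS visits, COUNT(DISTINCT item) AS items_viewed, "
    "MAX(access_time) AS last_visit FROM access_log GROUP BY VINFO ORDER BY VINFO"
).fetchall()
conn.close()
```

The purchasing features would follow the same pattern with LOGIN_ID as the grouping key.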

(8) Determining the weight of each feature in the index system by the analytic hierarchy process

First step: design a questionnaire according to the index system and ask experts to judge objectively, for each factor in the layer above, the same-level factors that belong to it. That is, the indexes are compared pairwise on the 1-9 scale and their relative importance is scored, which yields the judgment matrices $P_1$, $P_{21}$ and $P_{22}$, where $P_1$ is the importance comparison matrix among the second-level indexes, $P_{21}$ is the importance comparison matrix among the third-level indexes under the access loyalty dimension, and $P_{22}$ is the importance comparison matrix among the third-level indexes under the consumption loyalty dimension.

Second step: calculate a weight vector from each judgment matrix. Let the judgment matrix $P^*$ contain n indexes, so that $p_{ij}$ is the importance of the i-th index relative to the j-th index, with $i, j \in [1, n]$. Normalizing each column gives:

$$q_{ij} = \frac{p_{ij}}{\sum_{k=1}^{n} p_{kj}}$$

where $\sum_{k=1}^{n} p_{kj}$ is the sum of column j. This yields a new matrix $Q^*$. Summing each row of $Q^*$ gives a feature vector, and normalizing this vector gives the weight of each index:

$$w_i = \frac{\sum_{j=1}^{n} q_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{n} q_{ij}}$$

Third step: perform a consistency check on each judgment matrix, using the consistency index, the random consistency index and the consistency ratio. The consistency ratio CR is calculated as:

$$CR = \frac{CI}{RI}$$

where CI is the consistency index and RI is the random consistency index. CI is calculated as:

$$CI = \frac{\lambda_{\max}(P^*) - n}{n - 1}$$

where $\lambda_{\max}(P^*)$ is the largest eigenvalue of the judgment matrix $P^*$ and n is the order of $P^*$. The random consistency index RI is looked up according to the order of the matrix, as shown in the following table:

n	1	2	3	4	5	6	7	8	9
RI	0.00	0.00	0.58	0.90	1.12	1.24	1.32	1.41	1.45
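The weight calculation and consistency ratio described above can be sketched in Python as follows. The largest eigenvalue is estimated here as the mean of $(P^* w)_i / w_i$, a common approximation that is an assumption of this sketch rather than something the text specifies; the function names and the sample matrix are hypothetical.

```python
# Random consistency index by matrix order, from the table above.
RI_TABLE = {1: 0.00, 2: 0.00, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(p):
    """Column-normalize P, then average each row to get the weight vector."""
    n = len(p)
    col_sums = [sum(p[i][j] for i in range(n)) for j in range(n)]
    q = [[p[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return [sum(row) / n for row in q]


def consistency_ratio(p, w):
    """CR = CI / RI, estimating lambda_max as the mean of (P w)_i / w_i."""
    n = len(p)
    if n <= 2:
        return 0.0  # matrices of order 1 or 2 are always consistent
    pw = [sum(p[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(pw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RI_TABLE[n]


# Hypothetical, perfectly consistent 3x3 judgment matrix.
P = [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0], [0.25, 0.5, 1.0]]
w = ahp_weights(P)
cr = consistency_ratio(P, w)
```

Because every column of the normalized matrix sums to 1, dividing each row sum by n is the same as normalizing the vector of row sums.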

The consistency check is judged as follows: when CR < 0.1, the judgment matrix is considered to pass the consistency check, and its normalized feature vector is used as the weight vector. Otherwise the judgment matrix must be adjusted until it passes. The adjustment reconstructs the judgment matrix by the maximum-deviation-term correction method, described as follows:

From the weight vector $W = (w_1, w_2, \dots, w_n)^T$ of the judgment matrix $P^*$, construct the matrix $R^* = (r_{ij}) = (w_i / w_j)$ and calculate the deviation matrix:

$$\Delta = (\delta_{ij}) = (|p_{ij} - r_{ij}|)$$

For the $p_{ij}$ corresponding to the largest $\delta_{ij}$, make the correction $p_{ij} = r_{ij}$, $p_{ji} = r_{ji}$ and substitute these values into the original matrix $P^*$ to form a new judgment matrix. Adjusting step by step in this way improves the consistency until the requirement is met. The meanings of the values 1-9 used in the expert scoring are shown in the following table:

(9) Max-min normalizing the user index data and calculating the user loyalty index

First step: normalize the user index data with the max-min normalization formula:

$$X'_i = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}$$

where $X_i$ is the original value of an index of the user before normalization, $X_{\max}$ and $X_{\min}$ are the maximum and minimum values of that index before normalization, and $X'_i$ is the normalized value of the index.

Second step: calculate each user's loyalty index from the feature weights obtained after the consistency check:

$$\mathrm{loyal}_t = \alpha\,\mathrm{visit}_t + \beta\,\mathrm{purchase}_t + \gamma\,\mathrm{distance}_t$$

where $\mathrm{loyal}_t$ is the loyalty score of user t, $\mathrm{visit}_t$ is the access loyalty score, $\mathrm{purchase}_t$ is the consumption loyalty score, $\alpha$ and $\beta$ are the weights of access loyalty and consumption loyalty respectively, and $\gamma$ is the weight of the distance term $\mathrm{distance}_t$. $\mathrm{visit}_t$ and $\mathrm{purchase}_t$ are calculated as:

$$\mathrm{visit}_t = \alpha_1 A1_t + \alpha_2 A2_t + \dots + \alpha_m Am_t$$

$$\mathrm{purchase}_t = \beta_1 B1_t + \beta_2 B2_t + \dots + \beta_n Bn_t$$

where $A_i$ ($i = 1, 2, \dots, m$) and $B_j$ ($j = 1, 2, \dots, n$) are the user access behavior features and consumption behavior features respectively, that is, the third-level indexes under the access loyalty and consumption loyalty dimensions retained after feature selection, and $\alpha_i$ ($i = 1, 2, \dots, m$) and $\beta_j$ ($j = 1, 2, \dots, n$) are the weights of the behavior features.
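The normalization and scoring formulas can be sketched in Python as below. The function names and sample values are hypothetical, and the sketch passes the distance term in as given, since the text does not say how $\mathrm{distance}_t$ is scaled or oriented.

```python
def min_max(values):
    """Scale a list of raw indicator values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def weighted_sum(features, weights):
    """Dot product of normalized indicator values and their weights."""
    return sum(f * w for f, w in zip(features, weights))


def loyalty(visit_feats, visit_w, purchase_feats, purchase_w, distance, alpha, beta, gamma):
    """loyal_t = alpha * visit_t + beta * purchase_t + gamma * distance_t."""
    visit = weighted_sum(visit_feats, visit_w)
    purchase = weighted_sum(purchase_feats, purchase_w)
    return alpha * visit + beta * purchase + gamma * distance


# Hypothetical values: one raw indicator column and one user's normalized features.
normalized = min_max([2.0, 4.0, 6.0])
score = loyalty([1.0, 0.0], [0.5, 0.5], [0.5], [1.0], 0.0, alpha=0.5, beta=0.5, gamma=0.0)
```

In practice the weights would be the AHP weights that passed the consistency check.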

(10) Coarse clustering of users with the Canopy algorithm

Before the users are clustered, the Canopy algorithm performs coarse clustering to preliminarily determine the number of cluster centers and the cluster center points for K-means clustering. This reduces the cost of the point-to-point distance calculations in the K-means algorithm, lowers the memory overhead and gives a clear time advantage. It also makes the K-means algorithm more robust to isolated points and improves the accuracy of the user loyalty classification.

(11) Loyalty clustering of users with the K-means algorithm

First, let $D = \{x_1, x_2, \dots, x_t\}$ denote the user set, k the number of clusters determined by the Canopy algorithm in the previous step, N the maximum number of iterations, and $C = \{C_1, C_2, \dots, C_k\}$ the resulting clusters.

The above describes only specific embodiments of the invention, and the scope of protection of the invention is not limited to them. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the invention falls within the scope of protection of the invention, which is therefore defined by the appended claims. The technical features described in the above embodiments can be combined in any suitable manner provided they do not contradict one another; to avoid unnecessary repetition, the possible combinations are not described separately. The various embodiments of the invention can likewise be combined in any way, and such combinations are to be regarded as part of the disclosure of the invention as long as they do not depart from its spirit.

Claims

1. A method of user loyalty classification, comprising the steps of:

step 8: performing loyalty clustering on the users using the K-means algorithm.

2. The method as claimed in claim 1, wherein in step 1, the loyalty of the enterprise user is classified based on all data collected by the online and offline brick-and-mortar stores of the enterprise, and the data assets of the enterprise are used to evaluate the behavior of the user based on online and offline access, browsing and consumption, so as to ensure the accuracy of the loyalty classification of the user.

3. The method as claimed in claim 1, wherein in steps 2 and 3, the static geographic location information of enterprise users and the geographic location information of the enterprise's offline stores are acquired and converted into longitude and latitude by GIS forward geocoding, and the shortest distance from each user to the enterprise's offline stores is calculated using a Euclidean distance calculation method;

the shortest distance $D_{\min}$ from user $i$ to the enterprise's offline stores being

$$D_i = \min_{j} d_{ij}, \qquad d_{ij} = |A_i, B_j|, \qquad i = 1, \dots, n;\ j = 1, \dots, m$$

4. The method as claimed in claim 1, wherein in the steps 4, 5 and 6, a user loyalty classification index system is constructed, an analytic hierarchy process is used to determine the index weight, and the user loyalty score is calculated.

5. The method according to claim 1, wherein in steps 6 and 7, before the users are clustered, coarse clustering is performed with the Canopy algorithm to preliminarily determine the number of cluster centers and the cluster center points for K-means clustering; the coarse clustering comprises:

first, vectorizing the enterprise user data set, putting the vectors into a list, and determining the thresholds $T_1$ and $T_2$ experimentally, with $T_1 > T_2$;

second, taking any point P from the list and calculating its distance to every existing Canopy, and, if the distance between P and a Canopy is within $T_1$, adding P to that Canopy;

6. The method of claim 1, wherein in step 8, clustering loyalty of users using a K-means algorithm comprises:

first, letting $D = \{x_1, x_2, \dots, x_t\}$ denote the user set, k the number of clusters determined by the Canopy algorithm in step 7, N the maximum number of iterations, and $C = \{C_1, C_2, \dots, C_k\}$ the resulting clusters;

fourth, for all sample points in each cluster $C_j$ ($j = 1, 2, \dots, k$), recalculating the cluster center $\mu_j$ ($j = 1, 2, \dots, k$) as:

$$\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x$$