CN112435078A - Method for classifying loyalty of users - Google Patents

Publication number
CN112435078A
Authority
CN
China
Prior art keywords
user
enterprise
loyalty
clustering
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011468575.8A
Other languages
Chinese (zh)
Inventor
杨钱钱
唐军
谢禹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd filed Critical Sichuan Changhong Electric Co Ltd
Priority to CN202011468575.8A priority Critical patent/CN112435078A/en
Publication of CN112435078A publication Critical patent/CN112435078A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Remote Sensing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for classifying user loyalty. The user's static geographic location information is added to the user data used for loyalty evaluation through GIS (geographic information system) geocoding, which strengthens the data basis of the evaluation. The Canopy algorithm first performs coarse clustering on the user data, and the K-means algorithm then performs fine clustering on the users, so that an enterprise's users are classified by loyalty while taking into account the factors that influence the loyalty evaluation.

Description

Method for classifying loyalty of users
Technical Field
The invention relates to the technical field of enterprise data mining, in particular to a method for classifying user loyalty.
Background
As the seller's market gradually gives way to a buyer's market, users have replaced enterprises as the leading party in business activity and have become the decisive factor in whether an enterprise succeeds or fails. Different users, however, matter differently to an enterprise: the well-known 80/20 rule indicates that only about 20% of users (essentially the enterprise's loyal users) actually generate its profit. While continuing to attract new users, an enterprise therefore also needs to cultivate the loyalty of its existing users.
The prior art for evaluating the loyalty of enterprise users collects user behavior data and then classifies user loyalty with a classification algorithm. This approach ignores the influence of the user's static geographic location information on loyalty. Offline stores remain a main channel through which an enterprise displays goods or provides services to consumers. The distance from a user to an offline store is an important factor in whether the user visits the store or purchases the enterprise's products, and only after visiting and experiencing or purchasing the products can the user decide whether to approve of, and be loyal to, the enterprise. The user's static location information should therefore also be considered when evaluating loyalty. In addition, the prior art classifies users directly with the K-means algorithm. Because the initial number of clusters K is difficult to estimate and the algorithm is sensitive to isolated points (outliers), this leads to low running efficiency and low classification accuracy.
Disclosure of Invention
Many existing user loyalty classification methods consider only users' behavior information and do not use their static geographic location information. The invention adds the user's static geographic location information to the user data for loyalty evaluation through GIS geocoding, which strengthens the data basis of the evaluation and helps ensure the accuracy of the enterprise's loyal-user classification. In addition, the Canopy algorithm first performs coarse clustering on the user data and the K-means algorithm then performs fine clustering on the users, which addresses the loss of efficiency and classification accuracy that K-means suffers because the value of K is difficult to estimate and the algorithm is sensitive to isolated points.
A user in the invention is a user who has made at least one purchase from the enterprise and has a specific receiving address.
The invention achieves this purpose through the following technical scheme:
A method of user loyalty classification, comprising the following steps:
step 1: extracting user behavior data: extracting user access, browsing and purchase data from the databases of the enterprise's online stores/malls and offline stores;
step 2: collecting the user's static geographic location information and the geographic location information of the enterprise's offline stores: joining the receiving addresses and after-sales service addresses of the enterprise's users to extract the users' static address information, and extracting the geographic location information of the enterprise's offline stores from the enterprise's offline store database;
step 3: converting the user static geographic location information and the offline store location information collected in step 2 into longitude and latitude by GIS forward geocoding, and calculating the shortest distance from the user to the enterprise's offline stores by a Euclidean distance calculation method;
step 4: studying existing loyalty-related literature and the enterprise's actual business conditions, constructing a user loyalty evaluation index system, and extracting a user behavior feature data set;
step 5: determining the weight of each feature in the index system by the analytic hierarchy process;
step 6: normalizing the user index data extracted in step 3 and calculating the loyalty index;
step 7: preliminarily determining the number of user loyalty clusters with the Canopy algorithm;
step 8: performing loyalty clustering on the users with the K-means algorithm.
In step 1, the loyalty of the enterprise's users is classified based on all data collected by the enterprise's online stores and offline physical stores; the enterprise's data assets are used to evaluate users' online and offline access, browsing and consumption behavior, which supports the accuracy of the loyalty classification.
In steps 2 and 3, the static geographic location information of the enterprise's users and the geographic location information of the enterprise's offline stores are acquired and converted into longitude and latitude by GIS forward geocoding, and the shortest distance from each user to the enterprise's offline stores is calculated by a Euclidean distance calculation method.
The shortest distance $D_{\min}$ from a user to the enterprise's offline stores is computed for each user $i$ as
$D_i = \min_{j} d_{ij}, \quad d_{ij} = |A_i, B_j|, \quad i = 1, \dots, n; \; j = 1, \dots, m$
where $A_i$ is the longitude and latitude of user $i$'s static geographic location, $B_j$ is the longitude and latitude of the address of offline store $j$, and $|A_i, B_j|$ is the earth-surface distance between the two points $A_i$ and $B_j$.
Further, in steps 4, 5 and 6, a user loyalty classification index system is constructed, the index weights are determined by the analytic hierarchy process, and the user loyalty score is calculated.
Further, in steps 6 and 7, before the users are clustered, coarse clustering is performed with the Canopy algorithm to preliminarily determine the number of cluster centers and the cluster center points for K-means clustering; the coarse clustering comprises:
first, vectorizing the enterprise user data set, putting the vectors into a list, and determining two thresholds $T_1$ and $T_2$ experimentally, with $T_1 > T_2$;
second, taking any point P from the list and calculating the distance between P and every existing Canopy (if no Canopy exists yet, P becomes a Canopy); if the distance between P and a Canopy is within $T_1$, adding P to that Canopy;
third, if the distance between P and a Canopy is within $T_2$, deleting P from the list, because P is then considered close enough to that Canopy that it can no longer serve as the center of another Canopy;
fourth, repeating the second and third steps until the list is empty;
fifth, ending the algorithm; the number of Canopies obtained is the number k of cluster centers for the next step.
Further, in the step 8, the step of clustering the loyalty of the user by using the K-means algorithm comprises:
first, let $D = \{x_1, x_2, \dots, x_t\}$ denote the user set, let k be the number of clusters determined by the Canopy algorithm in the previous step, let N denote the maximum number of iterations, and let $C = \{C_1, C_2, \dots, C_k\}$ denote the resulting clusters;
second, randomly selecting k samples from the data set D as the initial cluster centers $\{\mu_1, \mu_2, \dots, \mu_k\}$;
third, for each sample point $x_i$ ($i = 1, 2, \dots, t$), calculating its distance to each of the k cluster centers $\mu_j$ ($j = 1, 2, \dots, k$) and assigning it to the cluster whose center is nearest; the distance is calculated as:
Figure BDA0002834546560000041
fourth, for all sample points in each cluster $C_j$ ($j = 1, 2, \dots, k$), recalculating the cluster center $\mu_j$ as:
Figure BDA0002834546560000042
fifth, repeating the third and fourth steps to iteratively update the k cluster centers $\mu_j$ ($j = 1, 2, \dots, k$) until the cluster centers no longer change, the maximum number of iterations N is reached, or the change falls within a set tolerance; the clustering is then considered stable, the iteration ends, and the clustering result is output.
The invention has the beneficial effects that:
In the method for classifying user loyalty, the user's static geographic location information is added to the user data for loyalty evaluation through GIS (geographic information system) geocoding, which strengthens the data basis of the evaluation. The Canopy algorithm performs coarse clustering on the user data and the K-means algorithm then performs fine clustering on the users, so that the enterprise's users are classified by loyalty while taking into account the factors that influence the loyalty evaluation.
Drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the drawing needed for describing the embodiments or the prior art is briefly introduced below. The drawing described below shows only some embodiments of the invention, and those skilled in the art can obtain other drawings from it without creative effort.
FIG. 1 is a flow chart of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
In one embodiment, as shown in FIG. 1, the method for classifying user loyalty combines the Canopy algorithm and the K-means algorithm to classify users by loyalty based on their static geographic location information and behavior data, taking into account the factors that influence the loyalty evaluation. A user in the invention is a user who has made at least one purchase from the enterprise and has a specific receiving address. The method comprises the following steps:
(1) Extracting user behavior data: the user's online and offline behavior data are associated and integrated using the user's mobile phone number as the ID.
First, the user's historical online consumption data and online access, browsing and collection data are acquired; the access and browsing data come from the enterprise's online log database, and the consumption data come from the enterprise's business database. Then the user's historical offline consumption data and store-visit data are acquired.
For the online data, the access, browsing and purchase records of users registered with the enterprise's online mall/store within the last five years are extracted (the period is set according to the enterprise's analysis requirements); that is, the user access and browsing data stored in the enterprise's weblog database and the user purchase data stored in the business database for the last five years. Here, a user is one who has made at least one purchase from the enterprise and has a complete receiving address.
The offline data in the invention come from the visit and consumption records collected by cameras installed in the enterprise's offline stores using face recognition, so the data truly and completely reflect users' behavior in the offline stores. The two parts of the data are associated through the user's mobile phone number, which completes the integration of online and offline user behavior data.
(2) Collecting user static geographic location information and enterprise offline store geographic location information
The user's receiving address, after-sales service address, delivery address and pick-up address are extracted from the enterprise business database and processed (for example, de-duplicated) to obtain the user static geographic location data set.
All offline store addresses are extracted from the enterprise store information database to obtain the enterprise's offline store address data set. Specifically: the user ID, consumption time and receiving address are extracted from the enterprise's business database, and the user ID, after-sales service time and after-sales service address are extracted from the after-sales database; the user data are merged with the user ID as the unique key, and the after-sales service address or receiving address with the most recent time is taken as the user's static geographic location information. The offline store ID and offline store geographic location information are extracted from the enterprise's offline store database. This yields the enterprise user geographic location data set and the enterprise store data set, respectively.
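The merge described above (user ID as the unique key, most recent address wins) can be sketched as follows. This is a minimal illustration only: the `(user_id, timestamp, address)` tuple layout, the function name `latest_static_address` and the sample records are assumptions made for the sketch, not the schema of the enterprise databases.

```python
from datetime import datetime

def latest_static_address(purchases, after_sales):
    """Return {user_id: address}, keeping each user's most recent address.

    Both arguments are iterables of (user_id, timestamp, address) tuples;
    this layout is a hypothetical stand-in for the business and
    after-sales tables.
    """
    latest = {}  # user_id -> (timestamp, address)
    for user_id, ts, address in list(purchases) + list(after_sales):
        if not address:
            continue  # ignore records without a usable address
        if user_id not in latest or ts > latest[user_id][0]:
            latest[user_id] = (ts, address)
    return {uid: addr for uid, (ts, addr) in latest.items()}

# Hypothetical sample records
purchases = [
    ("u1", datetime(2020, 1, 5), "Address A"),
    ("u2", datetime(2019, 7, 1), "Address B"),
]
after_sales = [
    ("u1", datetime(2020, 6, 9), "Address C"),
]
static_addresses = latest_static_address(purchases, after_sales)
```

Here user `u1` ends up with the after-sales address because it is the newer record, while `u2` keeps the receiving address from the only record available.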
(3) Data cleansing
The data are cleaned with respect to integrity, uniqueness, authority, validity and consistency;
fields and records irrelevant to the enterprise user loyalty classification index system, such as developer testing and page jumps, are removed to obtain a basic user data set directly related to the loyalty classification indexes, including user ID, VINFO, access time, viewed product entries, consumption date, consumption amount and so on.
(4) Converting user addresses into longitude and latitude by GIS geocoding
Based on the user static geographic location data set collected in step (2), the user location information is converted from text into longitude and latitude by calling the map Web API provided by a mainstream domestic map service provider, using forward geocoding.
(5) Converting enterprise offline store addresses into longitude and latitude by GIS geocoding
Based on the enterprise offline store address data set collected in step (2), the offline store addresses are likewise converted into longitude and latitude by calling the map Web API of a mainstream domestic map service provider with forward geocoding. In practice, Python is used to call the provider's API and convert both the users' static geographic location information and the enterprise stores' geographic location information into longitude and latitude.
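The text does not name the map service or its API, so the following sketch only shows the shape of the forward-geocoding step. The actual Web API call is represented by a caller-supplied function `geocode` (an assumption), and `fake_geocode` is a made-up stand-in with invented coordinates so that the example runs without network access.

```python
def geocode_addresses(addresses, geocode):
    """Forward-geocode each distinct address once.

    `geocode` is any callable mapping an address string to a
    (latitude, longitude) tuple, or None when the lookup fails.
    In practice it would wrap the chosen provider's Web API.
    """
    coords = {}
    for address in addresses:
        key = address.strip()
        if not key or key in coords:
            continue  # skip blanks and addresses already resolved
        result = geocode(key)
        if result is not None:
            coords[key] = result
    return coords

def fake_geocode(address):
    # Hypothetical lookup table standing in for a real map service.
    table = {"Store 1": (30.66, 104.06), "Home 1": (31.47, 104.68)}
    return table.get(address)

coords = geocode_addresses(
    ["Store 1", " Home 1 ", "Store 1", "Unknown"], fake_geocode
)
```

Resolving each distinct address only once keeps the number of API calls down, which matters when the provider enforces request quotas.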
(6) Calculating the shortest distance $D_{\min}$ from each user to the enterprise's offline stores
The shortest distance from each user to the enterprise's offline stores is calculated using a Euclidean distance calculation method:
$D_i = \min_{j} d_{ij}, \quad d_{ij} = |A_i, B_j|, \quad i = 1, \dots, n; \; j = 1, \dots, m$
where $A_i$ is the longitude and latitude of user $i$'s static geographic location, $B_j$ is the longitude and latitude of the address of offline store $j$, and $|A_i, B_j|$ is the ground distance between the two points $A_i$ and $B_j$.
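As an illustration of the nearest-store computation, the sketch below evaluates $D_i = \min_j d_{ij}$ for a list of users. The text calls the method Euclidean while defining $|A_i, B_j|$ as a ground distance; the sketch uses the haversine great-circle formula for $d_{ij}$, which is a substitution chosen here, not something the text specifies. The mean Earth radius of 6371 km and the function names are likewise assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for this sketch

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def shortest_distances(users, stores):
    """Return D_i = min over stores j of d_ij for every user i."""
    return [min(haversine_km(a, b) for b in stores) for a in users]

# Hypothetical coordinates: one user on the equator, two stores east of it
users = [(0.0, 0.0), (0.0, 2.0)]
stores = [(0.0, 1.0), (0.0, 2.0)]
distances = shortest_distances(users, stores)
```

The second user sits exactly on a store, so its shortest distance is zero; the first user's nearest store is one degree of longitude away along the equator.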
(7) Studying existing loyalty-related literature and the enterprise's actual business conditions, constructing a user loyalty evaluation index system, and extracting a user behavior feature data set
Combining literature research on user loyalty segmentation with the enterprise's business and process characteristics in the Internet environment, an index system for enterprise user loyalty evaluation is constructed and a user behavior feature data set is extracted. The index system in the invention comprises 11 user behavior indexes, as shown in the following table:
Figure BDA0002834546560000071
Users are distinguished by the VINFO field in the access log database; to calculate users' access behavior features, the log tables are joined on the VINFO field and an SQL program is written to compute each user's access features. Users are distinguished by LOGIN_ID in the enterprise business database; to calculate users' purchasing behavior features, the enterprise business tables are joined on the LOGIN_ID field and an SQL program is written to compute each user's purchasing features.
(8) Determining the weight of each feature in the index system by the analytic hierarchy process
First step: a questionnaire is designed according to the index system, and experts are asked to judge objectively the factors at the same level that belong to each factor in the level above. That is, the indexes are compared pairwise on the 1-9 scale and their relative importance is scored, giving the judgment matrices $P_1$, $P_{21}$ and $P_{22}$, where $P_1$ is the importance comparison matrix between the second-level indexes, $P_{21}$ is the importance comparison matrix between the third-level indexes under the access loyalty dimension, and $P_{22}$ is the importance comparison matrix between the third-level indexes under the consumption loyalty dimension.
Second step: a weight vector is calculated from each judgment matrix. Suppose a judgment matrix $P^*$ covers n indexes; then $p_{ij}$ is the importance of the i-th index relative to the j-th index, where $i, j \in [1, n]$. Normalizing each column gives:
Figure BDA0002834546560000081
where $\sum p_{ij}$ is the sum of a column; this yields a new matrix $Q^*$. Each row of $Q^*$ is summed to obtain a feature vector, and normalizing the feature vector gives the weight of each index:
Figure BDA0002834546560000082
Third step: a consistency check is performed on each judgment matrix using the consistency index, the random consistency index and the consistency ratio. The consistency ratio CR is calculated as:
Figure BDA0002834546560000083
where CI is the consistency index and RI is the random consistency index; CI is calculated as:
Figure BDA0002834546560000084
where $\lambda_{\max}(P^*)$ is the largest eigenvalue of the judgment matrix $P^*$ and n is the order of $P^*$. The random consistency index RI is looked up according to the dimension of the matrix, as shown in the following table:
n     1     2     3     4     5     6     7     8     9
RI    0.00  0.00  0.58  0.9   1.12  1.24  1.32  1.41  1.45
The consistency check passes when CR is less than 0.1: the judgment matrix is then considered consistent, and its normalized feature vector is used as the weight vector. Otherwise the judgment matrix must be adjusted until the check passes. The adjustment reconstructs the judgment matrix by the maximum-deviation-term correction method, described as follows:
From the weight vector $w = (w_1, w_2, \dots, w_n)^T$ of the judgment matrix $P^*$, construct the matrix $R^* = (r_{ij}) = (w_i / w_j)$ and calculate the deviation matrix:
$\Delta = (\delta_{ij}) = (|p_{ij} - r_{ij}|)$
For the $p_{ij}$ corresponding to the largest $\delta_{ij}$, set $p_{ij} = r_{ij}$ and $p_{ji} = r_{ji}$ and substitute them into the original matrix $P^*$ to form a new judgment matrix. Adjusting step by step in this way continuously improves consistency until the requirement is met. The meanings of the values 1-9 used in the expert scoring are shown in the following table:
Figure BDA0002834546560000091
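The weight calculation, consistency check and maximum-deviation correction of step (8) can be sketched in plain Python as below. Because the formula images are not reproduced in this text, the sketch assumes the standard analytic hierarchy process definitions $CI = (\lambda_{\max} - n)/(n - 1)$ and $CR = CI/RI$, and it estimates $\lambda_{\max}$ as the average of $(P w)_i / w_i$ rather than by an exact eigenvalue computation. The function names and the example matrices are illustrative only.

```python
# Random consistency index, taken from the table in the text
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.9, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(P):
    """Column-normalize P, sum each row, then normalize the row sums."""
    n = len(P)
    col_sums = [sum(P[i][j] for i in range(n)) for j in range(n)]
    Q = [[P[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    row_sums = [sum(row) for row in Q]
    total = sum(row_sums)
    return [r / total for r in row_sums]

def consistency_ratio(P, w):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1) (assumed formulas)."""
    n = len(P)
    if n <= 2:
        return 0.0  # RI is 0 for n <= 2; such matrices are always consistent
    Pw = [sum(P[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(Pw[i] / w[i] for i in range(n)) / n  # approximation
    return ((lam_max - n) / (n - 1)) / RI[n]

def correct_largest_deviation(P, w):
    """Replace the entry with the largest |p_ij - w_i/w_j| and its mirror."""
    n = len(P)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    i, j = max(pairs, key=lambda ij: abs(P[ij[0]][ij[1]] - w[ij[0]] / w[ij[1]]))
    new_P = [row[:] for row in P]  # leave the input matrix untouched
    new_P[i][j] = w[i] / w[j]
    new_P[j][i] = w[j] / w[i]
    return new_P

# A perfectly consistent 3x3 judgment matrix (hypothetical expert scores)
P_consistent = [[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]
weights = ahp_weights(P_consistent)
cr = consistency_ratio(P_consistent, weights)

# An inconsistent matrix and one correction step
P_rough = [[1, 2, 9], [1 / 2, 1, 2], [1 / 9, 1 / 2, 1]]
P_adjusted = correct_largest_deviation(P_rough, ahp_weights(P_rough))
```

In use, `correct_largest_deviation` would be applied repeatedly, recomputing the weights each time, until `consistency_ratio` falls below 0.1.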
(9) Normalizing the user index data by max-min normalization and calculating the user loyalty index
First step: the user index data are normalized by the max-min normalization formula:
Figure BDA0002834546560000092
where $X_i$ is the original value of an index for a user before normalization, $X_{\max}$ and $X_{\min}$ are the maximum and minimum values of that index before normalization, and $X'_i$ is the normalized value.
Second step: the loyalty index of each user is calculated from the feature weights obtained after the consistency check:
$loyal_t = \alpha \, visit_t + \beta \, purchase_t + \gamma \, distance_t$
where $loyal_t$ is the loyalty score of user t, $visit_t$ is the access loyalty score, $purchase_t$ is the consumption loyalty score, and $\alpha$ and $\beta$ are the weights of access loyalty and consumption loyalty, respectively ($\gamma$ is the corresponding weight of the distance term $distance_t$); $visit_t$ and $purchase_t$ are calculated as:
$visit_t = \alpha_1 A_{1t} + \alpha_2 A_{2t} + \dots + \alpha_m A_{mt}$
$purchase_t = \beta_1 B_{1t} + \beta_2 B_{2t} + \dots + \beta_n B_{nt}$
where $A_i$ ($i = 1, 2, \dots, m$) and $B_j$ ($j = 1, 2, \dots, n$) are the user's access behavior features and consumption behavior features, that is, the third-level indexes under the access loyalty and consumption loyalty dimensions retained after feature selection, and $\alpha_i$ and $\beta_j$ are the weights of the respective behavior features.
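A small sketch of step (9) follows, assuming the usual max-min formula $X'_i = (X_i - X_{\min}) / (X_{\max} - X_{\min})$ for the formula image that is not reproduced here. The text does not say how the distance term is oriented (for instance, whether a shorter distance should raise the score), so the sketch simply passes a normalized distance value through unchanged. All weights and feature values are made-up numbers.

```python
def min_max(values):
    """Max-min normalization of one index across all users."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant index: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]

def weighted_sum(weights, features):
    """Sum of weight * feature over paired weights and features."""
    return sum(w * f for w, f in zip(weights, features))

def loyalty_score(visit_feats, purchase_feats, distance,
                  visit_w, purchase_w, alpha, beta, gamma):
    """loyal_t = alpha * visit_t + beta * purchase_t + gamma * distance_t."""
    visit = weighted_sum(visit_w, visit_feats)            # visit_t
    purchase = weighted_sum(purchase_w, purchase_feats)   # purchase_t
    return alpha * visit + beta * purchase + gamma * distance

normalized = min_max([2, 4, 6])

# Hypothetical normalized features and AHP weights for one user
score = loyalty_score(
    visit_feats=[1.0, 0.5], purchase_feats=[0.5], distance=0.25,
    visit_w=[0.6, 0.4], purchase_w=[1.0],
    alpha=0.5, beta=0.3, gamma=0.2,
)
```

With these numbers the access score is 0.8 and the consumption score is 0.5, so the weighted total comes to 0.6.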
(10) Coarse clustering of users with the Canopy algorithm
First, the enterprise user data set is vectorized and put into a list, and two thresholds $T_1$ and $T_2$ are determined experimentally, with $T_1 > T_2$;
second, any point P is taken from the list and the distance between P and every existing Canopy is calculated (if no Canopy exists yet, P becomes a Canopy); if the distance between P and a Canopy is within $T_1$, P is added to that Canopy;
third, if the distance between P and a Canopy is within $T_2$, P is deleted from the list, because P is then considered close enough to that Canopy that it can no longer serve as the center of another Canopy;
fourth, the second and third steps are repeated until the list is empty;
fifth, the algorithm ends; the number of Canopies obtained is the number k of cluster centers for the next step.
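The coarse clustering steps above can be sketched as a single pass over the points. The description leaves open what happens to a point that lies within $T_1$ but not within $T_2$ of every existing Canopy; the sketch adopts one common reading, in which such a point joins those Canopies and also starts a new Canopy of its own. That reading, the use of Euclidean distance via `math.dist` (Python 3.8+), and the sample points are assumptions.

```python
import math

def canopy(points, t1, t2):
    """Single-pass Canopy clustering; returns a list of (center, members)."""
    if t1 <= t2:
        raise ValueError("T1 must be greater than T2")
    canopies = []
    for p in points:
        strongly_bound = False
        for center, members in canopies:
            d = math.dist(p, center)
            if d <= t1:
                members.append(p)       # loosely inside this Canopy
            if d <= t2:
                strongly_bound = True   # too close to become a new center
        if not strongly_bound:
            canopies.append((p, [p]))   # p starts a new Canopy
    return canopies

# Hypothetical 2-D user vectors forming two obvious groups
points = [(0.0, 0.0), (0.5, 0.0), (0.8, 0.2), (10.0, 10.0), (10.4, 10.0)]
canopies = canopy(points, t1=2.0, t2=1.0)
k = len(canopies)
centers = [center for center, _ in canopies]
```

The resulting `k` and `centers` are what the subsequent K-means step would take as its cluster count and, optionally, its starting centers.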
(11) Loyalty clustering for users using K-means algorithm
First, let $D = \{x_1, x_2, \dots, x_t\}$ denote the user set, let k be the number of clusters determined by the Canopy algorithm in the previous step, let N denote the maximum number of iterations, and let $C = \{C_1, C_2, \dots, C_k\}$ denote the resulting clusters.
Second, k samples are randomly selected from the data set D as the initial cluster centers $\{\mu_1, \mu_2, \dots, \mu_k\}$;
third, for each sample point $x_i$ ($i = 1, 2, \dots, t$), its distance to each of the k cluster centers $\mu_j$ ($j = 1, 2, \dots, k$) is calculated and the point is assigned to the cluster whose center is nearest; the distance is calculated as:
Figure BDA0002834546560000101
fourth, for all sample points in each cluster $C_j$ ($j = 1, 2, \dots, k$), the cluster center $\mu_j$ is recalculated as:
Figure BDA0002834546560000111
fifth, the third and fourth steps are repeated to iteratively update the k cluster centers $\mu_j$ ($j = 1, 2, \dots, k$) until the cluster centers no longer change, the maximum number of iterations N is reached, or the change falls within a set tolerance; the clustering is then considered stable, the iteration ends, and the clustering result is output.
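The five K-means steps can be sketched in plain Python as follows. Since the formula images are not reproduced, the sketch assumes Euclidean distance for the assignment step and the arithmetic mean of a cluster's points for the center update, which are the usual K-means choices. The optional `init` argument, which lets the Canopy centers seed the run instead of random samples, and the handling of empty clusters are additions made for this illustration.

```python
import math
import random

def kmeans(points, k, max_iter=100, tol=1e-9, init=None, seed=0):
    """Return (centers, labels) after iterating assignment and update steps."""
    if init is not None:
        centers = list(init)
    else:
        centers = random.Random(seed).sample(points, k)  # random initial centers
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment: each point goes to its nearest center
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update: each center becomes the mean of its assigned points
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break  # centers are stable within the tolerance
    return centers, labels

# Hypothetical user vectors, seeded with two Canopy-style centers
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(points, k=2, init=[(0.0, 0.0), (10.0, 10.0)])
```

On this toy input the run converges on the second pass, once the centers stop moving.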
Coarse clustering with the Canopy algorithm before clustering the users preliminarily determines the number of cluster centers and the cluster center points for K-means. This effectively reduces the cost of computing distances between points in the K-means algorithm, lowers memory use and shortens running time; it also makes K-means more robust to isolated points and improves the accuracy of the user loyalty classification.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims. It should be noted that the various technical features described in the above embodiments can be combined in any suitable manner without contradiction, and the invention is not described in any way for the possible combinations in order to avoid unnecessary repetition. In addition, any combination of the various embodiments of the present invention is also possible, and the same should be considered as the disclosure of the present invention as long as it does not depart from the spirit of the present invention.

Claims (6)

1. A method of user loyalty classification, comprising the steps of:
step 1: extracting user behavior data: extracting user access, browsing, and consumption/purchase data from the databases of the enterprise's online stores/malls and offline stores;
step 2: collecting static geographic position information of users and geographic position information of the enterprise's offline stores: combining the delivery addresses and after-sales service addresses of enterprise users to extract each user's static address information, and extracting the geographic position information of the enterprise's offline stores from the enterprise's offline store database;
step 3: converting the static geographic position information of the users and the position information of the enterprise's offline stores collected in step 2 into longitude and latitude by a GIS forward geocoding technique, and calculating the shortest distance from each user to an offline store of the enterprise by a Euclidean distance calculation method;
step 4: studying the existing loyalty literature and the actual business conditions of the enterprise, constructing a user loyalty evaluation index system, and extracting a user behavior feature data set;
step 5: determining the weight of each feature in the index system by the analytic hierarchy process;
step 6: normalizing the user index data extracted in step 3 and calculating the loyalty index;
step 7: preliminarily determining the number of user loyalty clusters by the Canopy algorithm;
step 8: performing loyalty clustering on the users with the K-means algorithm.
2. The method as claimed in claim 1, wherein in step 1, enterprise users are classified by loyalty based on all data collected by the enterprise's online stores and offline brick-and-mortar stores, and the enterprise's data assets are used to evaluate user behavior across online and offline access, browsing, and consumption, so as to ensure the accuracy of the user loyalty classification.
3. The method as claimed in claim 1, wherein in steps 2 and 3, the static geographic position information of the enterprise users and the geographic position information of the enterprise's offline stores are obtained, the corresponding geographic position information is converted into longitude and latitude by a GIS forward geocoding technique, and the shortest distance from each user to an offline store of the enterprise is calculated by a Euclidean distance calculation method.
The shortest distance Dmin from a user to an offline store of the enterprise is computed for each user i as Di:
Di = min{dij | j = 1, ..., m}, where dij = |Ai, Bj|, i = 1, ..., n
wherein Ai represents the longitude and latitude of the static geographic position of user i, Bj represents the longitude and latitude of the address of offline store j of the enterprise, and |Ai, Bj| represents the earth-surface distance between the two points Ai and Bj.
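The nearest-store calculation of claim 3 can be illustrated with a minimal Python sketch. It assumes that the earth-surface distance |Ai, Bj| is approximated by the haversine great-circle formula on a sphere of mean radius 6371 km; the claim does not fix a particular formula, and the function names and the radius constant are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not specified in the claim


def surface_distance_km(a, b):
    """Haversine great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def shortest_store_distances(users, stores):
    """Return Di = min over j of dij for every user position Ai and store position Bj."""
    return [min(surface_distance_km(a, b) for b in stores) for a in users]
```

Here `users` and `stores` are lists of (latitude, longitude) tuples as produced by forward geocoding; the result has one distance per user, in kilometres.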
4. The method as claimed in claim 1, wherein in the steps 4, 5 and 6, a user loyalty classification index system is constructed, an analytic hierarchy process is used to determine the index weight, and the user loyalty score is calculated.
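As an illustration of claim 4, the following Python sketch derives indicator weights from a pairwise comparison matrix and combines them with normalized indicators into a score. It rests on several assumptions not stated in the claims: the weights are approximated by the row geometric-mean method (a common simplification of the analytic hierarchy process), normalization is min-max, the score is a weighted sum, and all function names are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP weights by normalizing the geometric mean of each matrix row."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def min_max_normalize(column):
    """Scale one indicator column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def loyalty_scores(rows, weights):
    """Weighted sum of min-max normalized indicators, one score per user row."""
    columns = [min_max_normalize(col) for col in zip(*rows)]
    return [sum(w * v for w, v in zip(weights, user)) for user in zip(*columns)]
```

Indicators for which a smaller value suggests higher loyalty, such as the distance to the nearest store, would need to be inverted before scoring; the claims do not say how this is handled.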
5. The method according to claim 1, wherein in steps 6 and 7, before clustering, rough clustering is performed on the users by using a Canopy algorithm to preliminarily determine the number of clustering centers and the clustering center point of the K-means cluster; the coarse clustering step comprises:
first, vectorizing the enterprise user data set and putting it into a list, and determining thresholds T1 and T2 experimentally, with T1 > T2;
secondly, taking any point P from the list and calculating the distance between P and every Canopy; if the distance between P and a Canopy is within T1, adding P to that Canopy;
thirdly, if the distance between P and a Canopy is within T2, deleting P from the list, because P is then considered close enough to that Canopy that it can no longer serve as the center of another Canopy;
fourthly, repeating the second and third steps until the list is empty;
fifthly, ending the algorithm, wherein the number of Canopies obtained is the number k of cluster centers for the subsequent clustering.
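The coarse clustering steps of claim 5 can be sketched in Python as follows. This is a single-pass variant under stated assumptions: distances are Euclidean, points are processed in list order, and a point that is not within T2 of any existing Canopy becomes the center of a new Canopy (the usual Canopy rule, which the claim leaves implicit). The function name is hypothetical.

```python
import math


def canopy(points, t1, t2):
    """Coarse clustering; returns a list of (center, members) pairs, one per Canopy."""
    if t1 <= t2:
        raise ValueError("T1 must be greater than T2")
    remaining = list(points)
    canopies = []
    while remaining:
        p = remaining.pop(0)
        tightly_bound = False
        for center, members in canopies:
            d = math.dist(p, center)
            if d <= t1:
                members.append(p)  # loosely bound: P joins this Canopy
            if d <= t2:
                tightly_bound = True  # P may no longer become a center
        if not tightly_bound:
            canopies.append((p, [p]))  # assumed rule: P starts a new Canopy
    return canopies
```

The length of the returned list gives the cluster number k, and the Canopy centers can serve as preliminary center points.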
6. The method of claim 1, wherein in step 8, clustering loyalty of users using a K-means algorithm comprises:
first, letting D = {x1, x2, ..., xt} denote the user set, k the number of clusters determined by the Canopy algorithm in step 7, N the maximum number of iterations, and C = {C1, C2, ..., Ck} the resulting cluster partition;
secondly, randomly selecting k samples from the data set D as the initial cluster centers {μ1, μ2, ..., μk};
thirdly, for each sample point xi (i = 1, 2, ..., t), calculating its distance to each of the k cluster centers μj (j = 1, 2, ..., k) and assigning it to the cluster whose center is nearest, the distance being calculated by the following formula:
Figure FDA0002834546550000031
fourthly, for all sample points in each cluster Cj (j = 1, 2, ..., k), recalculating the cluster center μj by the following formula:
Figure FDA0002834546550000032
fifthly, repeating the third and fourth steps to iteratively update the k clustering centers μj (j = 1, 2, ..., k) until the clustering centers no longer change, the maximum iteration number N is reached, or the change falls within a set tolerance; a stable state is then considered reached, the iteration ends, and the clustering result is output.
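The iteration of claim 6 can be sketched in Python as follows. Because the two formulas appear only as figures, the sketch assumes the standard K-means choices: Euclidean distance for assignment and the arithmetic mean of the members for the center update. The parameter names `max_iter`, `tol`, and `seed` are hypothetical stand-ins for the maximum iteration number N, the tolerance, and the random selection.

```python
import math
import random


def k_means(points, k, max_iter=100, tol=1e-9, seed=0):
    """Return (centers, clusters) after iterating assignment and center update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # randomly select k samples as initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment: each point goes to the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k), key=lambda c: math.dist(x, centers[c]))
            clusters[j].append(x)
        # Update: each center becomes the mean of its members (unchanged if empty).
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # centers stable within the tolerance
            break
    return centers, clusters
```

In the method described, k would come from the Canopy step rather than being chosen by hand.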
CN202011468575.8A 2020-12-14 2020-12-14 Method for classifying loyalty of users Pending CN112435078A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011468575.8A CN112435078A (en) 2020-12-14 2020-12-14 Method for classifying loyalty of users

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011468575.8A CN112435078A (en) 2020-12-14 2020-12-14 Method for classifying loyalty of users

Publications (1)

Publication Number Publication Date
CN112435078A true CN112435078A (en) 2021-03-02

Family

ID=74691444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011468575.8A Pending CN112435078A (en) 2020-12-14 2020-12-14 Method for classifying loyalty of users

Country Status (1)

Country Link
CN (1) CN112435078A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115237876A (en) * 2022-05-16 2022-10-25 中航信移动科技有限公司 Flight user classification method, electronic device and computer-readable storage medium
CN116090891A (en) * 2023-01-10 2023-05-09 扬州广源集团有限公司 Big data-based power construction enterprise customer behavior analysis method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654392A (en) * 2015-11-26 2016-06-08 国家电网公司 Familial defect analysis method of equipment based on clustering algorithm
CN111091282A (en) * 2019-12-10 2020-05-01 焦点科技股份有限公司 Customer loyalty segmentation method based on user behavior data
CN111385355A (en) * 2020-03-03 2020-07-07 上海万位数字技术有限公司 Method for improving 4S shop maintenance and arrival rate based on location service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210302