CN111666351A

CN111666351A - Fuzzy clustering system based on user behavior data

Info

Publication number: CN111666351A
Application number: CN202010476681.4A
Authority: CN
Inventors: 陈亚娟; 龙泳先
Original assignee: Beijing Ruizhi Tuyuan Technology Co ltd
Current assignee: Beijing Ruizhi Tuyuan Technology Co ltd
Priority date: 2020-05-29
Filing date: 2020-05-29
Publication date: 2020-09-15

Abstract

The invention discloses a fuzzy clustering system based on user behavior data and relates to the technical field of wireless internet behavior analysis and prediction. Its purpose is to build a profit model for precision marketing. The data acquisition module collects user behavior data at runtime and sends it to the server; the user behavior data comprises static data and dynamic data. After the user classification is obtained, data mining yields long-term behavior predictions for user categories and short-term behavior correlations for individual behaviors, and the accuracy of real-time behavior prediction is continuously updated and refined at runtime. Equipment capacity can be expanded flexibly and equipment performance improved; the system offers flexibility for technology upgrades and equipment replacement and supports the expansion, adjustment and reconstruction of service functions; customer requirements and preferences are captured, and weight is given to customers' browsing and interaction behavior data.

Description

Fuzzy clustering system based on user behavior data

Technical Field

The invention relates to the technical field of wireless internet behavior analysis and prediction, in particular to a fuzzy clustering system based on user behavior data.

Background

With the wide application of 3G technology and the emergence of various intelligent mobile terminals, the number of wireless internet users is rising rapidly. Mobile applications, the most important part of a smartphone, are developing at a similar pace. With advances in data analysis and intelligent storage technology, the large volume of behavior data generated by App user groups can now be stored. Deep mining and processing of this mass data reveals users' behavior habits and preference characteristics and allows their habits, behaviors and preferences to be predicted. In-depth analysis of wireless internet user behavior makes it possible to understand users' real needs, make full use of network resources, provide the information users care about, improve user experience and loyalty, and thereby build a better profit model. Wireless internet user behavior prediction is still a relatively new research field, and no mature solution exists.

A search of the prior art found the Chinese patent application CN201910827753.2, which discloses a prediction method, system and storage medium based on fuzzy clustering and a BP neural network, belonging to the technical field of scenic spot passenger flow prediction. The prediction method comprises the following steps: obtaining historical passenger flow data, historical e-commerce ticket booking data, historical air temperature data and historical weather data for scenic spots; and performing correlation analysis in units of a preset time period to obtain a key factor matrix. The method, system and storage medium of that patent have the following disadvantage: they do not address how to carry out precision marketing to wireless internet users, which has become an urgent problem facing operators and mobile websites.

Disclosure of Invention

The invention aims to solve the defects in the prior art, and provides a fuzzy clustering system based on user behavior data.

In order to achieve the purpose, the invention adopts the following technical scheme:

The fuzzy clustering system based on user behavior data comprises a data acquisition module, a data analysis module and an output unit. The data acquisition module collects user behavior data at runtime and sends it to a server; the user behavior data comprises static data and dynamic data. The data analysis module comprises a data extraction unit, a data preprocessing unit, a model algorithm unit and a fuzzy clustering analysis unit, wherein the model algorithm unit comprises a k-means clustering algorithm, which is an iteratively solved cluster analysis algorithm. The output unit comprises a customer portrait and a visual interface.

Preferably, the user behavior data is considered along three data dimensions, namely time, frequency and result, which are used for setting tags. The time dimension mainly concerns the time period in which a behavior occurs and its duration. The time-period data is used to select the time range for target devices, for marketing analysis and for marketing promotion, and can also be used for risk control and anti-fraud. The duration mainly concerns the course of the behavior, and the start and end time points of the behavior are recorded.

Preferably, the data acquisition module mainly collects data through an SDK. The SDK consists of a few lines of code; the type of data collected depends on where the tracking points are placed and on the parameters they return. The module can also capture customer behavior on App pages, such as clicks.

Preferably, tracking-point data can be collected, studied and counted in the system's own backend, or through a third-party data analysis platform.

Preferably, tracking points are set up through the following steps:

s1: defining the data to be counted, and placing tracking points according to that data;

s2: sorting out the data for the points to be tracked and confirming that it is reasonable;

s3: when a third-party data analysis platform is used, after a tracking point is embedded in the App, uploading the corresponding event ID, event name and related information to the third-party platform, and keeping the ID and name consistent with those in the code;

s4: after the tracking points are in place, if the event conversion rate is to be analyzed statistically, adding a funnel model in advance; data statistics can begin the day after the funnel model is added.
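The tracking-point (data embedding) steps s1 to s4 can be illustrated with a minimal Python sketch. The event IDs, event names, and the helpers `track` and `funnel_conversion` are hypothetical and do not correspond to any particular third-party platform's API; the funnel here only counts users who triggered every step so far and does not check event order.

```python
from collections import defaultdict

# Hypothetical event registry: the ID and name uploaded to the analysis
# platform must match what the App code reports (step s3).
EVENT_REGISTRY = {
    "evt_001": "view_product",
    "evt_002": "add_to_cart",
    "evt_003": "pay_order",
}


def track(log, user_id, event_id, event_name):
    """Record one tracking-point event after checking it against the registry."""
    if EVENT_REGISTRY.get(event_id) != event_name:
        raise ValueError(f"event {event_id!r}/{event_name!r} is not registered")
    log.append({"user": user_id, "event_id": event_id})


def funnel_conversion(log, steps):
    """Count, per funnel step, the users who triggered it and every earlier step."""
    seen = defaultdict(set)
    for record in log:
        seen[record["event_id"]].add(record["user"])
    counts, remaining = [], None
    for step in steps:
        users = seen[step] if remaining is None else remaining & seen[step]
        counts.append(len(users))
        remaining = users
    return counts


log = []
track(log, "u1", "evt_001", "view_product")
track(log, "u2", "evt_001", "view_product")
track(log, "u1", "evt_002", "add_to_cart")
track(log, "u1", "evt_003", "pay_order")
counts = funnel_conversion(log, ["evt_001", "evt_002", "evt_003"])
print(counts)  # users remaining at each funnel step
```

Dividing adjacent entries of `counts` gives the step-to-step conversion rate that the funnel model in s4 is meant to report.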

Preferably, the data extraction unit performs the following steps:

s11: expressing the data extraction requirements in JSON and verifying that the requirements are reasonable;

s12: using XPath to quickly locate specific elements, obtain node information, parse, extract and present the data, and verify the data flow;

s13: analyzing the reasonableness of the data to obtain a behavior analysis result.
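Steps s11 and s12 can be sketched with the Python standard library. The requirement format, the page fragment and the helper names below are hypothetical; note that `xml.etree.ElementTree` supports only a subset of XPath, which is enough for this illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical extraction requirement expressed as JSON (step s11).
requirement = json.loads('{"source": "page", "fields": ["title", "price"]}')

# Hypothetical page fragment from which behavior-related data is extracted.
PAGE = """
<page>
  <item id="42">
    <title>Phone case</title>
    <price>19.9</price>
  </item>
</page>
"""


def validate_requirement(req):
    """Check that the requirement names a source and a non-empty field list."""
    return bool(req.get("source")) and bool(req.get("fields"))


def extract(xml_text, fields):
    """Locate each requested element with an XPath expression (step s12)."""
    root = ET.fromstring(xml_text.strip())
    record = {}
    for field in fields:
        node = root.find(f".//item/{field}")
        record[field] = node.text if node is not None else None
    return record


if not validate_requirement(requirement):
    raise ValueError("extraction requirement is incomplete")
record = extract(PAGE, requirement["fields"])
print(record)
```

A field that is requested but absent from the page comes back as `None`, which is one simple way to surface an unreasonable requirement before the analysis in s13.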

Preferably, the data preprocessing unit comprises: data cleaning techniques, which remove noise from the data and correct inconsistencies; data integration techniques, which merge data from multiple sources into a coherent data store such as a data warehouse; data reduction techniques, which reduce the size of the data by, for example, aggregation, deletion of redundant features, or clustering; and data transformation techniques, which compress the data into a smaller interval such as 0.0 to 1.0.

Preferably, the step of removing noise in the data is as follows:

step a1, constructing the user behavior data according to the following formula:

wherein X represents the total user behavior data, x₁ represents time, x₂ represents frequency, and m represents the number of constructed total user behavior data items;

step a2, a threshold between the noise data value and the normal data value is found according to the following formula:

wherein μ(a, b) represents the mean of the user behavior data in the neighborhood, s(a, b) represents the standard deviation of the user behavior data in the neighborhood, R is the dynamic range of the standard deviation, l is a correction parameter, x_i,j represents the user behavior data value with abscissa i and ordinate j, m represents the number of constructed total user behavior data items, and 2m represents the number of constructed total user behavior data values;

step a3, finding the median of the user behavior data according to the following formula:

f(a,b)＝MED(X)

wherein X represents total user behavior data, and f (a, b) represents a median value in the user behavior data;

step a4, using the threshold q between noise data values and normal data values obtained in step a2, any data value greater than q is treated as a noise point, and its value is replaced by the median of the user behavior data obtained in step a3, which completes the noise removal.
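The thresholding and median replacement of steps a2 to a4 can be sketched as follows. The threshold formula is not reproduced in the text, so the sketch assumes a Sauvola-style form q = μ·(1 + l·(s/R − 1)), chosen only because it uses the same symbols (μ, s, R, l); it should not be read as the patent's formula. For simplicity the whole list is used as the neighborhood, and the sample values and parameters are hypothetical.

```python
from statistics import mean, median, pstdev


def noise_threshold(values, R, l):
    """Assumed Sauvola-style threshold q = mu * (1 + l * (s / R - 1)).

    The source does not reproduce its formula; this form is an assumption
    that merely uses the same symbols (mu, s, R, l).
    """
    mu = mean(values)    # mean of the data (whole list used as the neighborhood)
    s = pstdev(values)   # population standard deviation of the same data
    return mu * (1 + l * (s / R - 1))


def remove_noise(values, R, l):
    """Replace every value above the threshold q with the median (steps a2 to a4)."""
    q = noise_threshold(values, R, l)
    med = median(values)  # f(a, b) = MED(X) from step a3
    return [med if v > q else v for v in values]


# Hypothetical click counts with one obvious outlier.
clicks = [5, 6, 5, 7, 6, 100]
cleaned = remove_noise(clicks, R=20.0, l=0.5)
print(cleaned)
```

With these assumed parameters the threshold falls between the ordinary values and the outlier, so only the outlier is replaced by the median.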

Preferably: the k-means clustering algorithm comprises the following steps:

s31: randomly selecting k objects as initial clustering centers;

s32: calculating the distance between each object and each seed clustering center;

s33: each object is assigned to the cluster center closest to it.

Preferably, the methods of the fuzzy clustering analysis unit are divided into classification methods based on fuzzy relations, fuzzy clustering algorithms based on an objective function, and fuzzy clustering algorithms based on neural networks. The classification methods based on fuzzy relations comprise hierarchical clustering, clustering algorithms based on equivalence relations, clustering algorithms based on similarity relations, and graph-theoretic clustering algorithms.

Preferably, the customer portrait is the basis of user experience: for an internet application, the typical characteristics of its customers are described, customers of that kind are abstracted into a persona, and the persona is then described. For product design, once a user portrait has been established, the behavior of typical users is studied in more depth; the design focuses on typical users first, meets their needs, and then expands to further users.

The beneficial effects of the invention are as follows. The data acquisition module collects user behavior data at runtime, and the data analysis module uses the data to build a reasonable model of mobile phone users and their behavior that covers the users' natural and social attributes and their multi-dimensional behavior attributes while surfing the internet. After review and screening, users are classified by the methods of the model algorithm unit and the fuzzy clustering analysis unit, which avoids the effect of inaccurate subjective classification on behavior prediction and optimizes the user model. After the user classification is obtained, data mining yields long-term behavior predictions for user categories and short-term behavior correlations for individual behaviors, and the accuracy of real-time behavior prediction is continuously updated and refined at runtime. Equipment capacity can be expanded flexibly and equipment performance improved; the system offers flexibility for technology upgrades and equipment replacement and supports the expansion, adjustment and reconstruction of service functions; customer requirements and preferences are captured, and weight is given to customers' browsing and interaction behavior data.

Drawings

FIG. 1 is a schematic view of a flow structure of a fuzzy clustering system based on user behavior data according to the present invention;

FIG. 2 is a schematic diagram of the k-means clustering algorithm of the fuzzy clustering system based on user behavior data according to the present invention.

Detailed Description

The technical solution of the present patent will be described in further detail with reference to the following embodiments.

Reference will now be made in detail to embodiments of the present patent, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present patent and are not to be construed as limiting the present patent.

In the description of this patent, it is to be understood that the terms "center," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in the orientations and positional relationships indicated in the drawings for the convenience of describing the patent and for the simplicity of description, and are not intended to indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and are not to be considered limiting of the patent.

In the description of this patent, it should be noted that, unless otherwise expressly specified or limited, the terms "mounted," "connected," and "disposed" are to be construed broadly and can mean, for example, fixedly connected or disposed, detachably connected or disposed, or integrally connected or disposed. The specific meaning of these terms in this patent can be understood by those of ordinary skill in the art according to the specific circumstances.

Example 1:

The fuzzy clustering system based on user behavior data, as shown in FIG. 1 and FIG. 2, comprises a data acquisition module, a data analysis module and an output unit. The data acquisition module collects user behavior data at runtime and sends it to a server; the user behavior data comprises static data and dynamic data. The data analysis module comprises a data extraction unit, a data preprocessing unit, a model algorithm unit and a fuzzy clustering analysis unit, wherein the model algorithm unit comprises a k-means clustering algorithm, which is an iteratively solved cluster analysis algorithm. The output unit comprises a customer portrait and a visual interface.

Further, the static data includes the characteristics of the user as a natural person, such as age, gender, region and education level; how the static data is obtained depends on the sample source. The dynamic data includes the behavior characteristics of the user while accessing the internet by mobile phone, such as the categories of web pages browsed, dwell time, reading habits and web page characteristics.

The user behavior data is considered along three data dimensions, namely time, frequency and result, which are used for setting tags. The time dimension mainly concerns the time period in which a behavior occurs and its duration. The time-period data is used to select the time range for target devices, for marketing analysis and for marketing promotion, and can also be used for risk control and anti-fraud. The duration mainly concerns the course of the behavior, and the start and end time points of the behavior are recorded.

Furthermore, the frequency dimension mainly concerns how often certain specific behaviors occur and how that frequency trends. Frequency correlates strongly and positively with customer interest: within a given period, the numbers of clicks and page views are positively correlated with the customer's purchasing demand. Once tagged, frequency data can be used for marketing and for identifying customers who have not yet appeared; it can also be used for customer experience analysis and product analysis. Heat maps show how the product is experienced and what customers need, and can guide optimization of the App's internal layout and the sale of related products.

Further, the result dimension is used for setting tags and mainly concerns whether a purchase or sale was completed, that is, the outcome of the customer's clicking and browsing. Result data is divided into transaction and non-transaction data; values filled in by the customer are collected according to business needs and then put to further use. Transaction data can be used for product experience analysis, customer experience analysis, channel ROI analysis and the like. Non-transaction data can be used for secondary marketing, in which potential customers are approached again; secondary marketing requires a combined analysis of time-period, duration and frequency data in order to screen out the target customer group.
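How the time, frequency and result dimensions might combine into tags, and how non-transaction users could be screened for secondary marketing, is sketched below. The `BehaviorRecord` fields, the tag names and the thresholds `min_clicks` and `min_seconds` are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BehaviorRecord:
    user_id: str
    start: datetime   # time dimension: when the behavior began
    end: datetime     # time dimension: when the behavior ended
    clicks: int       # frequency dimension: number of clicks in the period
    purchased: bool   # result dimension: transaction or non-transaction


def tag(record, min_clicks=3, min_seconds=60):
    """Derive simple tags from the time, frequency and result dimensions."""
    duration = (record.end - record.start).total_seconds()
    tags = {"transaction" if record.purchased else "non_transaction"}
    if record.clicks >= min_clicks:
        tags.add("high_frequency")
    if duration >= min_seconds:
        tags.add("long_duration")
    return tags


def secondary_marketing_targets(records):
    """Select non-purchasers whose duration and frequency suggest interest."""
    wanted = {"non_transaction", "high_frequency", "long_duration"}
    return [r.user_id for r in records if wanted <= tag(r)]


records = [
    BehaviorRecord("u1", datetime(2020, 5, 1, 9, 0), datetime(2020, 5, 1, 9, 5), 8, True),
    BehaviorRecord("u2", datetime(2020, 5, 1, 20, 0), datetime(2020, 5, 1, 20, 3), 5, False),
    BehaviorRecord("u3", datetime(2020, 5, 2, 12, 0), datetime(2020, 5, 2, 12, 0, 10), 1, False),
]
targets = secondary_marketing_targets(records)
print(targets)
```

Only the user who browsed for a long time and clicked often without buying is selected, which mirrors the combined use of time-period, duration and frequency data described for secondary marketing.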

The data acquisition module mainly collects data through an SDK. The SDK consists of a few lines of code; the type of data collected depends on where the tracking points are placed and on the parameters they return. The module can also capture customer behavior on App pages, such as clicks.

Furthermore, all data collected through the SDK is based on the customers' own voluntary actions. Whether the data involves personal privacy can be determined at the SDK tracking point; personal privacy data comprises the 7 data types in PII, such as social security numbers, mobile phone numbers, home addresses and private postal codes.

Furthermore, tracking points allow product, operations and other staff to compile customized statistics on user data according to specific requirements. For example, when a user's behavior pattern needs to be tracked, page click activity and key-path conversion need to be observed, or the effect of a particular campaign event needs to be analyzed, the tracking points must be put in place in advance; the corresponding data can then be observed after the App goes online, and investigated and analyzed.

Tracking-point data can be collected, studied and counted in the system's own backend, or through a third-party data analysis platform.

Tracking points are set up through the following steps:

s2: sorting out the data for the points to be tracked and confirming that it is reasonable;

s3: when a third-party data analysis platform is used, after a tracking point is embedded in the App, uploading the corresponding event ID, event name and other related information to the third-party platform, and keeping the ID and name consistent with those in the code;

Further, the IDs and names are generally organized and assigned on the product side and are kept identical on iOS and Android.

The data extraction unit performs the following steps:

s13: analyzing the reasonableness of the data to obtain a behavior analysis result.

The data preprocessing unit comprises: data cleaning techniques, which remove noise from the data and correct inconsistencies; data integration techniques, which merge data from multiple sources into a coherent data store such as a data warehouse; data reduction techniques, which reduce the size of the data by, for example, aggregation, deletion of redundant features, or clustering; and data transformation techniques, which compress the data into a smaller interval such as 0.0 to 1.0 and can thereby improve the accuracy and efficiency of mining algorithms that involve distance measurements.
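The data transformation mentioned above, compressing values into the interval 0.0 to 1.0, is commonly done with min-max scaling. The sketch below assumes that method (the source does not name one) and uses hypothetical dwell times.

```python
def min_max_scale(values):
    """Linearly rescale a list of numbers into the interval [0.0, 1.0]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical dwell times in seconds for five sessions.
dwell_seconds = [30, 60, 120, 300, 600]
scaled = min_max_scale(dwell_seconds)
print(scaled)
```

Scaling matters for distance-based algorithms such as k-means because a feature measured in seconds would otherwise dominate a feature measured in, say, click counts.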

The step of removing noise in the data is as follows:

wherein μ(a, b) represents the mean of the user behavior data in the neighborhood, s(a, b) represents the standard deviation of the user behavior data in the neighborhood, R is the dynamic range of the standard deviation, l is a correction parameter, x_i,j represents the user behavior data value with abscissa i and ordinate j, m represents the number of constructed total user behavior data items, and 2m represents the number of constructed total user behavior data values;

f(a,b)＝MED(X)

Beneficial effects: the algorithm borrows an image processing algorithm to construct the user behavior data and to handle the noise values in it. Noise values are identified by computing a threshold and removed by median replacement, which provides optimized user behavior data for the later training of the k-means clustering model.

The k-means clustering algorithm comprises the following steps:

s31: randomly selecting k objects as initial clustering centers;

s33: each object is assigned to the cluster center closest to it.

Further, each cluster center together with the objects assigned to it represents a cluster. Each time samples are assigned, the cluster center is recalculated from the objects currently in the cluster, and the process is repeated until a termination condition is met. The termination condition may be that no (or a minimum number of) objects are reassigned to different clusters, that no (or a minimum number of) cluster centers change, or that the sum of squared errors reaches a local minimum.

Further, the value of k in step s31 is generally determined according to actual requirements or given directly when the algorithm is implemented. Distance measure: the distance is computed between samples $\chi^{(i)} = \{\chi_1^{(i)}, \chi_2^{(i)}, \ldots, \chi_n^{(i)}\}$ and $\chi^{(j)} = \{\chi_1^{(j)}, \chi_2^{(j)}, \ldots, \chi_n^{(j)}\}$, where $i, j = 1, 2, \ldots, m$, $m$ is the number of samples, and each sample has $n$ components. Updating the cluster centers: for each cluster obtained, the mean of the sample points in the cluster is calculated and taken as the new cluster center. The k-means algorithm proceeds as follows:

Input: training data set $D = \{\chi^{(1)}, \chi^{(2)}, \ldots, \chi^{(m)}\}$; number of clusters $k$;

Process: function kMeans($D$, $k$, maxIter);

randomly select $k$ samples from $D$ as the initial "cluster center" vectors $\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(k)}$;

repeat

  let $C_i = \varnothing$ $(1 \le i \le k)$;

  for $j = 1, 2, \ldots, m$ do

    calculate the distance $d_{ji}$ between sample $\chi^{(j)}$ and each "cluster center" vector $\mu^{(i)}$ $(1 \le i \le k)$;

    determine the cluster label of $\chi^{(j)}$ from the nearest "cluster center" vector: $\lambda_j = \arg\min_{i \in \{1, 2, \ldots, k\}} d_{ji}$;

    place sample $\chi^{(j)}$ into the corresponding cluster: $C_{\lambda_j} = C_{\lambda_j} \cup \{\chi^{(j)}\}$;

  end for

  for $i = 1, 2, \ldots, k$ do

    compute the new "cluster center" vector $(\mu^{(i)})' = \frac{1}{|C_i|} \sum_{\chi \in C_i} \chi$;

    if $(\mu^{(i)})' \ne \mu^{(i)}$ then

      update the current "cluster center" vector $\mu^{(i)}$ to $(\mu^{(i)})'$;

    else

      keep the current mean vector unchanged;

    end if

  end for

until none of the current "cluster center" vectors are updated;

Output: cluster division $C = \{C_1, C_2, \ldots, C_k\}$;

To avoid an excessively long running time, a maximum number of rounds or a minimum adjustment threshold is usually set; the run stops when the maximum number of rounds is reached or the adjustment falls below the threshold.
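The kMeans(D, k, maxIter) procedure above can be sketched in plain Python. The squared Euclidean distance, the fixed random seed, the handling of empty clusters and the sample points are assumptions made for the sketch; the source leaves the distance measure open.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(D, k, max_iter=100, seed=0):
    """Cluster the samples in D into k clusters; returns (clusters, centers)."""
    rng = random.Random(seed)
    # Randomly select k samples as the initial cluster centers.
    centers = [list(p) for p in rng.sample(D, k)]
    clusters = []
    for _ in range(max_iter):            # cap on the number of rounds
        clusters = [[] for _ in range(k)]
        for x in D:
            # Assign each sample to the nearest cluster center.
            label = min(range(k), key=lambda i: squared_distance(x, centers[i]))
            clusters[label].append(x)
        updated = False
        for i, members in enumerate(clusters):
            if not members:
                continue                 # assumption: an empty cluster keeps its center
            new_center = [sum(col) / len(members) for col in zip(*members)]
            if new_center != centers[i]:
                centers[i] = new_center
                updated = True
        if not updated:                  # no center changed: converged
            break
    return clusters, centers


# Hypothetical two-feature behavior vectors forming two loose groups.
D = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
clusters, centers = k_means(D, k=2)
print(centers)
```

The `max_iter` argument plays the role of the maximum number of rounds; a minimum adjustment threshold could be added by comparing the distance between old and new centers instead of testing for exact equality.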

The methods of the fuzzy clustering analysis unit are divided into classification methods based on fuzzy relations, fuzzy clustering algorithms based on an objective function, and fuzzy clustering algorithms based on neural networks. The classification methods based on fuzzy relations comprise hierarchical clustering, clustering algorithms based on equivalence relations, clustering algorithms based on similarity relations, graph-theoretic clustering algorithms and the like.

Further, in the classification method based on fuzzy relations, each sample or variable to be clustered is first treated as a class of its own. The statistical similarity between classes is then determined, the two or more closest classes are selected and merged into a new class, and the statistical similarity between the new class and the other classes is calculated. The two or more closest classes are again selected and merged, and so on until all samples or variables have been merged into a single class.
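The merge-the-closest-classes procedure described above can be sketched as follows. The similarity measure is an assumption: the sketch uses single-link distance on one-dimensional values, and the sample values are hypothetical.

```python
def single_link_distance(a, b):
    """Smallest absolute difference between any two members (1-D data, assumed measure)."""
    return min(abs(x - y) for x in a for y in b)


def agglomerate(samples):
    """Merge the two closest classes repeatedly until one class remains.

    Returns the list of partitions: the initial one plus one per merge.
    """
    classes = [[s] for s in samples]          # each sample starts as its own class
    history = [[list(c) for c in classes]]
    while len(classes) > 1:
        pairs = [(i, j) for i in range(len(classes)) for j in range(i + 1, len(classes))]
        i, j = min(pairs, key=lambda p: single_link_distance(classes[p[0]], classes[p[1]]))
        merged = classes[i] + classes[j]
        classes = [c for idx, c in enumerate(classes) if idx not in (i, j)] + [merged]
        history.append([list(c) for c in classes])
    return history


# Hypothetical one-dimensional feature values (for example, scaled visit frequency).
samples = [1.0, 1.5, 5.0, 5.2, 9.0]
history = agglomerate(samples)
for partition in history:
    print(partition)
```

Stopping the loop early, at a chosen number of classes or a distance cutoff, yields a usable partition rather than the single all-inclusive class.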

Further, the fuzzy clustering algorithm based on an objective function formulates cluster analysis as a constrained nonlinear programming problem and obtains the optimal partition and clustering of the data set by solving the optimization. The step-by-step clustering method is a fuzzy clustering analysis method based on fuzzy partition. It can be summarized as follows: the number of classes into which the samples are to be divided is determined in advance, the samples are then reclassified according to an optimization principle, and the iteration ends when the classification is reasonable.
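The source does not name a specific objective-function-based algorithm. As one common instance, the sketch below assumes fuzzy c-means, which minimizes the sum over clusters i and samples j of u_ij^m · ‖x_j − c_i‖² subject to each sample's memberships summing to 1. The fuzzifier `m`, tolerance, seed and sample points are hypothetical.

```python
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means sketch; returns (membership matrix U, cluster centers)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each sample's row sums to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = []
    for _ in range(max_iter):
        # Center update: membership-weighted mean of all samples.
        centers = []
        for i in range(c):
            weights = [U[j][i] ** m for j in range(n)]
            wsum = sum(weights)
            centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / wsum for d in range(dim)]
            )
        # Membership update from the distances to the new centers.
        shift = 0.0
        for j, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5 for ctr in centers]
            if any(d == 0.0 for d in dists):
                # Sample coincides with a center: give it full membership there.
                zero = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_row = [z / sum(zero) for z in zero]
            else:
                power = 2.0 / (m - 1.0)
                new_row = [
                    1.0 / sum((dists[i] / dists[k]) ** power for k in range(c))
                    for i in range(c)
                ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, U[j])))
            U[j] = new_row
        if shift < tol:   # memberships stopped changing
            break
    return U, centers


# Hypothetical two-feature behavior vectors.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
U, fcm_centers = fuzzy_c_means(points, c=2)
for row in U:
    print([round(u, 3) for u in row])
```

Unlike k-means, each user receives a degree of membership in every cluster, which suits users whose behavior straddles several categories.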

Further, the fuzzy clustering algorithm based on neural networks uses a competitive learning algorithm to guide the clustering process of the network.
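A minimal winner-take-all sketch shows what competitive learning means here: for each sample, only the unit whose weight vector is closest is moved toward that sample. The learning rate, epoch count, seed and sample points are hypothetical, and the sketch omits any fuzzy membership computation.

```python
import random


def competitive_learning(samples, n_units, lr=0.1, epochs=50, seed=0):
    """Winner-take-all competitive learning; returns the learned weight vectors."""
    rng = random.Random(seed)
    # Initialize each unit's weights from a randomly chosen sample.
    weights = [list(s) for s in rng.sample(samples, n_units)]
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x in order:
            # The unit closest to the sample wins the competition.
            winner = min(
                range(n_units),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, weights[i])),
            )
            # Only the winner moves, by a fraction lr toward the sample.
            weights[winner] = [w + lr * (a - w) for w, a in zip(weights[winner], x)]
    return weights


# Hypothetical two-feature behavior vectors.
samples = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
weights = competitive_learning(samples, n_units=2)
print(weights)
```

After training, each weight vector acts as a cluster prototype, and a sample is assigned to the cluster of its nearest prototype.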

The customer portrait is the basis of user experience: for an internet application, the typical characteristics of its customers are described, customers of that kind are abstracted into a persona, and the persona is then described. For product design, once a user portrait has been established, the behavior of typical users is studied in more depth; the design focuses on typical users first, meets their needs, and then expands to further users.

The visual interface adopts an agile visualization mode: an agile, iterative development approach for visual analysis applications quickly meets the customer's visual analysis requirements and maximizes the customer's business value by raising the delivery success rate of the visual analysis system.

When this embodiment is used, the data acquisition module collects user behavior data at runtime, and the data analysis module uses the data to build a reasonable model of mobile phone users and their behavior that covers the users' natural and social attributes and their multi-dimensional behavior attributes while surfing the internet. After screening, users are classified by the methods of the model algorithm unit and the fuzzy clustering analysis unit, which avoids the effect of inaccurate subjective classification on behavior prediction and optimizes the user model. After the user classification is obtained, data mining yields long-term behavior predictions for user categories and short-term behavior correlations for individual behaviors, and the accuracy of real-time behavior prediction is continuously updated and refined at runtime. Equipment capacity can be expanded flexibly and equipment performance improved; the system offers flexibility for technology upgrades and equipment replacement and supports the expansion, adjustment and reconstruction of service functions; customer requirements and preferences are captured, and weight is given to customers' browsing and interaction behavior data.

Example 2:

Tracking points are set up through the following steps:

s2: sorting out the data for the points to be tracked and confirming that it is reasonable;

The data extraction unit performs the following steps:

s13: analyzing the reasonableness of the data to obtain a behavior analysis result.

The k-means clustering algorithm comprises the following steps:

s31: randomly selecting k objects as initial clustering centers;

s33: each object is assigned to the cluster center closest to it.

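Process: function kMeans($D$, $k$, maxIter);

randomly select $k$ samples from $D$ as the initial "cluster center" vectors $\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(k)}$;

repeat

  let $C_i = \varnothing$ $(1 \le i \le k)$;

  for $j = 1, 2, \ldots, m$ do

    calculate the distance $d_{ji}$ between sample $\chi^{(j)}$ and each "cluster center" vector $\mu^{(i)}$ $(1 \le i \le k)$;

    determine the cluster label of $\chi^{(j)}$ from the nearest "cluster center" vector: $\lambda_j = \arg\min_{i \in \{1, 2, \ldots, k\}} d_{ji}$;

    place sample $\chi^{(j)}$ into the corresponding cluster: $C_{\lambda_j} = C_{\lambda_j} \cup \{\chi^{(j)}\}$;

  end for

  for $i = 1, 2, \ldots, k$ do

    compute the new "cluster center" vector $(\mu^{(i)})' = \frac{1}{|C_i|} \sum_{\chi \in C_i} \chi$;

    if $(\mu^{(i)})' \ne \mu^{(i)}$ then

      update the current "cluster center" vector $\mu^{(i)}$ to $(\mu^{(i)})'$;

    else

      keep the current mean vector unchanged;

    end if

  end for

until none of the current "cluster center" vectors are updated;

Output: cluster division $C = \{C_1, C_2, \ldots, C_k\}$;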

The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims

1. A fuzzy clustering system based on user behavior data, comprising a data acquisition module, a data analysis module and an output unit, characterized in that the data acquisition module is used for collecting user behavior data at runtime and sending the user behavior data to a server, the user behavior data comprising static data and dynamic data; the data analysis module comprises a data extraction unit, a data preprocessing unit, a model algorithm unit and a fuzzy clustering analysis unit, wherein the model algorithm unit comprises a k-means clustering algorithm, which is an iteratively solved cluster analysis algorithm; and the output unit comprises a customer portrait and a visual interface.

2. The fuzzy clustering system based on user behavior data as claimed in claim 1, wherein the user behavior data is considered along three data dimensions, namely time, frequency and result, which are used for setting tags; the time dimension mainly concerns the time period in which a behavior occurs and its duration, wherein the time-period data is used to select the time range for target devices, for marketing analysis and for marketing promotion, and can also be used for risk control and anti-fraud; and the duration mainly concerns the course of the behavior, the start and end time points of the behavior being recorded.

3. The fuzzy clustering system based on user behavior data as claimed in claim 1, wherein the data acquisition module mainly collects data through an SDK, the SDK consisting of a few lines of code; the type of data collected depends on where the tracking points are placed and on the parameters they return; and the module also captures customer behavior on App pages, such as clicks.

4. The fuzzy clustering system based on user behavior data as claimed in claim 3, wherein tracking-point data can be collected, studied and counted in the system's own backend, or through a third-party data analysis platform.

5. The fuzzy clustering system based on user behavior data as claimed in claim 4, wherein the tracking points are set up through the following steps:

s2: sorting out the data for the points to be tracked and confirming that it is reasonable;

6. The fuzzy clustering system based on user behavior data as claimed in claim 2, wherein the data extraction unit performs the following steps:

s13: analyzing the reasonableness of the data to obtain a behavior analysis result.

7. The fuzzy clustering system based on user behavior data as claimed in claim 6, wherein the data preprocessing unit comprises: data cleaning techniques, which remove noise from the data and correct inconsistencies; data integration techniques, which merge data from multiple sources into a coherent data store such as a data warehouse; data reduction techniques, which reduce the size of the data by, for example, aggregation, deletion of redundant features, or clustering; and data transformation techniques, which compress the data into a smaller interval such as 0.0 to 1.0.

8. The fuzzy clustering system based on user behavior data as claimed in claim 7, wherein the step of removing noise in the data is as follows:

f(a,b)＝MED(X)

9. The fuzzy clustering system based on user behavior data as claimed in claim 7, wherein the k-means clustering algorithm comprises the following steps:

s31: randomly selecting k objects as initial clustering centers;

s33: each object is assigned to the cluster center closest to it.

10. The fuzzy clustering system based on user behavior data as claimed in claim 1, wherein the methods of the fuzzy clustering analysis unit are divided into classification methods based on fuzzy relations, fuzzy clustering algorithms based on an objective function, and fuzzy clustering algorithms based on neural networks, the classification methods based on fuzzy relations comprising hierarchical clustering, clustering based on equivalence relations, clustering based on similarity relations, and graph-theoretic clustering.

11. The fuzzy clustering system based on user behavior data as claimed in claim 10, wherein the customer portrait is the basis of user experience: for an internet application, the typical characteristics of its customers are described, customers of that kind are abstracted into a persona, and the persona is then described; for product design, once a user portrait has been established, the behavior of typical users is studied in more depth, the design focuses on typical users first, meets their needs, and then expands to further users.