CN111666351A - Fuzzy clustering system based on user behavior data - Google Patents

Publication number
CN111666351A
Authority
CN
China
Prior art keywords
data
user behavior
user
clustering
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010476681.4A
Other languages
Chinese (zh)
Inventor
陈亚娟
龙泳先
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ruizhi Tuyuan Technology Co ltd
Original Assignee
Beijing Ruizhi Tuyuan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ruizhi Tuyuan Technology Co ltd
Priority to CN202010476681.4A
Publication of CN111666351A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/284: Relational databases
    • G06F16/285: Clustering or classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a fuzzy clustering system based on user behavior data and relates to the technical field of wireless internet behavior analysis and prediction. Its purpose is to build a profit model for precision marketing. The data acquisition module collects user behavior data at runtime and sends it to a server; the user behavior data comprises static data and dynamic data. Once users have been classified, data mining yields long-term behavior predictions for each user category and short-term behavior correlations for individual users, and the accuracy of real-time behavior prediction is continuously updated and refined at runtime. The system allows equipment capacity to be expanded and equipment performance to be improved flexibly, accommodates technology upgrades and equipment replacement, supports the expansion, adjustment, and restructuring of service functions, helps identify customer needs and preferences, and gives weight to customers' browsing and interaction behavior data.

Description

Fuzzy clustering system based on user behavior data
Technical Field
The invention relates to the technical field of wireless internet behavior analysis and prediction, in particular to a fuzzy clustering system based on user behavior data.
Background
With the wide adoption of 3G technology and the emergence of various smart mobile terminals, the number of wireless internet users is rising rapidly. Mobile applications, the most important part of a smartphone, are also developing at a considerable pace. With advances in data analysis and intelligent storage technology, the large volume of behavior data generated by APP user groups can now be stored. By mining and processing this mass of data in depth, users' behavioral habits and preference characteristics can be obtained, and their habits, behaviors, and preferences can be predicted. In-depth analysis of wireless internet user behavior reveals users' real needs, makes full use of network resources, and delivers the information users care about, which improves user experience and loyalty and in turn supports a better profit model. Wireless internet user behavior prediction is still a relatively new research field, and no mature solution exists yet.
A search of the prior art found Chinese patent application CN201910827753.2, which discloses a prediction method, system, and storage medium based on fuzzy clustering and a BP neural network, in the technical field of scenic spot passenger flow prediction. The prediction method comprises: obtaining historical passenger flow data, historical e-commerce ticket booking data, historical air temperature data, and historical weather data for a scenic spot; and performing correlation analysis in units of a preset time period to obtain a key factor matrix. That patent has the following shortcoming: it does not address how to carry out precision marketing for wireless internet users, which has become an urgent problem facing operators and mobile websites.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides a fuzzy clustering system based on user behavior data.
To achieve this purpose, the invention adopts the following technical solution:
A fuzzy clustering system based on user behavior data comprises a data acquisition module, a data analysis module, and an output unit. The data acquisition module collects user behavior data at runtime and sends it to a server; the user behavior data comprises static data and dynamic data. The data analysis module comprises a data extraction unit, a data preprocessing unit, a model algorithm unit, and a fuzzy clustering analysis unit; the model algorithm unit comprises a k-means clustering algorithm, which is an iteratively solved cluster analysis algorithm. The output unit comprises a customer portrait and a visual interface.
Preferably: the user behavior data is considered along three dimensions, namely time, frequency, and result, which are used for setting tags. The time dimension mainly concerns the time period in which a behavior occurs and how long it lasts. Time-period data is used to select the time range for targeting devices in marketing analysis and marketing campaigns, and can also be used for risk control and anti-fraud. Duration mainly concerns the course of a behavior; the start and end time points of the behavior are recorded.
Preferably: the data acquisition module mainly collects data through an SDK, which consists of a few lines of code. The type of data collected depends on where the data tracking points are placed and on the parameters they return; the module can also capture customer actions on App pages, such as clicks.
Preferably: data from the tracking points can be collected, examined, and counted in the product's own back end, or through a third-party data analysis platform.
Preferably: the data tracking method comprises the following steps:
S1: define the data to be counted, and place tracking points according to that data;
S2: review the data for the tracking points to be placed and confirm that it is reasonable;
S3: when a third-party data analysis platform is used, after a tracking point is embedded in the APP, upload the corresponding event ID, event name, and related information to the third-party platform, and keep them consistent with the ID and name in the code;
S4: after tracking is in place, if event conversion rates need to be analyzed statistically, a funnel model must be added in advance; data statistics can begin the day after the funnel model is added.
Preferably: the data extraction unit performs the following steps:
S11: data extraction requests are expressed in JSON, and their validity is verified;
S12: specific elements are located quickly through XPath, node information is obtained, the data is parsed, extracted, and presented, and the data flow is verified;
S13: the validity of the data is analyzed to obtain a behavior analysis result.
Preferably: the data preprocessing unit comprises data cleaning techniques, which remove noise from the data and correct inconsistencies; data integration techniques, which merge data from multiple sources into a coherent data store such as a data warehouse; data reduction techniques, which reduce the size of the data by, for example, aggregation, deletion of redundant features, or clustering; and data transformation techniques, which scale the data into a smaller interval such as 0.0 to 1.0.
Preferably, noise in the data is removed as follows:
Step A1: construct the user behavior data according to the following formula:
Figure BDA0002516042200000041
where $X$ denotes the overall user behavior data, $x_1$ denotes time, $x_2$ denotes frequency, and $m$ denotes the number of constructed user behavior data records;
Step A2: find the threshold between noise data values and normal data values according to the following formula:
Figure BDA0002516042200000042
where $\mu(a, b)$ denotes the mean of the user behavior data in the neighborhood, $s(a, b)$ denotes the standard deviation of the user behavior data in the neighborhood, $R$ is the dynamic range of the standard deviation, $l$ is a correction parameter, $x_{i,j}$ denotes the user behavior data value with abscissa $i$ and ordinate $j$, $m$ denotes the number of constructed user behavior data records, and $2m$ denotes the total number of constructed user behavior data values;
Step A3: find the median of the user behavior data according to the following formula:
$f(a, b) = \mathrm{MED}(X)$
where $X$ denotes the overall user behavior data and $f(a, b)$ denotes the median of the user behavior data;
Step A4: using the threshold $q$ between noise data values and normal data values obtained in step A2, any data value greater than $q$ is treated as a noise point, and its value is replaced by the median of the user behavior data obtained in step A3, which completes the noise removal.
Preferably: the k-means clustering algorithm comprises the following steps:
S31: randomly select k objects as the initial cluster centers;
S32: calculate the distance between each object and each cluster center;
S33: assign each object to the cluster center closest to it.
Preferably: the methods used by the fuzzy clustering analysis unit fall into three groups: classification methods based on fuzzy relations, fuzzy clustering algorithms based on an objective function, and fuzzy clustering algorithms based on neural networks. The classification methods based on fuzzy relations include the hierarchical (system) clustering method, clustering algorithms based on equivalence relations, clustering algorithms based on similarity relations, and graph-theoretic clustering algorithms.
Preferably: the customer portrait is the foundation of user experience work. For an Internet application, the typical characteristics of its customers are described and then abstracted into a persona, which stands in for that group of people. In product design, once the user portrait is established, the behavior of typical users is studied in more depth; the design first concentrates on typical users and meets their needs, and is then extended to other users.
The beneficial effects of the invention are as follows. The data acquisition module collects user behavior data at runtime. The data analysis module uses this data to build a reasonable model of mobile phone users and their behavior, covering users' natural and social attributes as well as their multi-dimensional behavior attributes while online. After review and screening, users are classified by the model algorithm unit and the fuzzy clustering analysis unit, which avoids the effect of inaccurate subjective classification on behavior prediction and optimizes the user model. Once users have been classified, data mining yields long-term behavior predictions for each user category and short-term behavior correlations for individual users, and the accuracy of real-time behavior prediction is continuously updated and refined at runtime. The system allows equipment capacity to be expanded and equipment performance to be improved flexibly, accommodates technology upgrades and equipment replacement, supports the expansion, adjustment, and restructuring of service functions, helps identify customer needs and preferences, and gives weight to customers' browsing and interaction behavior data.
Drawings
FIG. 1 is a schematic view of a flow structure of a fuzzy clustering system based on user behavior data according to the present invention;
fig. 2 is a schematic diagram of a k-means clustering algorithm of the fuzzy clustering system based on user behavior data according to the present invention.
Detailed Description
The technical solution of the present patent will be described in further detail with reference to the following embodiments.
Reference will now be made in detail to embodiments of the present patent, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present patent and are not to be construed as limiting the present patent.
In the description of this patent, it is to be understood that the terms "center," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in the orientations and positional relationships indicated in the drawings for the convenience of describing the patent and for the simplicity of description, and are not intended to indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and are not to be considered limiting of the patent.
In the description of this patent, it is noted that unless otherwise specifically stated or limited, the terms "mounted," "connected," and "disposed" are to be construed broadly and can include, for example, fixedly connected, disposed, detachably connected, disposed, or integrally connected and disposed. The specific meaning of the above terms in this patent may be understood by those of ordinary skill in the art as appropriate.
Example 1:
As shown in FIG. 1 and FIG. 2, a fuzzy clustering system based on user behavior data comprises a data acquisition module, a data analysis module, and an output unit. The data acquisition module collects user behavior data at runtime and sends it to a server; the user behavior data comprises static data and dynamic data. The data analysis module comprises a data extraction unit, a data preprocessing unit, a model algorithm unit, and a fuzzy clustering analysis unit; the model algorithm unit comprises a k-means clustering algorithm, which is an iteratively solved cluster analysis algorithm. The output unit comprises a customer portrait and a visual interface.
Further, the static data comprises characteristics of the user as a natural person, such as age, gender, region, and education level; how static data is obtained depends on the sample source. The dynamic data comprises behavioral characteristics of the user while accessing the internet on a mobile phone, such as the categories of web pages browsed, dwell time, reading habits, and web page characteristics.
The user behavior data is considered along three dimensions, namely time, frequency, and result, which are used for setting tags. The time dimension mainly concerns the time period in which a behavior occurs and how long it lasts. Time-period data is used to select the time range for targeting devices in marketing analysis and marketing campaigns, and can also be used for risk control and anti-fraud. Duration mainly concerns the course of a behavior; the start and end time points of the behavior are recorded.
Furthermore, the frequency dimension mainly concerns how often certain specific behaviors occur and how that frequency trends. Frequency correlates strongly and positively with customer interest: within a given period, the number of clicks and page views is positively correlated with the customer's purchase demand. Once tagged, frequency data can be used for marketing and for identifying customers who have not appeared; it can also be used to analyze customer experience and products. Heat maps show how the product is experienced and what customers need, and can inform optimization of the App's internal layout and the sale of related products.
Further, the result dimension, used for setting tags, mainly concerns whether a purchase or sale was completed, that is, the outcome of the customer's clicking and browsing. Result data is divided into transaction and non-transaction data; values the customer has filled in can also be captured according to business needs and put to further use. Transaction data can be used for product experience analysis, customer experience analysis, channel ROI analysis, and the like. Non-transaction data can be used for secondary marketing, that is, selling again to potential customers; secondary marketing requires a combined analysis of time-period, duration, and frequency data in order to screen out the target customer group.
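The three tagging dimensions described above can be illustrated with a minimal Python sketch. The event fields (`action`, `start`, `end`), the time-period boundaries, and the function names are assumptions made for illustration; the source does not specify a data schema.

```python
from datetime import datetime

def time_period(hour):
    """Map an hour of day to a coarse time-period tag (boundaries are assumed)."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"

def tag_behavior(events):
    """Derive time, frequency, and result tags from one user's events.

    Each event is a dict with hypothetical keys 'action', 'start', 'end',
    where 'start' and 'end' are ISO 8601 timestamps.
    """
    starts = [datetime.fromisoformat(e["start"]) for e in events]
    ends = [datetime.fromisoformat(e["end"]) for e in events]
    first, last = min(starts), max(ends)
    purchased = any(e["action"] == "purchase" for e in events)
    return {
        "period": time_period(first.hour),                # time dimension: time period
        "duration_s": (last - first).total_seconds(),     # time dimension: duration
        "click_count": sum(e["action"] == "click" for e in events),  # frequency
        "result": "transaction" if purchased else "non-transaction",  # result
    }

events = [
    {"action": "click", "start": "2020-05-29T09:00:00", "end": "2020-05-29T09:00:30"},
    {"action": "browse", "start": "2020-05-29T09:01:00", "end": "2020-05-29T09:03:00"},
]
print(tag_behavior(events))
```

A non-transaction result such as this one is the kind of record the text earmarks for secondary marketing, combined with the period, duration, and frequency tags.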
The data acquisition module mainly collects data through an SDK, which consists of a few lines of code. The type of data collected depends on where the data tracking points are placed and on the parameters they return; the module can also capture customer actions on App pages, such as clicks.
Furthermore, any data collected through the SDK is collected according to the customer's own wishes. Whether the data involves personal private data can be determined at the SDK tracking point; personal private data includes the 7 data types in PII, such as social security numbers, mobile phone numbers, home addresses, and personal postal codes.
Furthermore, data tracking points let product, operations, and other relevant staff compile customized statistics on user data according to specific needs. For example, to trace a user's behavior pattern, to observe click activity on a page or conversion along a key path, or to analyze the effect of a campaign, the tracking points must be put in place in advance; the corresponding data can then be observed after the APP goes live and used for investigation and analysis.
Data from the tracking points can be collected, examined, and counted in the product's own back end, or through a third-party data analysis platform.
The data tracking method comprises the following steps:
S1: define the data to be counted, and place tracking points according to that data;
S2: review the data for the tracking points to be placed and confirm that it is reasonable;
S3: when a third-party data analysis platform is used, after a tracking point is embedded in the APP, upload the corresponding event ID, event name, and other related information to the third-party platform, and keep them consistent with the ID and name in the code;
S4: after tracking is in place, if event conversion rates need to be analyzed statistically, a funnel model must be added in advance; data statistics can begin the day after the funnel model is added.
Further, the IDs and names are generally organized and assigned by the product team, and are kept identical across iOS and Android.
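As an illustration of the funnel model mentioned in S4, the following Python sketch counts how many users reach each funnel step in order and derives step-to-step conversion rates. The event IDs and the function name `funnel_conversion` are hypothetical; a real third-party platform would compute this on its own back end.

```python
def funnel_conversion(user_events, steps):
    """Count users reaching each funnel step in order, plus conversion rates.

    user_events: dict mapping a user ID to that user's event IDs in time order.
    steps: ordered list of event IDs that define the funnel.
    """
    reached = [0] * len(steps)
    for events in user_events.values():
        pos = 0  # index of the next funnel step this user has to hit
        for event_id in events:
            if pos < len(steps) and event_id == steps[pos]:
                reached[pos] += 1
                pos += 1
    # Conversion rate from each step to the next; 0.0 if nobody reached the step.
    rates = [
        reached[i] / reached[i - 1] if reached[i - 1] else 0.0
        for i in range(1, len(steps))
    ]
    return reached, rates

user_events = {
    "u1": ["view_item", "add_cart", "pay"],
    "u2": ["view_item", "add_cart"],
    "u3": ["view_item"],
    "u4": ["add_cart"],  # skipped the first step, so not counted in the funnel
}
reached, rates = funnel_conversion(user_events, ["view_item", "add_cart", "pay"])
print(reached, rates)
```

Because the counts are keyed on event IDs, the sketch only works if the IDs reported by the iOS and Android clients match, which is why the text asks for them to be unified.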
The data extraction unit performs the following steps:
S11: data extraction requests are expressed in JSON, and their validity is verified;
S12: specific elements are located quickly through XPath, node information is obtained, the data is parsed, extracted, and presented, and the data flow is verified;
S13: the validity of the data is analyzed to obtain a behavior analysis result.
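Steps S11 and S12 can be sketched with the Python standard library. The required field names, the XML layout, and both function names are assumptions for illustration only; `xml.etree.ElementTree` supports a limited XPath subset, which is enough for attribute predicates like the one used here.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical fields an extraction request is expected to carry.
REQUIRED_FIELDS = {"event_id", "event_name", "timestamp"}

def validate_request(raw_json):
    """S11 sketch: parse a JSON request and report which required fields are missing."""
    payload = json.loads(raw_json)
    missing = sorted(REQUIRED_FIELDS - set(payload))
    return payload, missing

def extract_events(xml_text, event_name):
    """S12 sketch: locate <event> nodes by name with XPath and pull out node data."""
    root = ET.fromstring(xml_text)
    nodes = root.findall(f".//event[@name='{event_name}']")
    return [{"id": node.get("id"), "page": node.findtext("page")} for node in nodes]

request = '{"event_id": "e1", "event_name": "click"}'
log = (
    "<log>"
    "<event id='1' name='click'><page>home</page></event>"
    "<event id='2' name='view'><page>cart</page></event>"
    "</log>"
)
print(validate_request(request)[1])
print(extract_events(log, "click"))
```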
The data preprocessing unit comprises data cleaning techniques, which remove noise from the data and correct inconsistencies; data integration techniques, which merge data from multiple sources into a coherent data store such as a data warehouse; data reduction techniques, which reduce the size of the data by, for example, aggregation, deletion of redundant features, or clustering; and data transformation techniques, which scale the data into a smaller interval such as 0.0 to 1.0. Such transformation can improve the accuracy and efficiency of mining algorithms that involve distance measures.
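The data transformation step can be illustrated with min-max scaling, one common way to map values into an interval such as 0.0 to 1.0. The source does not name a specific transformation, so this choice and the function name `min_max_scale` are assumptions.

```python
def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly rescale values into the interval [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # All values are identical; map them to the lower bound to avoid dividing by zero.
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

# Dwell times in seconds, rescaled so they are comparable with other features.
print(min_max_scale([10, 20, 30]))
```

Scaling matters for distance-based algorithms such as k-means because a feature measured in large units would otherwise dominate the distance.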
Noise in the data is removed as follows:
Step A1: construct the user behavior data according to the following formula:
Figure BDA0002516042200000101
where $X$ denotes the overall user behavior data, $x_1$ denotes time, $x_2$ denotes frequency, and $m$ denotes the number of constructed user behavior data records;
Step A2: find the threshold between noise data values and normal data values according to the following formula:
Figure BDA0002516042200000111
where $\mu(a, b)$ denotes the mean of the user behavior data in the neighborhood, $s(a, b)$ denotes the standard deviation of the user behavior data in the neighborhood, $R$ is the dynamic range of the standard deviation, $l$ is a correction parameter, $x_{i,j}$ denotes the user behavior data value with abscissa $i$ and ordinate $j$, $m$ denotes the number of constructed user behavior data records, and $2m$ denotes the total number of constructed user behavior data values;
Step A3: find the median of the user behavior data according to the following formula:
$f(a, b) = \mathrm{MED}(X)$
where $X$ denotes the overall user behavior data and $f(a, b)$ denotes the median of the user behavior data;
Step A4: using the threshold $q$ between noise data values and normal data values obtained in step A2, any data value greater than $q$ is treated as a noise point, and its value is replaced by the median of the user behavior data obtained in step A3, which completes the noise removal.
Beneficial effects: the algorithm borrows an image-processing approach to build the user behavior data and to handle the noise values in it. Noise values are identified by computing a threshold and removed by median replacement, which provides cleaner user behavior data for the later training of the k-means clustering model.
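Steps A2 to A4 can be sketched as follows. The threshold formula itself appears only as an image in the source and is not reproduced in this text, so the sketch assumes a Sauvola-style form from image thresholding, $q = \mu \, (1 + l \, (s / R - 1))$, purely as a stand-in that uses the same quantities ($\mu$, $s$, $R$, $l$). It also treats the whole list as a single neighborhood, and the default values of `l` and `R` are arbitrary illustrative choices.

```python
import statistics

def remove_noise(values, l=0.5, R=128.0):
    """Replace values above a threshold q with the median of the data.

    ASSUMPTION: q uses a Sauvola-style formula, q = mu * (1 + l * (s / R - 1)).
    The patent's actual formula is not available in this text.
    """
    mu = statistics.mean(values)    # mean of the data (one global neighborhood)
    s = statistics.pstdev(values)   # population standard deviation
    q = mu * (1.0 + l * (s / R - 1.0))  # assumed threshold (step A2)
    med = statistics.median(values)     # median of the data (step A3)
    # Step A4: values greater than q are noise and are replaced by the median.
    return [med if v > q else v for v in values], q

# One feature column, e.g. click counts, with a single extreme value.
denoised, q = remove_noise([1, 2, 2, 3, 2, 100])
print(denoised, q)
```

In practice the function would be applied per feature (time and frequency separately), since mixing units in one threshold would not be meaningful.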
The k-means clustering algorithm comprises the following steps:
S31: randomly select k objects as the initial cluster centers;
S32: calculate the distance between each object and each cluster center;
S33: assign each object to the cluster center closest to it.
Further, each cluster center together with the objects assigned to it represents a cluster. Each time a sample is assigned, the cluster center is recalculated from the objects currently in the cluster. This process repeats until a termination condition is met; the termination condition may be that no (or a minimum number of) objects are reassigned to different clusters, that no (or a minimum number of) cluster centers change, or that the sum of squared errors reaches a local minimum.
Further, the value of k in S31 is generally determined according to actual requirements, or is given directly when the algorithm is run. Distance measure: given samples $x^{(i)} = \{x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_n\}$ and $x^{(j)} = \{x^{(j)}_1, x^{(j)}_2, \ldots, x^{(j)}_n\}$, where $i$ and $j$ index the samples and $n$ is the number of components of each sample, the distance between the two samples is computed. Updating the cluster centers: for each cluster obtained, the mean of the sample points in the cluster is calculated and taken as the new cluster center. The k-means algorithm proceeds as follows:
Input: training data set $D = \{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, number of clusters $k$;
Process: function kMeans(D, k, maxIter):
  randomly select $k$ samples from $D$ as the initial "cluster center" vectors $\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(k)}$;
  repeat
    let $C_i = \varnothing$ $(1 \le i \le k)$;
    for $j = 1, 2, \ldots, m$ do
      calculate the distance $d_{ji}$ between sample $x^{(j)}$ and each "cluster center" vector $\mu^{(i)}$ $(1 \le i \le k)$;
      determine the cluster label of $x^{(j)}$ from the nearest "cluster center" vector: $\lambda_j = \arg\min_{i \in \{1, 2, \ldots, k\}} d_{ji}$;
      place sample $x^{(j)}$ in the corresponding cluster: $C_{\lambda_j} = C_{\lambda_j} \cup \{x^{(j)}\}$;
    end for
    for $i = 1, 2, \ldots, k$ do
      compute the new "cluster center" vector $(\mu^{(i)})' = \frac{1}{|C_i|} \sum_{x \in C_i} x$;
      if $(\mu^{(i)})' \ne \mu^{(i)}$ then
        update the current "cluster center" vector $\mu^{(i)}$ to $(\mu^{(i)})'$;
      else
        keep the current mean vector unchanged;
      end if
    end for
  until none of the current "cluster center" vectors are updated;
Output: cluster partition $C = \{C_1, C_2, \ldots, C_k\}$.
To avoid excessively long running times, a maximum number of rounds or a minimum adjustment threshold is usually set; the run stops when the maximum number of rounds is reached or when the adjustment falls below the threshold.
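The procedure above can be written as a short, dependency-free Python sketch. Squared Euclidean distance is assumed as the distance measure (the source does not fix one), and the parameter names `max_iter`, `tol`, and `seed` are illustrative; `tol` plays the role of the minimum adjustment threshold.

```python
import random

def k_means(data, k, max_iter=100, tol=1e-9, seed=0):
    """Cluster a list of equal-length numeric tuples into k clusters."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(data, k)]  # random initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: put each sample in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for point in data:
            dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
            clusters[dists.index(min(dists))].append(point)
        # Update step: each center becomes the mean of its cluster members.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                dims = range(len(center))
                new_centers.append(
                    tuple(sum(p[d] for p in members) / len(members) for d in dims)
                )
            else:
                new_centers.append(center)  # empty cluster: keep the old center
        # Largest movement of any center in this round.
        shift = max(
            sum((a - b) ** 2 for a, b in zip(c, n)) ** 0.5
            for c, n in zip(centers, new_centers)
        )
        centers = new_centers
        if shift <= tol:  # centers stopped moving, so the partition is stable
            break
    return centers, clusters

# Two well-separated groups of (time, frequency) style feature pairs.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = k_means(data, k=2)
print(centers)
print(clusters)
```

Because the initial centers are chosen at random, different seeds can give different partitions on less clearly separated data; running the algorithm several times and keeping the partition with the lowest sum of squared errors is a common safeguard.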
The methods used by the fuzzy clustering analysis unit fall into three groups: classification methods based on fuzzy relations, fuzzy clustering algorithms based on an objective function, and fuzzy clustering algorithms based on neural networks. The classification methods based on fuzzy relations include the hierarchical (system) clustering method, clustering algorithms based on equivalence relations, clustering algorithms based on similarity relations, graph-theoretic clustering algorithms, and the like.
Further, the classification method based on fuzzy relations first treats each sample or variable to be clustered as a class of its own, then determines the statistical similarity between classes, selects the two or more closest classes, and merges them into a new class. The statistical similarity between the new class and the other classes is then calculated, the two or more closest classes are again selected and merged, and so on until all samples or variables have been merged into a single class.
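The merge-until-one-class procedure described above can be sketched as agglomerative clustering. This sketch simplifies in three ways that are assumptions rather than the source's method: it uses one-dimensional data, measures similarity by single-linkage distance instead of a fuzzy relation, and merges exactly two classes per round.

```python
def agglomerative(points):
    """Merge the two closest clusters repeatedly and return the merge history.

    Each history entry is (merged_cluster, distance_at_merge).
    """
    clusters = [[p] for p in points]  # every sample starts as its own class
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        history.append((merged, d))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return history

for merged, dist in agglomerative([1.0, 1.2, 5.0, 5.1, 9.0]):
    print(merged, round(dist, 2))
```

Cutting the merge history at a chosen distance level gives a partition into several classes, which is how such a hierarchy is typically turned into user groups.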
Further, the fuzzy clustering algorithm based on an objective function formulates cluster analysis as a constrained nonlinear programming problem and obtains the optimal partition and clustering of the data set by solving the optimization. The stepwise clustering method is a fuzzy clustering method based on fuzzy partition; in outline, the number of classes into which the samples are to be divided is fixed in advance, the samples are then reclassified according to an optimality principle, and the iterations end once the classification is reasonable.
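A widely used objective-function-based method is fuzzy c-means, sketched below for one-dimensional data. The source does not name a specific algorithm, so fuzzy c-means, the fuzzifier `m = 2`, the evenly spaced initial centers, and all names here are assumptions for illustration. Each sample receives a membership degree in every cluster, and the degrees for one sample sum to 1.

```python
def fuzzy_c_means(xs, c=2, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy c-means on a list of numbers; requires c >= 2 and m > 1."""
    lo, hi = min(xs), max(xs)
    # Assumed initialization: centers spread evenly between min and max.
    centers = [lo + (hi - lo) * i / (c - 1) for i in range(c)]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(max_iter):
        memberships = []
        for x in xs:
            d = [abs(x - v) for v in centers]
            if 0.0 in d:
                # Sample sits exactly on a center: full membership there.
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
                row = [1.0 / sum((d[i] / d[k]) ** power for k in range(c))
                       for i in range(c)]
            memberships.append(row)
        # New center = membership-weighted mean of the samples.
        new_centers = []
        for i in range(c):
            weights = [row[i] ** m for row in memberships]
            new_centers.append(sum(w * x for w, x in zip(weights, xs)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships

centers, memberships = fuzzy_c_means([0.0, 0.2, 0.1, 5.0, 5.2, 5.1])
print(centers)
print(memberships[0])
```

Unlike k-means, which assigns each user to exactly one cluster, the membership degrees let a user belong partly to several behavior groups.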
Further, the fuzzy clustering algorithm based on a neural network uses a competitive learning algorithm to guide the network's clustering process.
The customer portrait is the foundation of user experience work. For an Internet application, the typical characteristics of its customers are described and then abstracted into a persona, which stands in for that group of people. In product design, once the user portrait is established, the behavior of typical users is studied in more depth; the design first concentrates on typical users and meets their needs, and is then extended to other users.
The visual interface adopts an agile visualization mode: an agile, iterative approach to developing visual analysis applications can meet customers' visual analysis requirements quickly, and maximizes business value for the customer by raising the delivery success rate of the visual analysis system.
When this embodiment is in use, the data acquisition module collects user behavior data at runtime. The data analysis module uses this data to build a reasonable model of mobile phone users and their behavior, covering users' natural and social attributes as well as their multi-dimensional behavior attributes while online. After screening, users are classified by the model algorithm unit and the fuzzy clustering analysis unit, which avoids the effect of inaccurate subjective classification on behavior prediction and optimizes the user model. Once users have been classified, data mining yields long-term behavior predictions for each user category and short-term behavior correlations for individual users, and the accuracy of real-time behavior prediction is continuously updated and refined at runtime. The system allows equipment capacity to be expanded and equipment performance to be improved flexibly, accommodates technology upgrades and equipment replacement, supports the expansion, adjustment, and restructuring of service functions, helps identify customer needs and preferences, and gives weight to customers' browsing and interaction behavior data.
Example 2:
The fuzzy clustering system based on user behavior data, as shown in fig. 1 and fig. 2, comprises a data acquisition module, a data analysis module and an output unit. The data acquisition module collects user behavior data at user runtime and sends it to a server; the user behavior data comprises static data and dynamic data. The data analysis module comprises a data extraction unit, a data preprocessing unit, a model algorithm unit and a fuzzy clustering analysis unit, wherein the model algorithm unit comprises a k-means clustering algorithm, which is a cluster analysis algorithm solved iteratively. The output unit comprises a customer portrait and a visual interface.
Further, the static data comprises the characteristics of the user as a natural person, such as age, gender, region and education level; how the static data is obtained depends on the sample source. The dynamic data comprises the behavior characteristics of the user while accessing the Internet with a mobile phone, such as the categories of web pages browsed, dwell time, reading habits and web page characteristics.
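As a rough illustration of the static/dynamic split described above, one collected record could be modeled as follows. This is only a minimal sketch: the class and field names (`StaticProfile`, `DynamicEvent`, `UserBehaviorRecord`, `dwell_seconds`) are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StaticProfile:
    """Characteristics of the user as a natural person (static data)."""
    age: int
    gender: str
    region: str
    education: str


@dataclass
class DynamicEvent:
    """One observed behavior while browsing on a mobile phone (dynamic data)."""
    page_category: str
    dwell_seconds: float


@dataclass
class UserBehaviorRecord:
    """A user's static profile plus the dynamic events collected so far."""
    user_id: str
    profile: StaticProfile
    events: List[DynamicEvent] = field(default_factory=list)

    def total_dwell(self) -> float:
        # Sum of dwell time over all recorded events.
        return sum(e.dwell_seconds for e in self.events)


# Hypothetical example values, for illustration only.
record = UserBehaviorRecord("u1", StaticProfile(30, "F", "Beijing", "bachelor"))
record.events.append(DynamicEvent("news", 12.5))
record.events.append(DynamicEvent("sports", 7.5))
```

Keeping the two kinds of data in separate types makes it easy to treat the static part as fixed per user while the dynamic part grows at runtime.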
The data acquisition module mainly collects data through an SDK, which amounts to a few lines of code. The type of data collected depends on where the data tracking points (buried points) are placed, since each tracking point returns parameters. The module can also collect the customer's behavior on App pages, such as clicks.
Furthermore, any data collected through the SDK is collected on the basis of the customer's own willingness. Whether the data involves personal privacy data can be distinguished at the SDK tracking point (buried point); personal privacy data comprises the 7 data types of PII, such as social security numbers, mobile phone numbers, home addresses and private postcodes.
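One way to picture distinguishing privacy data at the tracking point is a field-level filter applied before an event is uploaded. The sketch below is an assumption, not the patent's mechanism: the field names in `PII_FIELDS` are hypothetical and cover only the four PII examples the text names, not all seven types.

```python
from typing import Any, Dict

# Hypothetical field names flagged as personal privacy data at the tracking point.
PII_FIELDS = {"social_security_number", "mobile_number", "home_address", "postcode"}


def contains_pii(event: Dict[str, Any]) -> bool:
    """Return True if any flagged field is present in the event payload."""
    return any(key in PII_FIELDS for key in event)


def strip_pii(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event without the fields flagged as PII."""
    return {key: value for key, value in event.items() if key not in PII_FIELDS}


# Hypothetical event payload, for illustration only.
raw_event = {"event_id": "click_buy", "mobile_number": "000", "page": "cart"}
safe_event = strip_pii(raw_event)
```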
Furthermore, data tracking points (buried points) allow product, operations and other relevant personnel to compile customized statistics on user data according to specific requirements. For example, when a user's behavior pattern needs to be tracked, when page clicks and key-path conversion are to be observed, or when the effect of a campaign event is to be analyzed, the tracking points must be set up in advance; the corresponding data can then be observed after the APP goes online and used for investigation and analysis.
Data from the tracking points (buried points) can be collected, studied and counted in the system's own background, or through a third-party data analysis platform.
Setting up data tracking points (buried points) comprises the following steps:
S1: define the data to be counted and place tracking points accordingly;
S2: sort out the data of the points to be tracked and confirm that it is reasonable;
S3: when a third-party data analysis platform is used, after a tracking point is placed in the APP, upload the corresponding event ID, event name and other related information to the third-party platform, keeping the ID and the name consistent with those in the code;
S4: after the tracking points are in place, if an event conversion rate needs to be statistically analyzed, add a funnel model in advance; data statistics can begin the day after the funnel model is added.
Further, the IDs and names are generally organized and named on the product side, and are unified across iOS and Android.
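The event registry of step S3 and the funnel model of step S4 can be sketched as follows. This is a minimal illustration under stated assumptions: the event IDs, the `EVENT_REGISTRY` table and the `funnel_conversion` helper are hypothetical, and the funnel here ignores event order and only checks that a user has triggered every step so far.

```python
from typing import Dict, List, Set

# Event ID -> event name, agreed on the product side and shared by iOS and Android.
# These IDs and names are made up for the example.
EVENT_REGISTRY: Dict[str, str] = {
    "view_item": "View item page",
    "add_cart": "Add to cart",
    "pay": "Complete payment",
}


def funnel_conversion(steps: List[str], user_events: Dict[str, Set[str]]) -> List[float]:
    """Share of first-step users who are still in the funnel at each step."""
    for step in steps:
        if step not in EVENT_REGISTRY:
            raise KeyError(f"event id not registered: {step}")
    remaining = set(user_events)
    base = None
    rates = []
    for step in steps:
        # Keep only users who also triggered this step.
        remaining = {user for user in remaining if step in user_events[user]}
        if base is None:
            base = len(remaining)
        rates.append(len(remaining) / base if base else 0.0)
    return rates


events_by_user = {
    "u1": {"view_item", "add_cart", "pay"},
    "u2": {"view_item", "add_cart"},
    "u3": {"view_item"},
    "u4": {"view_item", "pay"},
}
rates = funnel_conversion(["view_item", "add_cart", "pay"], events_by_user)
```

Rejecting unregistered IDs mirrors the requirement that the ID and name in the code match what was uploaded to the analysis platform.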
The data extraction unit performs the following steps:
S11: specify the data extraction requirements in JSON and verify that they are reasonable;
S12: use XPath to quickly locate specific elements and obtain node information, then parse, extract and present the data and verify the data flow;
S13: analyze the reasonableness of the data to obtain a behavior analysis result.
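Steps S11 to S13 can be illustrated with the Python standard library, whose `xml.etree.ElementTree` module supports a limited subset of XPath. The sketch below assumes a made-up log format: the JSON keys (`element`, `attribute`, `value`) and the XML tag and attribute names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Step S11: a hypothetical extraction requirement expressed as JSON.
requirement = json.loads('{"element": "event", "attribute": "type", "value": "click"}')

# A hypothetical uploaded behavior log; tag and attribute names are made up.
log_xml = """
<session user="u1">
  <event type="click" page="home" duration="3"/>
  <event type="scroll" page="home" duration="8"/>
  <event type="click" page="cart" duration="5"/>
</session>
"""


def extract_nodes(xml_text: str, req: dict) -> list:
    """Step S12: locate matching elements via XPath and return their attributes."""
    root = ET.fromstring(xml_text)
    path = f".//{req['element']}[@{req['attribute']}='{req['value']}']"
    return [dict(node.attrib) for node in root.findall(path)]


clicks = extract_nodes(log_xml, requirement)
# Step S13: a trivial analysis of the extracted data.
pages_clicked = [node["page"] for node in clicks]
```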
The data preprocessing unit comprises: data cleaning techniques, which can be used to remove noise from the data and correct inconsistencies; data integration techniques, which merge data from multiple data sources into a coherent data store such as a data warehouse; data reduction techniques, which can reduce the size of the data by, for example, aggregation, deleting redundant features, or clustering; and data transformation techniques, which can be used to compress the data into a smaller interval such as 0.0 to 1.0 and can thereby improve the accuracy and efficiency of mining algorithms that involve distance measures.
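A minimal sketch of two of these techniques follows, assuming missing values are the only "noise" to clean and that the transformation is min-max scaling into [0.0, 1.0]. The function names are hypothetical. Scaling matters for distance-based algorithms because a feature with a large numeric range would otherwise dominate the distance.

```python
from typing import List, Optional


def clean(values: List[Optional[float]]) -> List[float]:
    """Data cleaning: drop missing entries (a very simple notion of noise)."""
    return [v for v in values if v is not None]


def min_max_scale(values: List[float]) -> List[float]:
    """Data transformation: rescale values linearly into the interval [0.0, 1.0]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal; map them to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


dwell_seconds = [10.0, None, 30.0, 20.0]
scaled = min_max_scale(clean(dwell_seconds))
```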
The k-means clustering algorithm comprises the following steps:
S31: randomly select k objects as the initial cluster centers;
S32: calculate the distance between each object and each seed cluster center;
S33: assign each object to the cluster center closest to it.
Further, a cluster center together with the objects assigned to it represents a cluster. Each time a sample is assigned, the cluster center is recalculated from the objects currently in the cluster, and the process repeats until a termination condition is met. The termination condition can be that no (or a minimum number of) objects are reassigned to different clusters, that no (or a minimum number of) cluster centers change, or that the sum of squared errors reaches a local minimum.
Further, the value of k in S31 is generally determined according to actual requirements, or is given directly when the algorithm is implemented. Measure of distance: the distance is measured between two samples $\chi^{(i)} = \{\chi_1^{(i)}, \chi_2^{(i)}, \dots, \chi_n^{(i)}\}$ and $\chi^{(j)} = \{\chi_1^{(j)}, \chi_2^{(j)}, \dots, \chi_n^{(j)}\}$, where $i, j = 1, 2, \dots, m$, $m$ is the number of samples and $n$ is the number of components of each sample. Updating the cluster centers: for each cluster, the mean of the sample points in the cluster is calculated and taken as the new cluster center. The k-means algorithm proceeds as follows:
Input: training data set $D = \{\chi^{(1)}, \chi^{(2)}, \dots, \chi^{(m)}\}$; number of clusters $k$;
Process: function kMeans(D, k, maxIter):
randomly select $k$ samples from $D$ as the initial "cluster center" vectors $\mu^{(1)}, \mu^{(2)}, \dots, \mu^{(k)}$;
repeat
  let $C_i = \varnothing$ $(1 \le i \le k)$;
  for $j = 1, 2, \dots, m$ do
    calculate the distance $d_{ji}$ between sample $\chi^{(j)}$ and each "cluster center" vector $\mu^{(i)}$ $(1 \le i \le k)$;
    determine the cluster label of $\chi^{(j)}$ from the nearest "cluster center" vector: $\lambda_j = \arg\min_{i \in \{1, 2, \dots, k\}} d_{ji}$;
    place sample $\chi^{(j)}$ into the corresponding cluster: $C_{\lambda_j} = C_{\lambda_j} \cup \{\chi^{(j)}\}$;
  end for
  for $i = 1, 2, \dots, k$ do
    compute the new "cluster center" vector $(\mu^{(i)})' = \frac{1}{|C_i|} \sum_{\chi \in C_i} \chi$;
    if $(\mu^{(i)})' \ne \mu^{(i)}$ then
      update the current "cluster center" vector $\mu^{(i)}$ to $(\mu^{(i)})'$;
    else
      keep the current mean vector unchanged;
    end if
  end for
until none of the current "cluster center" vectors is updated;
Output: cluster division $C = \{C_1, C_2, \dots, C_k\}$.
To avoid an excessively long running time, a maximum number of iterations or a minimum adjustment threshold is usually set; the algorithm stops when the maximum number of iterations is reached or the adjustment falls below the threshold.
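The kMeans(D, k, maxIter) procedure can be sketched in plain Python as follows. Two assumptions are made that the text does not fix: the distance measure is taken to be Euclidean, and an empty cluster simply keeps its previous center. The `seed` parameter is added only to make the random initialization repeatable.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    # Assumed distance measure; the source does not prescribe one.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(data: List[Point], k: int, max_iter: int = 100,
            seed: int = 0) -> Tuple[List[List[Point]], List[List[float]]]:
    """Return (clusters, centers) after at most max_iter rounds."""
    rng = random.Random(seed)
    # Randomly select k samples as the initial cluster centers.
    centers = [list(p) for p in rng.sample(data, k)]
    clusters: List[List[Point]] = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        # Assign each sample to its nearest cluster center.
        for p in data:
            label = min(range(k), key=lambda i: euclidean(p, centers[i]))
            clusters[label].append(p)
        updated = False
        # Recompute each center as the mean of its cluster.
        for i, members in enumerate(clusters):
            if not members:
                continue  # keep the old center for an empty cluster
            new_center = [sum(c) / len(members) for c in zip(*members)]
            if new_center != centers[i]:
                centers[i] = new_center
                updated = True
        if not updated:
            break  # no center changed, so the assignment is stable
    return clusters, centers


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
clusters, centers = k_means(points, k=2)
```

The loop stops either when no center changes or when `max_iter` is reached, matching the two stopping rules described above.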
The fuzzy clustering analysis unit covers three kinds of methods: classification methods based on fuzzy relations, fuzzy clustering algorithms based on an objective function, and fuzzy clustering algorithms based on neural networks. The classification methods based on fuzzy relations include the systematic (hierarchical) clustering method, clustering algorithms based on equivalence relations, clustering algorithms based on similarity relations, graph-theoretic clustering algorithms and the like.
Further, in the classification method based on fuzzy relations, each sample or variable to be clustered is first regarded as its own class. The statistical similarity between classes is then determined, and the two or more closest classes are merged into a new class. The statistical similarity between the new class and the other classes is calculated, the two or more closest classes are again merged, and so on; the method terminates when all samples or variables have been merged into one class.
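The merge loop described above can be sketched for one-dimensional samples as follows. This is a crisp (non-fuzzy) illustration only: the similarity statistic is assumed to be single linkage (smallest absolute difference between members), which the source does not specify, and exactly two classes are merged per round.

```python
from typing import List, Tuple


def single_linkage(a: List[float], b: List[float]) -> float:
    # Assumed similarity statistic: smallest absolute difference between members.
    return min(abs(x - y) for x in a for y in b)


def agglomerate(samples: List[float]) -> List[Tuple[List[float], List[float]]]:
    """Merge the two closest classes repeatedly until one class remains.

    Returns the merge history as (class_a, class_b) pairs in merge order.
    """
    classes = [[s] for s in samples]  # every sample starts as its own class
    history = []
    while len(classes) > 1:
        pairs = ((p, q) for p in range(len(classes)) for q in range(p + 1, len(classes)))
        i, j = min(pairs, key=lambda pq: single_linkage(classes[pq[0]], classes[pq[1]]))
        history.append((classes[i], classes[j]))
        merged = classes[i] + classes[j]
        classes = [c for idx, c in enumerate(classes) if idx not in (i, j)] + [merged]
    return history


merges = agglomerate([1.0, 1.5, 5.0, 5.2, 9.0])
```

Cutting the merge history at a chosen similarity level, rather than running to a single class, yields a partition into several classes.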
Further, the fuzzy clustering algorithm based on an objective function reduces cluster analysis to a constrained nonlinear programming problem, and the optimal fuzzy partition and clustering of the data set are obtained by solving the optimization problem. The stepwise clustering method is a fuzzy clustering method based on fuzzy partitioning: the number of classes into which the samples are to be divided is determined in advance, the samples are then reclassified according to an optimization principle, and the iteration ends once the classification is reasonable.
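Fuzzy c-means is one well-known objective-function method of this kind; the source does not name it, so its use here is an assumption. It minimizes $\sum_j \sum_i u_{ij}^m (x_j - c_i)^2$ subject to the memberships of each sample summing to 1. The sketch below handles one-dimensional data only, takes the initial centers as an argument so the result is repeatable, and uses hypothetical names throughout.

```python
from typing import List, Tuple


def fuzzy_c_means(xs: List[float], centers: List[float], m: float = 2.0,
                  max_iter: int = 100, tol: float = 1e-6
                  ) -> Tuple[List[float], List[List[float]]]:
    """Fuzzy c-means on 1-D data.

    Returns (centers, memberships), where memberships[j][i] is the degree to
    which sample j belongs to cluster i, as computed in the last round.
    """
    centers = list(centers)
    u: List[List[float]] = []
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_l (d_ij / d_lj) ** (2 / (m - 1)).
        u = []
        for x in xs:
            d = [abs(x - c) for c in centers]
            if 0.0 in d:
                # The sample coincides with a center: full membership there.
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((di / dl) ** (2.0 / (m - 1.0)) for dl in d) for di in d]
            u.append(row)
        # Center update: mean of the samples weighted by u_ij ** m.
        new_centers = []
        for i in range(len(centers)):
            w = [u[j][i] ** m for j in range(len(xs))]
            new_centers.append(sum(wj * x for wj, x in zip(w, xs)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break  # the adjustment is below the threshold
    return centers, u


fcm_centers, memberships = fuzzy_c_means([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], [0.0, 6.0])
```

Unlike k-means, every sample keeps a graded membership in every cluster, which is what makes the partition fuzzy.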
Further, the fuzzy clustering algorithm based on neural networks uses a competitive learning algorithm to guide the clustering process of the network.
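The basic idea of competitive learning can be shown with a winner-take-all update, in which only the unit closest to the current input moves toward it. This is a minimal crisp sketch on one-dimensional inputs; the source does not specify the network or its fuzzy extension, so the function name, learning rate and epoch count are all assumptions.

```python
from typing import List


def competitive_learning(xs: List[float], weights: List[float],
                         lr: float = 0.1, epochs: int = 50) -> List[float]:
    """Winner-take-all learning on 1-D inputs: only the closest unit is updated."""
    w = list(weights)
    for _ in range(epochs):
        for x in xs:
            # The unit whose weight is nearest to the input wins the competition.
            winner = min(range(len(w)), key=lambda i: abs(x - w[i]))
            # Move the winner a fraction lr of the way toward the input.
            w[winner] += lr * (x - w[winner])
    return w


prototypes = competitive_learning([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], [0.0, 6.0])
```

After training, each weight acts as a cluster prototype, and a sample is assigned to the cluster of its winning unit.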
The customer portrait is the foundation of the user experience: for an Internet application, the typical characteristics of a type of customer are described, that type of customer is abstracted into a single persona, and the persona is then used to describe the customer group. For product design, once the user portrait is established, the behavior of typical users is studied in greater depth; the product first concentrates on typical users and meets their requirements, and then expands to further users.
The visual interface comprises an agile visualization mode: an agile, iterative development approach for visual analysis applications that can quickly meet the customer's visual analysis requirements and that maximizes the customer's business value by improving the delivery success rate of the visual analysis system.
The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims (11)

1. A fuzzy clustering system based on user behavior data, comprising a data acquisition module, a data analysis module and an output unit, characterized in that the data acquisition module is used for collecting user behavior data at user runtime and sending the user behavior data to a server, the user behavior data comprising static data and dynamic data; the data analysis module comprises a data extraction unit, a data preprocessing unit, a model algorithm unit and a fuzzy clustering analysis unit, wherein the model algorithm unit comprises a k-means clustering algorithm, and the k-means clustering algorithm is a cluster analysis algorithm solved iteratively; and the output unit comprises a customer portrait and a visual interface.
2. The fuzzy clustering system based on user behavior data as claimed in claim 1, wherein three data dimensions of the user behavior data, namely time, frequency and result, are considered for tagging; the time dimension mainly concerns the time period in which a behavior occurs and its duration, wherein the time-period data is used for selecting the time range of target devices and for marketing analysis and marketing promotion, and can also be used for risk control and anti-fraud, and the duration mainly concerns the course of the behavior and records the time points at which the behavior starts and ends.
3. The fuzzy clustering system based on user behavior data as claimed in claim 1, wherein the data acquisition module mainly collects data through an SDK, the SDK being a few lines of code; the type of data collected depends on the position of the data tracking points (buried points), which return parameters; and the data acquisition module also collects the behavior of the customer on App pages, such as clicks.
4. The fuzzy clustering system based on user behavior data as claimed in claim 3, wherein data from the tracking points can be collected, studied and counted in the system's own background, or through a third-party data analysis platform.
5. The fuzzy clustering system based on user behavior data as claimed in claim 4, wherein setting up the data tracking points (buried points) comprises the following steps:
S1: defining the data to be counted and placing tracking points accordingly;
S2: sorting out the data of the points to be tracked and confirming that it is reasonable;
S3: when a third-party data analysis platform is used, after a tracking point is placed in the APP, uploading the corresponding event ID, event name and related information to the third-party platform, and keeping the ID and the name consistent with those in the code;
S4: after the tracking points are in place, if an event conversion rate needs to be statistically analyzed, adding a funnel model in advance, data statistics being able to begin the day after the funnel model is added.
6. The fuzzy clustering system based on user behavior data as claimed in claim 2, wherein the data extraction unit performs the following steps:
S11: specifying the data extraction requirements in JSON and verifying that they are reasonable;
S12: using XPath to quickly locate specific elements and obtain node information, then parsing, extracting and presenting the data and verifying the data flow;
S13: analyzing the reasonableness of the data to obtain a behavior analysis result.
7. The fuzzy clustering system based on user behavior data as claimed in claim 6, wherein the data preprocessing unit comprises: data cleaning techniques, which can be used to remove noise from the data and correct inconsistencies; data integration techniques, which merge data from multiple data sources into a coherent data store such as a data warehouse; data reduction techniques, which can reduce the size of the data by, for example, aggregation, deleting redundant features, or clustering; and data transformation techniques, which can be used to compress the data into a smaller interval such as 0.0 to 1.0.
8. The fuzzy clustering system based on user behavior data as claimed in claim 7, wherein noise in the data is removed through the following steps:
Step A1: constructing the user behavior data according to the following formula:
Figure FDA0002516042190000031
wherein $X$ represents the total user behavior data, $x_1$ represents time, $x_2$ represents frequency, and $m$ represents the number of constructed total user behavior data;
Step A2: finding the threshold between noise data values and normal data values according to the following formula:
Figure FDA0002516042190000032
wherein $\mu(a, b)$ represents the mean of the user behavior data in the neighborhood, $s(a, b)$ represents the standard deviation of the user behavior data in the neighborhood, $R$ is the dynamic range of the standard deviation, $l$ is a correction parameter, $x_{i,j}$ represents the user behavior data value at abscissa $i$ and ordinate $j$, $m$ represents the number of constructed total user behavior data, and $2m$ represents the number of constructed total user behavior data values;
Step A3: finding the median of the user behavior data according to the following formula:
$f(a, b) = \mathrm{MED}(X)$
wherein $X$ represents the total user behavior data, and $f(a, b)$ represents the median of the user behavior data;
Step A4: using the threshold $q$ between noise data values and normal data values obtained in step A2, any data value larger than the threshold $q$ is a noise point, and the noise point data value is replaced by the median of the user behavior data obtained in step A3, which completes the noise removal.
9. The fuzzy clustering system based on user behavior data as claimed in claim 7, wherein the k-means clustering algorithm comprises the following steps:
S31: randomly selecting k objects as the initial cluster centers;
S32: calculating the distance between each object and each seed cluster center;
S33: assigning each object to the cluster center closest to it.
10. The fuzzy clustering system based on user behavior data as claimed in claim 1, wherein the fuzzy clustering analysis unit covers classification methods based on fuzzy relations, fuzzy clustering algorithms based on an objective function and fuzzy clustering algorithms based on neural networks, the classification methods based on fuzzy relations including systematic clustering, clustering based on equivalence relations, clustering based on similarity relations and graph-theoretic clustering.
11. The fuzzy clustering system based on user behavior data as claimed in claim 10, wherein the customer portrait is the foundation of the user experience: for an Internet application, the typical characteristics of a type of customer are described, that type of customer is abstracted into a single persona, and the persona is then used to describe the customer group; for product design, once the user portrait is established, the behavior of typical users is studied in greater depth, the product first concentrates on typical users and meets their requirements, and then expands to further users.
CN202010476681.4A 2020-05-29 2020-05-29 Fuzzy clustering system based on user behavior data Pending CN111666351A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010476681.4A CN111666351A (en) 2020-05-29 2020-05-29 Fuzzy clustering system based on user behavior data

Publications (1)

Publication Number Publication Date
CN111666351A (en) 2020-09-15

Family

ID=72385164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010476681.4A Pending CN111666351A (en) 2020-05-29 2020-05-29 Fuzzy clustering system based on user behavior data

Country Status (1)

Country Link
CN (1) CN111666351A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7788470B1 (en) * 2008-03-27 2010-08-31 Xilinx, Inc. Shadow pipeline in an auxiliary processor unit controller
CN101908205A (en) * 2010-06-09 2010-12-08 河北师范大学 Magic square coding-based median filter method
CN102238045A (en) * 2010-04-27 2011-11-09 广州迈联计算机科技有限公司 System and method for predicting user behavior in wireless Internet

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
.WEN_KAI: "Xpath提取数据", 《HTTPS://BLOG.CSDN.NET/WEIXIN_44321182/ARTICLE/DETAILS/86628031》 *
J.SAUVOLA等: "Adaptive document image binarization", 《PATTERN RECOGNITION》 *
SIN_GEEK: "颜色迁移之四——模糊聚类(FCM)算法", 《HTTPS://BLOG.CSDN.NET/SIN_GEEK/ARTICLE/DETAILS/22896197》 *
大数据公社: "一文读懂「用户行为数据」的采集、分析和应用", 《HTTPS://BLOG.CSDN.NET/SFM06SQVW55DFT1/ARTICLE/DETAILS/78739738?UTM_SOURCE=BLOGXGWZ7》 *
孟海东等: "《大数据挖掘技术与应用》", 31 December 2014, 冶金工业出版社 *
尘濯: "第九章 聚类", 《HTTPS://WWW.JIANSHU.COM/P/662E60656A96》 *
涛哥论道: "产品思维之用户画像", 《HTTPS://WWW.SOHU.COM/A/257366567_799341》 *
第二场雪: "手把手教你进行APP数据埋点", 《HTTPS://COFFEE.PMCAFF.COM/ARTICLE/929345781595264/PMCAFF?UTM_SOURCE=FORUM》 *
远有青山: "数据预处理_数据清理", 《HTTPS://BLOG.CSDN.NET/HOLANDSTONE/ARTICLE/DETAILS/79034843》 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241494B (en) * 2020-12-10 2021-03-26 平安科技(深圳)有限公司 Key information pushing method and device based on user behavior data
CN112241494A (en) * 2020-12-10 2021-01-19 平安科技(深圳)有限公司 Key information pushing method and device based on user behavior data
CN112804353A (en) * 2021-03-19 2021-05-14 北京孵家科技股份有限公司 Customer information management method, device and system based on deep data mining
CN112804353B (en) * 2021-03-19 2021-07-27 北京孵家科技股份有限公司 Customer information management method, device and system based on deep data mining
CN113159802A (en) * 2021-04-15 2021-07-23 武汉白虹软件科技有限公司 Algorithm model and system for realizing fraud-related application collection and feature extraction clustering
CN113282651A (en) * 2021-04-25 2021-08-20 青岛海尔科技有限公司 Data processing method and device, storage medium and electronic device
CN113554515A (en) * 2021-06-26 2021-10-26 陈思佳 Internet financial control method, system, device and medium
CN113553182A (en) * 2021-07-22 2021-10-26 工银科技有限公司 Configuration method, device, equipment, medium and program product of terminal control strategy
CN113821574A (en) * 2021-08-31 2021-12-21 北京达佳互联信息技术有限公司 User behavior classification method and device and storage medium
CN114610204A (en) * 2022-03-14 2022-06-10 中国农业银行股份有限公司 Auxiliary device and method for data processing, storage medium and electronic equipment
CN114610204B (en) * 2022-03-14 2024-03-26 中国农业银行股份有限公司 Auxiliary device and method for data processing, storage medium and electronic equipment
CN116307489A (en) * 2023-02-01 2023-06-23 中博信息技术研究院有限公司 Visual dynamic analysis method and system based on user behavior modeling
CN116842238A (en) * 2023-07-24 2023-10-03 武汉赛思云科技有限公司 Method and system for realizing enterprise data visualization based on big data analysis
CN116842238B (en) * 2023-07-24 2024-03-22 右来了(北京)科技有限公司 Method and system for realizing enterprise data visualization based on big data analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination